Audit Retainer methodology · v1 · 2026

How architecture-under-watch works.

A working description of the Audit Retainer pipeline — what we audit, when, and how the same engineer ships the fix instead of handing you a slide deck.


Why retainers, not one-shots

The standard architecture-audit engagement is a two-week, slide-deck deliverable. A senior engineer parachutes in, reads code, interviews the team, and ships a 30-page report. The report identifies twenty findings; the customer addresses three; the rest become technical debt that's "known but unowned." Six months later the architecture has drifted, the report is stale, and the cycle repeats with a new vendor.

The Audit Retainer collapses that cycle. Same engineer every month, indefinitely. The audit isn't a deliverable; it's a continuous diff. The findings aren't slides; they're merged pull requests in your repos. The cycle isn't six months between checkpoints; it's one week between checkpoints, with daily passive instrumentation underneath.

The economic argument is straightforward. A typical Garnet customer running Audit Scale ($9,999/mo) lands 2–8 merged engineering tickets/month, recovers ~28% on infra-spend line items within the first six months, and progresses from ~60% to ~95% on a SOC 2 readiness gap-list over the same window. The retainer is cheaper than the one-shots it replaces, and the work is owned end-to-end instead of handed off.

The four-axis audit

Most "architecture audit" engagements stop at security and call it done. We track four axes continuously, because the trade-offs between them are where production debt actually lives:

A single number on each axis is the leading indicator. The diff between months is the lagging indicator and the work item.

Daily — passive snapshots

A snapshot writer Worker (deployed in your Cloudflare account, talking to your cloud APIs with read-only credentials you provision) runs nightly and lands a structured JSON snapshot in your R2. Each snapshot covers:

Schema fingerprint

DB schemas (Postgres, MySQL, BigQuery, Snowflake, or whatever you're on), IaC repos (Terraform, Pulumi, CloudFormation, CDK), API contract files (OpenAPI, GraphQL SDL, Protobuf). Hash-tracked. Drift between two consecutive snapshots trips an alert in #audit-drift the next morning. The schema diff is structural: we ignore migrations that only add columns or rename without a breaking change, and we surface anything that is a contract break.
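The hash tracking and structural diff described above can be sketched roughly as follows. This is an illustrative sketch, not the pipeline's actual code: the `{table: {column: type}}` snapshot shape and the function names `fingerprint` and `classify_drift` are assumptions made for the example, and rename detection is left out, so a rename shows up here as a drop plus an add.

```python
import hashlib
import json


def fingerprint(schema: dict) -> str:
    """Stable SHA-256 over a canonical (sorted-key) JSON rendering of the schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_drift(old: dict, new: dict) -> dict:
    """Split drift between two {table: {column: type}} snapshots into
    additive changes (ignorable) and breaking changes (surfaced)."""
    additive, breaking = [], []
    for table, old_cols in old.items():
        new_cols = new.get(table)
        if new_cols is None:
            breaking.append(f"table dropped: {table}")
            continue
        for col, col_type in old_cols.items():
            if col not in new_cols:
                breaking.append(f"column dropped: {table}.{col}")
            elif new_cols[col] != col_type:
                breaking.append(
                    f"type changed: {table}.{col} {col_type} -> {new_cols[col]}"
                )
        for col in new_cols:
            if col not in old_cols:
                additive.append(f"column added: {table}.{col}")
    for table in new:
        if table not in old:
            additive.append(f"table added: {table}")
    return {
        "drifted": fingerprint(old) != fingerprint(new),
        "additive": additive,
        "breaking": breaking,
    }


# Hypothetical consecutive snapshots of one table.
yesterday = {"users": {"id": "bigint", "email": "text"}}
today = {"users": {"id": "bigint", "email": "varchar(255)", "created_at": "timestamptz"}}
report = classify_drift(yesterday, today)
```

In this sketch only the `breaking` list would feed an alert; the fingerprint comparison is the cheap first check that decides whether a structural diff is worth computing at all.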

IAM + secrets posture

IAM-graph snapshot (AWS IAM / GCP IAM / Azure RBAC + OIDC trust policies), last-rotation timestamps on tracked secrets (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, 1Password Connect, Doppler — pick your stack), public-bucket scan, exposed-key scan via gitleaks-style heuristics across the repos we have access to.
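As one example of what a last-rotation check might look like, the sketch below flags tracked secrets whose rotation timestamp is older than a policy window. The 90-day window, the secret names, and the `stale_secrets` helper are all assumptions for illustration; the source does not state a rotation policy.

```python
from datetime import datetime, timedelta, timezone

# Assumed rotation policy for the example; not a value taken from the methodology.
MAX_AGE = timedelta(days=90)


def stale_secrets(
    last_rotated: dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> list[str]:
    """Return the names of secrets last rotated more than max_age ago."""
    return sorted(name for name, ts in last_rotated.items() if now - ts > max_age)


# Hypothetical timestamps as they might be read from a secrets manager's metadata.
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
tracked = {
    "db/primary-password": datetime(2025, 10, 1, tzinfo=timezone.utc),
    "api/payments-key": datetime(2026, 2, 10, tzinfo=timezone.utc),
}
overdue = stale_secrets(tracked, now)
```

Only metadata (names and timestamps) is needed for this check, which fits the read-only access model described above.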

Cost + latency

Cloud-bill API pull (AWS CUR, GCP Billing, Azure Cost Management — full daily granularity, tagged-by-service), observability metric pull from your existing stack (Datadog API, CloudWatch Metrics, Grafana Mimir, Prometheus federation, Sentry — we adapt to what you already pay for). Per-service unit costs, request volume, p50/p95/p99. We do not require you to install a new agent — read-only API access is enough.
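To make the per-service numbers concrete, here is a minimal sketch of how unit cost and latency percentiles could be derived from the pulled data. It assumes a nearest-rank percentile and a cost-per-1,000-requests unit; the source does not specify either, and real observability backends often report pre-aggregated percentiles instead of raw samples.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    the samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def service_summary(daily_cost_usd: float, request_count: int,
                    latencies_ms: list[float]) -> dict:
    """Per-service unit cost (USD per 1,000 requests) plus p50/p95/p99 latency."""
    return {
        "unit_cost_usd_per_1k": daily_cost_usd / request_count * 1000,
        "requests": request_count,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
    }


# Hypothetical inputs: one day's tagged cost, request volume, and latency samples.
latencies = list(range(1, 101))  # 1..100 ms
summary = service_summary(42.0, 1_200_000, latencies)
```

Tracking unit cost alongside the percentiles is what lets a month-over-month diff separate "traffic grew" from "the service got more expensive per request".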

Compliance posture

For customers on a SOC 2 / ISO 27001 / HIPAA / GDPR path: a daily diff against the control-list. Each control row is mapped to a passive evidence query (e.g., SOC 2 CC6.1 "logical access controls" maps to an IAM-graph query that confirms MFA-required on console access). When evidence shifts, the row's posture updates.
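The control-to-evidence mapping can be pictured as a table of small query functions evaluated against each daily snapshot. The sketch below is illustrative only: the snapshot field names (`iam_users`, `console_access`, `mfa_enabled`), the control key `SOC2-CC6.1`, and the pass/fail vocabulary are assumptions, not the pipeline's real schema.

```python
def cc6_1_mfa_on_console(snapshot: dict) -> bool:
    """Evidence query: every IAM user with console access has MFA enabled."""
    console_users = [u for u in snapshot["iam_users"] if u["console_access"]]
    return all(u["mfa_enabled"] for u in console_users)


# Control row -> passive evidence query (one hypothetical row shown).
CONTROLS = {"SOC2-CC6.1": cc6_1_mfa_on_console}


def evaluate(snapshot: dict) -> dict:
    """Run every evidence query against one snapshot."""
    return {cid: ("pass" if query(snapshot) else "fail")
            for cid, query in CONTROLS.items()}


def posture_changes(prev: dict, curr: dict) -> dict:
    """Rows whose posture differs between two evaluations: {row: (before, after)}."""
    return {cid: (prev.get(cid), status)
            for cid, status in curr.items() if prev.get(cid) != status}


# Hypothetical snapshots on two consecutive days.
day1 = {"iam_users": [
    {"name": "alice", "console_access": True, "mfa_enabled": False},
    {"name": "ci-bot", "console_access": False, "mfa_enabled": False},
]}
day2 = {"iam_users": [
    {"name": "alice", "console_access": True, "mfa_enabled": True},
    {"name": "ci-bot", "console_access": False, "mfa_enabled": False},
]}
changes = posture_changes(evaluate(day1), evaluate(day2))
```

Keeping each query a pure function of the snapshot means a posture change can be reproduced later from the stored JSON alone.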

Raw snapshots stay in your R2; only the structured diffs replicate to our control plane (encrypted at rest, your tenant only). We do not store production data verbatim. Snapshots are application/json, typically 50–500 KB/day, designed to be auditable in isolation. If you offboard, the snapshots remain in your R2 — they are your audit trail, not ours.

Weekly — active diff + drill

Once a week, the engineer (the same one all month) sits with the diff. Three artifacts ship:

Monthly — executive PDF

On the 1st of each month, a Cloudflare Workflow renders an executive PDF covering the same shape every month so you can trend across quarters without re-orienting. Designed to be forward-able to your CFO without translation:

The PDF is signed (DocuSign optional at Enterprise tier) and delivered to your inbox via Postal — sovereign mail, our infra, DKIM-verified. A summary embed also lands in your Discord #audit-monthly channel for the broader team.

Day 1, Day 30, Day 90

Day 1 — onboarding kickoff

Day 30 — first executive PDF

Day 90 — full retainer maturity

What success looks like

Across the first 90 days, Audit Pro typically opens 3–6 findings per cycle and closes 4–8 (the surplus from the first month's backlog). Audit Scale closes 2–8 engineering tickets/month as merged PRs in customer repos. Compliance posture progress varies by stack — a typical SOC 2-aspiring customer moves from ~60% to ~95% within 6 months. Cost reduction averages ~28% on infra-spend line items in the first 6 months (anonymized aggregate; your mileage will depend on starting hygiene).

What it isn't

FAQ

How is this priced relative to a Big-4 audit?

A Big-4 architecture audit typically runs $120K–$400K for a 6–10 week engagement. Audit Scale ($9,999/mo) is $120K/year — same headline number, but for ongoing work with merged tickets, not a single-shot deck. Audit Pro ($4,999/mo) is $60K/year and covers the diagnostic + 4–8 audit hours of remediation work. The economics flip the moment you treat audits as a recurring discipline rather than a snapshot.
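The annualized figures above are rounded. A quick check of the arithmetic, using only the prices quoted in this answer (the helper name `annual_cost` is just for the example):

```python
def annual_cost(monthly_usd: int) -> int:
    """Twelve months of a flat monthly retainer."""
    return monthly_usd * 12


scale_annual = annual_cost(9_999)  # Audit Scale
pro_annual = annual_cost(4_999)    # Audit Pro

# Big-4 engagement range quoted above, in USD.
BIG4_LOW, BIG4_HIGH = 120_000, 400_000

# A full year of Audit Scale costs no more than the low end of that range.
scale_within_low_end = scale_annual <= BIG4_LOW
```

So a year of Audit Scale comes to $119,988 and a year of Audit Pro to $59,988, which is where the $120K and $60K headline numbers come from.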

What if our team disagrees with a finding?

Findings are proposals. Each red/amber finding lands in your tracker as a ticket your team can close, decline, or push back on. A declined finding is recorded with the rationale and doesn't re-open next cycle. The engineer's job is to surface and frame — not to override your team's judgment.
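One way to picture the "declined findings don't re-open" rule is a suppression lookup keyed on a stable finding identifier. The sketch below is an assumption about how that could work: the `Finding` fields, the key format, and the `findings_to_open` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    key: str       # stable identifier for the underlying issue (hypothetical format)
    severity: str  # "red" or "amber"
    title: str


def findings_to_open(current: list[Finding], declined: dict[str, str]) -> list[Finding]:
    """Drop findings the team has already declined.

    `declined` maps a finding key to the recorded rationale.
    """
    return [f for f in current if f.key not in declined]


# Hypothetical cycle: one finding was declined last cycle, with its rationale recorded.
declined = {"iam/wildcard-role": "Role is scoped by a permission boundary; accepted risk."}
this_cycle = [
    Finding("iam/wildcard-role", "amber", "Wildcard action in deploy role"),
    Finding("s3/public-bucket", "red", "Public read on assets bucket"),
]
to_open = findings_to_open(this_cycle, declined)
```

Storing the rationale next to the key keeps the decision auditable without putting the ticket back in the tracker.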

Do we need to be on Cloudflare to use this?

The snapshot writer Worker runs on Cloudflare, which is the only hosting requirement for this lane. Your primary infra can live anywhere: AWS, GCP, Azure, on-prem, or multi-cloud. The Worker pulls read-only from your existing cloud APIs. If you don't have a Cloudflare account, we'll provision one for you (free tier covers most Pro/Scale customers; Enterprise customers typically already have one).

What happens to our snapshots if we cancel?

They stay in your R2. Cancellation removes Garnet's read access to the diff control plane, but your historical snapshots (the daily JSON, the monthly PDFs) are entirely yours. Many customers retain the snapshot pipeline post-cancellation as a passive audit trail; you can also point a different operator at the same R2 bucket if you change vendors.

What about software vulnerabilities and CVEs?

Dependency CVE exposure is one of the security-axis snapshot inputs. We integrate with whichever scanner you already run (Snyk, Dependabot, Trivy, Grype) — we don't add a new scanner unless you don't have one. CVE counts and trends land in the monthly PDF security section; high-severity CVEs (CVSS >= 8.0) trigger same-day alerts at Scale and Enterprise.
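The same-day alert rule reduces to a tier check plus a score threshold. A minimal sketch, assuming scanner output has already been normalized to a list of `{"id", "cvss"}` records (that shape, the tier strings, and the placeholder CVE identifiers are assumptions for the example):

```python
ALERT_TIERS = {"scale", "enterprise"}  # tiers with same-day alerts, per the answer above
CVSS_ALERT_THRESHOLD = 8.0             # threshold quoted above


def same_day_alerts(tier: str, cves: list[dict]) -> list[str]:
    """IDs of CVEs that should trigger a same-day alert for this tier."""
    if tier.lower() not in ALERT_TIERS:
        return []
    return [c["id"] for c in cves if c["cvss"] >= CVSS_ALERT_THRESHOLD]


# Placeholder records; the IDs are not real CVEs.
scan = [
    {"id": "CVE-0000-0001", "cvss": 9.8},
    {"id": "CVE-0000-0002", "cvss": 7.5},
    {"id": "CVE-0000-0003", "cvss": 8.0},
]
scale_alerts = same_day_alerts("Scale", scan)
pro_alerts = same_day_alerts("Pro", scan)
```

Everything below the threshold, and everything on the Pro tier, would still be counted toward the trend in the monthly PDF rather than alerted on.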

Can the retainer cover only one repo or one cloud?

Yes. Pro tier is typically scoped to 1–2 repos + 1 cloud. Scale and Enterprise expand the scope. If you have a "primary" service plus a "long tail" of internal tooling, we usually recommend scoping the audit to the primary service plus the most-shared internal libraries, leaving the long tail for next-cycle expansion.

Adjacent lanes

Audit Retainer is one of three production lanes. Customers running an architecture-under-watch program often pair it with:
