AI Readiness Audit: The 14-Point Checklist We Actually Run
The actual 14-point AI readiness audit we run before implementation: data, evals, ops, governance, scoring bands, and remediation planning.

Most AI readiness work is too soft. It becomes a workshop, a deck, and a list of concerns everybody already knew. We run a different shape: 14 questions, scored 0 to 3, across data, evals, ops, and governance. Max score: 42. Every number below is measured, not aspirational.
This is a field report from inside the systems I run: AppHandoff, CI Gate, infra-gha-runners-fly, k2k-merge-keeper, Mergify, and roughly 30 concurrent AI coding agents. The point is not to sound AI-ready. The point is to know whether implementation survives production.
An AI readiness audit is the operating check before budget gets committed. It asks whether the first pilot will ship, or collapse under missing data, unclear ownership, weak evals, brittle infrastructure, or governance theater.
This checklist is the intake layer before deeper AI strategy consulting, AI consulting services, or a build led as a Senior AI Systems Architect. Receipts, not claims.
AI readiness versus AI maturity
AI maturity describes broad progress: tools, policy, adoption, production systems, internal capability, and governance. AI readiness is more immediate. It asks whether a specific team, workflow, and technical environment are ready for the next AI bet.
A mature company can still fail readiness if data is locked away, credentials are sloppy, nobody owns evals, and legal review happens after launch. A smaller team can score well if the workflow is clear, systems are reachable, and delivery is controlled. That is why this AI readiness audit scores the workflow first.
Scoring rubric
Score each question 0 to 3.
| Score | Meaning | What it looks like |
|---|---|---|
| 0 | Missing | No real answer. Opinions, not evidence. |
| 1 | Informal | Exists in memory, one person, a workaround, or a tool nobody trusts. |
| 2 | Partially ready | The basics exist. Gaps are visible and fixable. |
| 3 | Ready enough | Documented, owned, tested, and connected to normal operations. |
Max score: 42.
| Total score | Band | Interpretation |
|---|---|---|
| 0-13 | Not ready | Do not start implementation. Fix the foundation first. |
| 14-23 | Pilot only | Run a narrow pilot with strict boundaries. |
| 24-33 | Implementation ready | Start building, but remediate weak pillars in parallel. |
| 34-42 | Scale ready | The organization can run multiple AI workflows with real controls. |
The scorecard is just these 14 questions, plus score, evidence, owner, and remediation columns. Anything prettier hides the truth.
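The arithmetic behind the rubric and the bands is small enough to write down. The following is a minimal sketch, not the tooling used in the audit; the function names `total_score` and `band_for` are illustrative, and the thresholds are copied from the band table above.

```python
from typing import Sequence

NUM_QUESTIONS = 14

# Bands copied from the score table: (low, high, band name).
BANDS = [
    (0, 13, "Not ready"),
    (14, 23, "Pilot only"),
    (24, 33, "Implementation ready"),
    (34, 42, "Scale ready"),
]


def total_score(scores: Sequence[int]) -> int:
    """Sum the 14 question scores, rejecting anything outside the rubric."""
    if len(scores) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} scores, got {len(scores)}")
    for score in scores:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"score {score!r} is outside the 0-3 rubric")
    return sum(scores)


def band_for(total: int) -> str:
    """Map a total between 0 and 42 to its band name."""
    for low, high, name in BANDS:
        if low <= total <= high:
            return name
    raise ValueError(f"total {total} is outside 0-42")
```

A team that scores 2 on every question totals 28, which lands in the implementation-ready band: the basics exist everywhere, and the gaps get fixed in parallel.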
The 14-point AI readiness checklist
The audit has six areas, but they roll into four pillars: data, evals, ops, and governance. Strategy points the work. Pilot history shows whether the team learns.
Strategy alignment
1. What business decision or workflow will AI change in the next 90 days?
Why this matters: AI work fails when the target is "use agents somewhere." Good targets are concrete workflows: support triage, compliance summaries, lead qualification, invoice reconciliation, pull request review, or migration planning.
2. Who owns the outcome, budget, and kill criteria?
Why this matters: pilots die between enthusiasm and accountability. A 3 means one person owns the result, approves tradeoffs, and can stop the work.
Data readiness
3. Which source systems contain the data needed for the first use case?
Why this matters: "the data exists somewhere" is not readiness. The audit needs systems, owners, access paths, freshness, and limitations. For generative AI implementation, this is often the blocker.
4. Is that data accessible through APIs, exports, or controlled MCP/server routes?
Why this matters: screenshots, copy-paste, and one-off CSV exports are demo paths, not operations. What makes access honest is the architecture behind it: Next.js, Supabase, Fly, Cloudflare, Infisical, MCP, and a controlled agent fleet.
5. Can the team identify bad, missing, duplicated, stale, or legally restricted data before it reaches the model?
Why this matters: bad inputs do not become strategic outputs because an LLM touched them. A high score requires data checks, access rules, and restricted-data controls.
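A data check of this kind can be very plain. The sketch below screens records for the five problems named in question 5 before anything is passed to a model. The field names (`id`, `body`, `updated_at`, `restricted`) and the 30-day freshness limit are assumptions made for illustration; a real workflow would take them from the source system's schema and the legal rules that apply.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("id", "body", "updated_at")  # hypothetical schema
MAX_AGE = timedelta(days=30)                     # hypothetical freshness limit


def screen_records(records, now=None):
    """Split records into (clean, rejected) before they reach a model.

    Each record is a dict; `updated_at` must be a timezone-aware datetime.
    Rejected entries are (id, reason) pairs so the gap is visible, not silent.
    """
    now = now or datetime.now(timezone.utc)
    seen_ids = set()
    clean, rejected = [], []
    for rec in records:
        reason = None
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            reason = "missing"
        elif rec["id"] in seen_ids:
            reason = "duplicate"
        elif now - rec["updated_at"] > MAX_AGE:
            reason = "stale"
        elif rec.get("restricted", False):
            reason = "restricted"
        if reason is None:
            seen_ids.add(rec["id"])
            clean.append(rec)
        else:
            rejected.append((rec.get("id"), reason))
    return clean, rejected
```

The rejected list matters as much as the clean one: it is the evidence column for this question on the scorecard.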
Infrastructure
6. Is there a safe environment where agents or AI workflows can run without touching production by default?
Why this matters: agents need a sandbox before launch. They should inspect, draft, test, and propose changes before mutation. CI Gate exists for this: one fail-closed aggregate required check.
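To make "fail-closed aggregate" concrete, here is a minimal sketch of the idea. It is not CI Gate's implementation; the function name, the check names, and the `"success"` status string are all assumptions for illustration. The property that matters is that anything other than an explicit success, including a missing result, blocks the merge.

```python
from typing import Iterable, Mapping


def aggregate_gate(required: Iterable[str], results: Mapping[str, str]) -> bool:
    """Return True only if every required check explicitly reported "success".

    A missing, pending, skipped, or unrecognized result counts as a failure.
    """
    required = list(required)
    if not required:
        # An empty required set proves nothing, so fail closed.
        return False
    return all(results.get(name) == "success" for name in required)
```

The design choice is the default: a fail-open gate treats a check that never ran as a pass, which is exactly the gap an agent fleet will find.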
7. Are secrets, credentials, and customer keys stored outside the application code and agent prompts?
Why this matters: if keys live in code, prompts, screenshots, or shared notes, the team is not ready. Infisical is in my default stack because agents should not read or write secrets.
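One way to keep keys out of code and prompts is sketched below. It assumes a secrets manager injects values into the process environment at runtime; it does not use Infisical's API, and the helper names `require_secret` and `redact` are hypothetical.

```python
import os
from typing import Iterable


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent, so the process stops early."""


def require_secret(name: str) -> str:
    """Read a secret from the environment; never fall back to a default."""
    value = os.environ.get(name, "")
    if not value:
        raise MissingSecretError(f"secret {name} is not set in the environment")
    return value


def redact(text: str, secrets: Iterable[str]) -> str:
    """Mask known secret values before text is logged or placed in a prompt."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    return text
```

The application reads the key at the point of use, and anything headed for a log or an agent's context passes through `redact` first.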
8. Can the team ship and review AI-assisted code or workflow changes through normal CI, branch protection, and approvals?
Why this matters: AI work should not bypass delivery discipline. On the IBF repo, public evidence is 55 merged PRs/day on average, 66/day over the last 7 days, a peak of 111 on 2026-05-21, and a median queue-to-merge time of 5 minutes.
Team and skills
9. Does someone know enough about the process to write the runbook, not just the prompt?
Why this matters: prompts are not process knowledge. A real workflow needs edge cases, escalation, test examples, permissions, failure handling, and a fallback. If I cannot defend it in a sales call, it does not go on the page.
10. Can the team review AI output with domain judgment, not just vibes?
Why this matters: human review only works when the reviewer knows what good looks like. Score high when reviewers are named, trained, available, and tied to the workflow.
Governance
11. Are acceptable use, data handling, logging, and human review rules written down?
Why this matters: governance that only lives in meetings does not govern anything. The baseline is written rules for what the system may access, generate, store, change, and escalate.
12. Can you reconstruct what the AI saw, did, changed, and returned after the fact?
Why this matters: if the system cannot be audited, it cannot be trusted. Logs need to show context, action, approval, change, and failure.
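A log entry that covers those five things can be a single JSON line per agent action. The sketch below is one possible shape, assuming append-only JSON Lines storage; every field name is illustrative rather than a schema taken from the systems named in this article.

```python
import json
from datetime import datetime, timezone


def audit_record(run_id, context_refs, action, approved_by, change,
                 error=None, now=None):
    """Build one audit entry: what the AI saw, did, changed, and returned."""
    now = now or datetime.now(timezone.utc)
    return {
        "run_id": run_id,
        "timestamp": now.isoformat(),
        "context": list(context_refs),  # references to inputs, not raw data
        "action": action,
        "approved_by": approved_by,     # None means nobody approved it
        "change": change,
        "error": error,                 # None means the action succeeded
    }


def format_audit_line(record) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(record, sort_keys=True)
```

Storing references to the context rather than the context itself keeps restricted data out of the log while still letting a reviewer reconstruct the run.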
13. Does every high-risk workflow have a named human owner with stop authority?
Why this matters: workflows affecting money, legal exposure, customer trust, security, or employment decisions need a named owner who can stop the machine. Not a committee. A person.
Pilot history
14. What AI pilots have already been tried, and what actually shipped?
Why this matters: past pilots reveal the operating truth. Some teams tried five tools and shipped nothing. AppHandoff exists because Lovable could get teams 80% there, but the last 20% still needed orchestration, repo discipline, and finishing power. Inspired by frustration. I mean that literally.
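Once the 14 scores exist, the weakest area is more useful than the total. The sketch below rolls scores up by the six areas as they are grouped in the checklist above. The mapping from areas to the four pillars is not spelled out here, so the sketch stops at areas; the function names are illustrative.

```python
from typing import Dict

# Question numbers per area, as grouped in the checklist above.
AREAS = {
    "Strategy alignment": [1, 2],
    "Data readiness": [3, 4, 5],
    "Infrastructure": [6, 7, 8],
    "Team and skills": [9, 10],
    "Governance": [11, 12, 13],
    "Pilot history": [14],
}


def area_ratios(scores: Dict[int, int]) -> Dict[str, float]:
    """Return earned / possible points per area, given {question: score}."""
    ratios = {}
    for area, questions in AREAS.items():
        earned = sum(scores[q] for q in questions)
        ratios[area] = earned / (3 * len(questions))
    return ratios


def weakest_area(scores: Dict[int, int]) -> str:
    """Name the area with the lowest ratio; ties go to the earlier area."""
    ratios = area_ratios(scores)
    return min(ratios, key=ratios.get)
```

Ratios are used instead of raw sums because the areas have different numbers of questions; a raw sum would always make pilot history look weakest.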
Common failure patterns by pillar
Data failures
The usual failure is not "we have no data." It is worse: data exists, but nobody can define the source of truth. Sales has one version, ops has another, finance has a third, and legal restrictions were never encoded.
Eval failures
Teams confuse subjective review with evals. "The output looked good" is not an eval. AppHandoff, ContextCapture, CI Gate, and the auto-fixer pattern all come from the same lesson: AI systems need feedback loops that punish bad output and preserve good output.
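The difference between review and an eval is that an eval is a fixed set of cases with a pass or fail answer that can be rerun. A minimal sketch, assuming the system under test is any callable from prompt to output; `run_evals` and the case format are illustrative and not taken from the tools named above.

```python
from typing import Callable, List, Tuple

# One case: an input prompt plus a check that returns True for good output.
EvalCase = Tuple[str, Callable[[str], bool]]


def run_evals(generate: Callable[[str], str],
              cases: List[EvalCase]) -> Tuple[float, List[str]]:
    """Run every case and return (pass_rate, prompts_that_failed)."""
    if not cases:
        raise ValueError("an eval set needs at least one case")
    failures = []
    for prompt, check in cases:
        output = generate(prompt)
        if not check(output):
            failures.append(prompt)
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return pass_rate, failures
```

The failing prompts are the feedback loop: each one becomes a regression case that the next change has to pass.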
Ops failures
The biggest ops failure is treating AI as a side channel outside the repo, CI, approvals, and logs. The answer is boring: branch protection, queue discipline, reusable actions, runbooks, and fail-closed checks. infra-gha-runners-fly, fly-gha-status, fly-gha-medium, two-tier cache, and 63 reusable composite GitHub Actions exist because speed without control is chaos.
Governance failures
Governance fails when it is abstract or late. A 40-page policy that never touches the workflow is useless. Good governance defines inputs, actions, logging, review, escalation, retention, and stop authority before launch.
Quick wins by score band
0-13: not ready
Do not start with agents. Pick one workflow, name the owner, map the data, and write the first runbook.
14-23: pilot only
Run one narrow pilot with strict boundaries. No broad rollout, no customer-facing autonomy, no sensitive data without controls. Add evals early. Named products, real numbers, real dates.
24-33: implementation ready
Start the build, but fix weak scores in parallel: data access, quality checks, CI gates, deployment gates, and workflow-level governance. This is where AI agent development moves quickly.
34-42: scale ready
Standardize runbooks, shared actions, eval patterns, logging, and release gates before every team invents its own stack. Two repos, one product.
How to run the audit in a half-day
A first AI readiness audit should take a half-day if the right people are in the room. The first pass exposes the blockers.
| Time | Work |
|---|---|
| 0:00-0:30 | Pick the workflow and define the 90-day business outcome. |
| 0:30-1:15 | Walk source systems, access paths, quality issues, and restrictions. |
| 1:15-2:00 | Review environments, secrets, CI, deployment, and logs. |
| 2:00-2:45 | Review ownership, skills, runbooks, and escalation. |
| 2:45-3:30 | Score governance, auditability, approval, and stop authority. |
| 3:30-4:00 | Score all 14 questions, identify the top blockers, and assign owners. |
Do not let the session drift into solution design too early. The audit is there to find the minimum set of problems that block implementation.
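The last half hour of the session is mostly sorting. A minimal sketch of that step follows, using the scorecard columns described earlier (score, evidence, owner, remediation). The `ScorecardRow` type, the default of three blockers, and the tie-break by question number are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScorecardRow:
    question: int      # 1-14
    score: int         # 0-3
    evidence: str
    owner: str
    remediation: str


def top_blockers(rows: List[ScorecardRow], limit: int = 3) -> List[ScorecardRow]:
    """Return the lowest-scored questions; ties go to the earlier question."""
    return sorted(rows, key=lambda row: (row.score, row.question))[:limit]
```

A row with an empty owner is itself a finding: the blocker has been identified but nobody has been assigned to fix it.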
Linking findings to a 90-day remediation plan
The output should become a 90-day plan, not a PDF graveyard.
| Window | Remediation focus |
|---|---|
| Weeks 1-2 | Workflow definition, owner, data map, system access, risk boundaries, first runbook. |
| Weeks 3-6 | Controlled pilot, eval set, sandbox, access controls, logging, human review. |
| Weeks 7-10 | CI gates, deployment rules, fallback handling, audit trails, cost tracking, support. |
| Weeks 11-13 | Scale, pause, or kill using the original kill criteria, not vibes. |
This is a walkthrough of the actual infrastructure mindset behind the audit. The checklist forces judgment into the open.
The point of the audit
An AI readiness audit is not a ceremony. It is a pressure test.
It tells you whether the company can connect AI to real data, judge output with evals, operate the system safely, and govern the workflow after launch. Those are the four pillars of readiness: data, evals, ops, and governance.
The organizations that score well are not always the biggest. They have clear ownership, reachable systems, honest constraints, and enough delivery discipline to keep AI work inside the business.
That is the whole point: make the next AI bet defensible before anyone starts building. If you want that pressure test run against a real workflow, book a call.