Most AI programs lose legibility somewhere between the third and the fifth workflow. The first two have champions who give board updates. By the fourth, nobody can answer "what's our AI program doing this quarter?" without a meeting and a deck.
The fix is a quarterly written review — one page per workflow, same format every time. Below is the template we use. It's deliberately boring. Boring is what survives leadership changes, compliance reviews, and CFO scrutiny.
The template
One page per production workflow. Eight sections. Same headings every quarter. The discipline of having to fill it in every ninety days is what keeps the program honest.
Section 1 — Workflow name and one-sentence description
"Contract triage for the legal department: classify and route inbound contracts to the partner who owns each one." Plain language. No jargon. If a board member can't understand the sentence, rewrite it.
Section 2 — What it produced this quarter
The unit of work and the count. "423 contracts triaged." "1,217 credit memos drafted." "8,940 documents extracted." Volume is the first thing every reader wants to know; lead with it.
Section 3 — Accuracy on the live evaluation set
Current quarter score, prior quarter score, change. "94.2% (Q1: 93.7%, +0.5)." If accuracy dropped, name the cause in one sentence. If it rose, name what changed. If it didn't move, say so explicitly — silent metrics invite suspicion.
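If the score comes out of an eval harness or dashboard, producing this line is mechanical. A minimal sketch, assuming nothing about any particular eval tool; the format mirrors the example above:

```python
def accuracy_line(current: float, prior: float, prior_label: str = "Q1") -> str:
    """Format the accuracy line: current score, prior score, signed change in points."""
    delta = current - prior
    return f"{current:.1f}% ({prior_label}: {prior:.1f}%, {delta:+.1f})"

print(accuracy_line(94.2, 93.7))  # -> "94.2% (Q1: 93.7%, +0.5)"
```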
Section 4 — Cost
Total spend this quarter, broken into model API cost, infrastructure, and licensed data. Compare to the budget. "Total $4,820 against $6,000 budget; $1,180 favorable. Driver: switched from Claude Opus to Sonnet for the routing step in Q2." Cost is where the CFO will read carefully.
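The variance line is equally mechanical once the three cost buckets are pulled from billing. A small sketch; the split across buckets below is invented for illustration, and only the totals match the example above:

```python
def cost_line(api: float, infra: float, data: float, budget: float) -> str:
    """Total the three cost buckets and report the variance against budget."""
    total = api + infra + data
    variance = budget - total  # positive means under budget
    direction = "favorable" if variance >= 0 else "unfavorable"
    return f"Total ${total:,.0f} against ${budget:,.0f} budget; ${abs(variance):,.0f} {direction}."

# Bucket amounts are hypothetical; they sum to the $4,820 in the example.
print(cost_line(api=3_900, infra=700, data=220, budget=6_000))
# -> "Total $4,820 against $6,000 budget; $1,180 favorable."
```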
Section 5 — Human-in-the-loop activity
How many outputs went through the review gate, how many were corrected, what categories the corrections fell into. "Of 423 outputs, 89 (21%) routed to human review. Of those, 14 were overridden — all in the 'multi-party contract' category, which is now an open improvement item." This is the section that proves the system is operated, not just installed.
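These counts fall straight out of the review-gate log. A minimal sketch of the tally, assuming each output record carries a review flag, an override flag, and a category label (all hypothetical field names):

```python
from collections import Counter

def hitl_summary(outputs: list[dict]) -> dict:
    """Tally review-gate activity for the quarter from per-output log records."""
    total = len(outputs)
    reviewed = [o for o in outputs if o["sent_to_review"]]
    overridden = [o for o in reviewed if o["overridden"]]
    return {
        "total": total,
        "reviewed": len(reviewed),
        "review_rate": len(reviewed) / total if total else 0.0,
        "overridden": len(overridden),
        "override_categories": Counter(o["category"] for o in overridden),
    }

# e.g. summary = hitl_summary(quarter_log)
# -> 423 total, 89 reviewed (21%), 14 overridden, Counter({"multi-party contract": 14})
```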
Section 6 — Changes shipped this quarter
What changed in the system, and why. Bullet list. "Updated routing prompt to handle MSAs separately (closed two recurring failure modes). Added 12 examples to eval set covering multi-party contracts. Migrated logging to centralized warehouse for compliance." If nothing changed, write that — a workflow that didn't change for a quarter is either stable or stale, and the reader needs to know which.
Section 7 — Open issues and next quarter's plan
The known problems and what's planned to address them. "Multi-party contract routing accuracy lags by 8 points. Next quarter: collect 30 more labeled examples, evaluate prompt restructure with cited examples." This is the section that converts the review from a backward-looking report into a forward-looking commitment.
Section 8 — Risks and policy notes
Anything compliance, legal, or risk should know about. "Vendor announced model deprecation in Q3 — migration plan in progress." "New SOC 2 control added covering eval set retention." If there's nothing to flag, write "None this quarter" — the absence is data.
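Put together, the page is just these eight headings with a short block of text under each. If you want to stamp out blank pages programmatically, here is a minimal sketch; the markdown layout is an assumption, and any format with the same eight headings works:

```python
SECTIONS = [
    "Workflow name and one-sentence description",
    "What it produced this quarter",
    "Accuracy on the live evaluation set",
    "Cost",
    "Human-in-the-loop activity",
    "Changes shipped this quarter",
    "Open issues and next quarter's plan",
    "Risks and policy notes",
]

def blank_page(workflow: str, quarter: str) -> str:
    """Render an empty quarterly review page for one workflow."""
    lines = [f"# {workflow}: quarterly review, {quarter}", ""]
    for i, title in enumerate(SECTIONS, start=1):
        lines += [f"## {i}. {title}", "", "TODO", ""]
    return "\n".join(lines)

# e.g. write blank_page("Contract triage", "2025-Q3") into the shared review folder
```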
How to run the review meeting
The page goes out 48 hours before the meeting. Reading time is 5-7 minutes per workflow. The meeting itself is for the questions the page surfaces, not for the operator to read aloud.
Three roles in the room: the operator (who wrote the page), the sponsor (who owns the program), and a reviewer (compliance, finance, or legal — rotating, but always represented). Twenty minutes per workflow. If a workflow takes more than twenty minutes, that's a flag — either it's in trouble, or the page wasn't honest about the issues.
The output of the meeting is two things: any decisions captured in the next-quarter plan section, and a list of escalations for the program sponsor to take to the executive team. Both get written into the page within 24 hours. The page becomes the durable artifact; the meeting is just the forcing function.
What to do with the pages
Store all of them in the same place — a shared folder, a Notion database, a Confluence space. Anyone with access should be able to find any quarter's review for any workflow within thirty seconds. The retention schedule should match the one for your other regulated records.
The board version is the cover page: a portfolio table with one row per workflow showing volume, accuracy trend, cost trend, and status (green / yellow / red). The detail pages live behind it for anyone who wants to read deeper. The board only ever asks to read the detail when the cover-page status is yellow or red — which is exactly when you want them reading.
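Assembling the cover page is mechanical once each workflow's numbers are captured as a small record. A minimal sketch, with illustrative field names rather than any standard schema:

```python
from dataclasses import dataclass

@dataclass
class WorkflowSummary:
    name: str
    volume: int            # units of work this quarter
    accuracy_delta: float  # percentage-point change vs. prior quarter
    cost_delta: float      # spend vs. budget; negative means favorable
    status: str            # "green", "yellow", or "red"

def cover_table(summaries: list[WorkflowSummary]) -> str:
    """Render the one-row-per-workflow portfolio table as plain text."""
    header = f"{'Workflow':<25}{'Volume':>8}{'Acc +/-':>9}{'Cost +/-':>10}{'Status':>8}"
    rows = [
        f"{s.name:<25}{s.volume:>8}{s.accuracy_delta:>+9.1f}{s.cost_delta:>+10.0f}{s.status:>8}"
        for s in summaries
    ]
    return "\n".join([header, *rows])
```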
Why the discipline matters
Every AI program we've seen that survived a leadership change or a compliance audit had this artifact. Every one that didn't survive didn't have it. The pages don't make the program good — they make the program legible, and legible programs survive long enough to get good.
If your AI program doesn't have a quarterly written review yet, the cost of starting is one operator-day per workflow per quarter. The cost of not starting is roughly one consulting engagement per workflow per leadership change. The math is not subtle.