The Brevity Rule: Cut AI Costs ~10% With One Config Change

My AI bill was climbing every month. Not because I was using more AI — because every response was longer than it needed to be. One six-line config change cut my spend by nearly 10% without losing a single piece of information. Here's exactly what I did.

The rule

I dropped this into my project's system instructions:

1. Default to minimum viable response. Bullets over paragraphs.
   Tables over lists when there are 3+ items.
2. Never drop information — compress, don't omit.
3. If the user asks "why" or "explain", expand only that part.
   Stay terse everywhere else.
4. No filler phrases: "Let me", "Here's what happened", "Great question".
5. Never restate the user's question.
6. For code references: path + line range only, no re-quoting.

Six lines. No other changes. I let it run for three weeks and measured the delta.

What the numbers looked like

Baseline sample from my Cursor usage export:

After applying the brevity rule:

Why the savings compound

This is the part people miss. Output tokens are expensive on their own — roughly $0.015 per 1K vs. $0.00075 per 1K for cached input reads. But the real bleed is that every verbose response becomes part of the conversation history, which gets re-read on every subsequent turn.

A 500-token response that could have been 250 tokens doesn't just cost you 250 extra output tokens once. It costs you 250 input tokens on every turn that follows, multiplied by however long the conversation runs. A ten-turn conversation pays for that verbosity ten times.
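The compounding is easy to put in numbers. A minimal sketch using the per-1K prices quoted above and the ten-turn scenario from the text; the function name is mine, and it assumes the whole history is served at the cached-input rate:

```python
OUTPUT_PER_1K = 0.015          # $/1K output tokens (pricing quoted above)
CACHED_INPUT_PER_1K = 0.00075  # $/1K cached input token reads

def verbosity_cost(extra_tokens: int, turns: int) -> float:
    """Cost of tokens a response didn't need: paid once as output,
    then re-read as cached input on every subsequent turn."""
    output = extra_tokens / 1000 * OUTPUT_PER_1K
    rereads = extra_tokens / 1000 * CACHED_INPUT_PER_1K * (turns - 1)
    return output + rereads

# 250 unnecessary tokens in turn 1 of a ten-turn conversation:
print(round(verbosity_cost(250, 10), 4))  # → 0.0054
```

At ten turns the re-reads add roughly 45% on top of the one-time output cost, and that fraction keeps growing linearly with conversation length.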

Same information, 71% fewer tokens

I ran the same question through the model before and after to prove the rule wasn't losing information:

Reduction: 71%. Zero information loss.
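The percentage is just arithmetic on per-response token counts from the usage export. A tiny helper makes it explicit; the 860/249 counts below are hypothetical stand-ins for illustration, not my measured numbers:

```python
def pct_reduction(before_tokens: int, after_tokens: int) -> float:
    """Percent of tokens saved, from before/after token counts."""
    return round(100 * (before_tokens - after_tokens) / before_tokens, 1)

# Hypothetical counts; pull real ones from your provider's usage export.
print(pct_reduction(860, 249))  # → 71.0
```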

The bigger cleanup

The brevity rule was part of a broader config hygiene pass. I also:

All-in, about 36,700 fewer tokens loaded per session before the model even looks at my code. Combined with the brevity rule, total savings landed around $500–600/month.
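Because the session preamble is part of the context that gets re-read on every turn, trimming it saves on each turn, not just once. A sketch at the cached-input rate quoted earlier; the 20-turn session length is an assumption, not a measured figure:

```python
CACHED_INPUT_PER_1K = 0.00075  # $/1K cached input token reads

def preamble_savings(tokens_trimmed: int, turns: int) -> float:
    """Input cost saved per session: the trimmed preamble would have
    been re-read as cached input on every turn."""
    return tokens_trimmed / 1000 * CACHED_INPUT_PER_1K * turns

# 36,700 fewer tokens across an assumed 20-turn session:
print(round(preamble_savings(36_700, 20), 2))  # → 0.55
```

Small per session, but it multiplies across every session you run, and it would be larger still for any portion of the context served at uncached input rates.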

The takeaway

Cutting AI costs doesn't require a new provider, a smaller model, or clever prompt engineering. Most of the bloat is verbosity you never asked for. Six lines of instructions, a couple of afternoons of config cleanup, and you get most of the way there.

Every token your assistant writes that it didn't need to, you pay for more than once: once as output, then again as input every time the conversation re-reads it.

Related reading