The Brevity Rule: Cut AI Costs ~10% With One Config Change
My AI bill was climbing every month. Not because I was using more AI — because every response was longer than it needed to be. One six-line config change cut my spend by nearly 10% without losing a single piece of information. Here's exactly what I did.
The rule
I dropped this into my project's system instructions:
1. Default to minimum viable response. Bullets over paragraphs; tables over lists when there are 3+ items.
2. Never drop information — compress, don't omit.
3. If the user asks "why" or "explain", expand only that part. Stay terse everywhere else.
4. No filler phrases: "Let me", "Here's what happened", "Great question".
5. Never restate the user's question.
6. For code references: path + line range only, no re-quoting.
Six lines. No other changes. I let it run for three weeks and measured the delta.
What the numbers looked like
Baseline sample from my Cursor usage export:
- Requests: 5,479
- Total tokens: 5.05 billion
- Avg output tokens/request: 3,949
- Avg cost/request: $0.48
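The averages above fall straight out of the export. A minimal sketch of the computation, assuming each export row carries total tokens, output tokens, and cost per request (the field layout and sample values here are invented for illustration, not from my actual export):

```python
# Hypothetical usage-export rows: (total_tokens, output_tokens, cost_usd) per request.
# Sample values are made up; only the shape of the computation matters.
requests = [
    (920_000, 3_800, 0.45),
    (870_000, 4_100, 0.52),
    (760_000, 3_950, 0.47),
]

n = len(requests)
avg_output_tokens = sum(r[1] for r in requests) / n  # avg output tokens/request
avg_cost = sum(r[2] for r in requests) / n           # avg cost/request

print(f"{n} requests, avg output {avg_output_tokens:.0f} tok, avg cost ${avg_cost:.2f}")
```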
After applying the brevity rule:
- Output tokens saved: 10.8 million
- Compounded input savings: ~54 million (shorter history re-read on every subsequent turn)
- Total tokens saved: ~65 million
- Cost saved (21 days): ~$203 → ~$290/month projected
- Percentage of total spend: 7.8%
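The monthly figure is just the 21-day measurement scaled up, and the token total is the sum of the two savings buckets. A quick sanity check using the numbers above:

```python
# Figures from the 21-day measurement window (from the numbers above).
output_tokens_saved = 10_800_000
input_tokens_saved = 54_000_000   # history re-reads avoided on later turns
cost_saved_21d = 203.0

total_tokens_saved = output_tokens_saved + input_tokens_saved  # ~65 million
monthly_projection = cost_saved_21d / 21 * 30                  # ~$290/month

print(f"{total_tokens_saved:,} tokens saved, ~${monthly_projection:.0f}/month projected")
```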
Why the savings compound
This is the part people miss. Output tokens are expensive on their own — roughly $0.015 per 1K vs. $0.00075 per 1K for cached input reads. But the real bleed is that every verbose response becomes part of the conversation history, which gets re-read on every subsequent turn.
A 500-token response that could have been 250 tokens doesn't just cost you 250 extra output tokens once. It costs you 250 input tokens on every turn that follows, multiplied by however long the conversation runs. A ten-turn conversation pays for that verbosity ten times.
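The multiplication is easy to sketch. The per-token prices come from this article; the turn count is an arbitrary example:

```python
OUTPUT_PRICE = 0.015 / 1000          # $/token for output (the article's figure)
CACHED_INPUT_PRICE = 0.00075 / 1000  # $/token for cached input re-reads

def verbosity_cost(extra_tokens: int, later_turns: int) -> float:
    """Cost of unneeded tokens: paid once as output,
    then re-read as input on every later turn."""
    paid_once = extra_tokens * OUTPUT_PRICE
    paid_again = extra_tokens * CACHED_INPUT_PRICE * later_turns
    return paid_once + paid_again

# 250 extra tokens in turn 1 of a ten-turn conversation (9 turns follow).
cost = verbosity_cost(250, 9)
```

Half a cent per conversation sounds like nothing, but multiplied across thousands of requests a month it becomes the bleed the numbers above show.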
Same information, 71% fewer tokens
I ran the same question through the model before and after the change to confirm the rule wasn't losing information:
- Before (450 tokens): Full paragraphs explaining what tokens are, how the context window works, what changed after cleanup, and separate sections on speed, cost, quality, and headroom impacts. Multiple headings.
- After (130 tokens): "Tokens = text chunks. Before: 25K loaded. After: 1,270. Speed / cost / quality / headroom — one line each." Same information.
Reduction: 71%. Zero information loss.
The bigger cleanup
The brevity rule was part of a broader config hygiene pass. I also:
- Consolidated 36 AI rules into 10 (92% token reduction in the rules themselves)
- Removed 13 redundant project docs from the context window
- Purged 12 duplicate skill/plugin loads (~9,500 tokens per session)
All-in, about 36,700 fewer tokens loaded per session before the model even looks at my code. Combined with the brevity rule, total savings landed around $500–600/month.
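To see why 36,700 tokens per session matters in dollars, here's a rough model. Only the 36,700 figure and the cached-input price come from this post; the session and turn counts are assumptions standing in for my actual usage:

```python
CACHED_INPUT_PRICE = 0.00075 / 1000  # $/token for cached input reads

tokens_saved_per_session = 36_700    # from the cleanup above
turns_per_session = 20               # assumption
sessions_per_day = 15                # assumption

# The saved tokens would otherwise be re-read on every turn of every session.
monthly_saving = (tokens_saved_per_session * turns_per_session
                  * sessions_per_day * CACHED_INPUT_PRICE * 30)

print(f"~${monthly_saving:.0f}/month from context cleanup alone")
```

Add the ~$290/month from the brevity rule and you land in that $500–600 range.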
The takeaway
Cutting AI costs doesn't require a new provider, a smaller model, or clever prompt engineering. Most of the bloat is verbosity you never asked for. Six lines of instructions, a couple of afternoons of config cleanup, and you get most of the way there.
Every token your assistant writes that it didn't need to, you pay for more than once: first as output, then as input on every turn that re-reads it.