The Brevity Rule: Cut AI Costs ~10% With One Config Change

My AI bill was climbing every month. Not because I was using more AI — because every response was longer than it needed to be. One six-line config change cut my spend by nearly 10% without losing a single piece of information. Here's exactly what I did.

The rule

I dropped this into my project's system instructions:

1. Default to minimum viable response. Bullets over paragraphs.
   Tables over lists when there are 3+ items.
2. Never drop information — compress, don't omit.
3. If the user asks "why" or "explain", expand only that part.
   Stay terse everywhere else.
4. No filler phrases: "Let me", "Here's what happened", "Great question".
5. Never restate the user's question.
6. For code references: path + line range only, no re-quoting.

Six lines. No other changes. I let it run for three weeks and measured the delta.

What the numbers looked like

Baseline sample from my Cursor usage export:

After applying the brevity rule:

Why the savings compound

This is the part people miss. Output tokens are expensive on their own — roughly $0.015 per 1K vs. $0.00075 per 1K for cached input reads. But the real bleed is that every verbose response becomes part of the conversation history, which gets re-read on every subsequent turn.

A 500-token response that could have been 250 tokens doesn't just cost you 250 extra output tokens once. It costs you 250 input tokens on every turn that follows, multiplied by however long the conversation runs. A ten-turn conversation pays for that verbosity ten times.
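The compounding is easy to put in numbers. A minimal sketch using the per-1K prices quoted above and the ten-turn scenario from the text; the function name is mine, and it assumes the whole history is served at the cached-input rate:

```python
OUTPUT_PER_1K = 0.015          # $/1K output tokens (pricing quoted above)
CACHED_INPUT_PER_1K = 0.00075  # $/1K cached input token reads

def verbosity_cost(extra_tokens: int, turns: int) -> float:
    """Cost of tokens a response didn't need: paid once as output,
    then re-read as cached input on every subsequent turn."""
    output = extra_tokens / 1000 * OUTPUT_PER_1K
    rereads = extra_tokens / 1000 * CACHED_INPUT_PER_1K * (turns - 1)
    return output + rereads

# 250 unnecessary tokens in turn 1 of a ten-turn conversation:
print(round(verbosity_cost(250, 10), 4))  # → 0.0054
```

At ten turns the re-reads add roughly 45% on top of the one-time output cost, and that fraction keeps growing linearly with conversation length.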

Same information, 71% fewer tokens

I ran the same question through the model before and after to prove the rule wasn't losing information:

Reduction: 71%. Zero information loss.
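The percentage is just arithmetic on per-response token counts from the usage export. A tiny helper makes it explicit; the 860/249 counts below are hypothetical stand-ins for illustration, not my measured numbers:

```python
def pct_reduction(before_tokens: int, after_tokens: int) -> float:
    """Percent of tokens saved, from before/after token counts."""
    return round(100 * (before_tokens - after_tokens) / before_tokens, 1)

# Hypothetical counts; pull real ones from your provider's usage export.
print(pct_reduction(860, 249))  # → 71.0
```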

The bigger cleanup

The brevity rule was part of a broader config hygiene pass. I also:

All-in, about 36,700 fewer tokens loaded per session before the model even looks at my code. Combined with the brevity rule, total savings landed around $500–600/month.
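Because the session preamble is part of the context that gets re-read on every turn, trimming it saves on each turn, not just once. A sketch at the cached-input rate quoted earlier; the 20-turn session length is an assumption, not a measured figure:

```python
CACHED_INPUT_PER_1K = 0.00075  # $/1K cached input token reads

def preamble_savings(tokens_trimmed: int, turns: int) -> float:
    """Input cost saved per session: the trimmed preamble would have
    been re-read as cached input on every turn."""
    return tokens_trimmed / 1000 * CACHED_INPUT_PER_1K * turns

# 36,700 fewer tokens across an assumed 20-turn session:
print(round(preamble_savings(36_700, 20), 2))  # → 0.55
```

Small per session, but it multiplies across every session you run, and it would be larger still for any portion of the context served at uncached input rates.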

The takeaway

Cutting AI costs doesn't require a new provider, a smaller model, or clever prompt engineering. Most of the bloat is verbosity you never asked for. Six lines of instructions, a couple of afternoons of config cleanup, and you get most of the way there.

Every token your assistant writes that it didn't need to, you pay for more than once: once as output, then again as input every time the conversation re-reads it.

Related reading