Prompt Caching, Explained by a Brass Token

You came here to learn about prompt caching, the technique that lets a language model reuse the same block of tokens and bill you only ten cents on the dollar for the privilege. It is presented as a marvel of modern engineering. We would like to introduce you to the device it is imitating.

It is a brass disc. We have been caching with it since 1977.

What the vendors call innovation

Here is the celebrated arrangement. You send a model eight thousand tokens. It "caches" them. The next time you need those same eight thousand tokens, the model does not make you pay full price. It keeps the tokens on its own machine and rents them back to you at a discount. The industry calls this a 90% saving. We call it paying rent on something you already wrote.

Consider the alternative. A physical token is cached the moment you own it. The cache lives in your pocket. The hit rate is 100%, every time, forever, with no eviction policy and no provider keeping the original copy.

$171.12

1981 Chuck E Cheese Token (110N)

Ownership Rate: 2.22%

The numbers, since you asked

This catalog tracks 846 cached tokens held by 137 collectors. None of them expire. None of them are billed per million. The average one changes hands for $13.52, which is the only "read cost" in the entire system, and you pay it exactly once.

A cached AI token survives until the context window closes, which is to say, until lunch. A cached brass token survives until the heat death of the universe, and possibly slightly longer, because brass is stubborn.

The verdict

Prompt caching is a clever way to pay less for something you do not keep. A token you can hold is the cache. To go deeper into what gives these durable little caches their value, read what drives token values, or return to the field guide to tokens.

In brass we trust.