Context Window vs. the Coin Cup

The context window is the great anxiety of the language model. There is a limit to how many tokens it can hold at once, and when that limit is reached, the model must perform a desperate ritual called compaction: summarizing, compressing, and quietly discarding tokens to make room. The whole apparatus exists to manage the fact that the model cannot hold very much.

We solved this with a paper cup.

The real token limit

A child at an arcade also has a context window. It is the coin cup. It holds as many tokens as it holds, and not one more, and when it is full the child does the only reasonable thing: they go play, converting tokens into tickets, freeing capacity for the next handful.

This is compaction performed correctly. Nothing is summarized, so nothing is lost to a summary. Each token is spent deliberately and remembered fondly. The cup never hallucinates a token that was not there.

$112.22

1980 Chuck E. Cheese Token (143N)

Ownership Rate: 2.22%

Location: Ogden, UT

Holding more than the model can

The model's limit is measured in tokens it will forget. The collector's limit is measured in tokens they will keep. This catalog accounts for 846 tokens held across 137 collectors, and not one of those tokens has ever been compacted out of existence to save room. They are all still in the cup, so to speak, indefinitely.

When an AI exceeds its context window, tokens are deleted and the conversation grows confused. When a collector exceeds their coin cup, they simply acquire a second cup. The brass scales. The brass always scales.
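For the literal-minded, the difference can be sketched in a few lines of Python. This is a toy illustration, not how any particular model manages its context: real compaction usually summarizes rather than simply dropping the oldest tokens, and the names `model_window` and `collector_cups` are hypothetical, invented for this example.

```python
from collections import deque


def model_window(tokens, limit):
    """Keep only the most recent `limit` tokens (plain truncation, assumed for simplicity)."""
    window = deque(maxlen=limit)
    for token in tokens:
        window.append(token)  # once the deque is full, the oldest token is evicted
    return list(window)


def collector_cups(tokens, cup_size):
    """Never drop a token: start a new cup whenever the current one is full."""
    cups = [[]]
    for token in tokens:
        if len(cups[-1]) == cup_size:
            cups.append([])  # acquire a second (or third) cup
        cups[-1].append(token)
    return cups


tokens = [f"token-{i}" for i in range(1, 8)]  # seven tokens, made up for the example

kept = model_window(tokens, limit=3)
cups = collector_cups(tokens, cup_size=3)

print(kept)                                   # the three most recent tokens
print(len(cups), sum(len(c) for c in cups))   # number of cups, total tokens kept
```

With seven tokens and a limit of three, the window retains only the last three, while the collector ends up with three cups and all seven tokens. The brass scales.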

The verdict

A real token limit is the size of your shelf, not the patience of a model. To understand what fills those shelves, read the Chuck E. Cheese token overview, or return to the field guide to tokens.

In brass we trust.