I have seen exactly one model that charges more for longer contexts:

https://ai.google.dev/gemini-api/docs/pricing

Gemini 1M context window

That said, the cost increase isn't very significant: roughly 2x at the long end of the context window.

This is in stark contrast to the quadratic growth claimed by the article.

They just do averaging. Imagine a quadratic pricing structure. Who'd want to deal with it?

I guess 1.0001^2 is quadratic too, but note that it really only charges you about 1.5x for the extra output tokens. Even if cost were quadratic with output length here, we are talking about a very small difference, nothing like the quadratic cost structure proposed by OP:

>Pop quiz: at what point in the context length of a coding agent are cached reads costing you half of the next API call? By 50,000 tokens, your conversation’s costs are probably being dominated by cache reads.

These are two different cost components, and the one you bring up is minor. OP is talking about a cost that, at 1M tokens, would make each token 20x more expensive; you are talking about a cost that, at 1M tokens, would be 1.5x. Different things.

The first is an imperfection of the API encapsulation; the latter may be a natural cost phenomenon related to the internals of the state-of-the-art algorithms.
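The "averaging" point can be sketched concretely. With a tiered per-token price (all numbers below are hypothetical, not the linked Gemini rates), the per-token multiplier is bounded by a constant no matter how long the prompt gets:

```python
# Hypothetical tiered pricing: the whole prompt is billed at a higher
# flat rate once it crosses a context-length threshold, so the
# per-token price never exceeds a constant multiple of the base rate.

def tiered_cost(prompt_tokens, threshold=200_000, base=1.0, long_mult=2.0):
    """Cost of one call under a two-tier flat-rate price schedule."""
    rate = base * long_mult if prompt_tokens > threshold else base
    return prompt_tokens * rate

# Per-token price is capped at 2x the base, regardless of length:
print(tiered_cost(1_000_000) / 1_000_000)  # 2.0
print(tiered_cost(100_000) / 100_000)      # 1.0
```

Contrast that bounded multiplier with a genuinely quadratic schedule, where the per-token price itself grows with length.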


What are you talking about? The cost is quadratic in total conversation length in tokens.
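A minimal sketch of why (using a flat, hypothetical per-token price): every API call re-sends, or cache-reads, the entire context accumulated so far, so the cumulative billed tokens over a conversation grow quadratically in its length even though each individual call is priced linearly.

```python
# Each turn appends tokens to the context; each API call is billed
# for the whole context so far. Total billed tokens over n turns is
# then roughly proportional to n^2 (a triangular-number sum).

def conversation_cost(turns, tokens_per_turn, price_per_token):
    """Cumulative cost across all calls in a multi-turn conversation."""
    total_billed = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # new turn added to the context
        total_billed += context      # this call pays for everything so far
    return total_billed * price_per_token

# Doubling the number of turns roughly quadruples the total cost:
cost_10 = conversation_cost(10, 1000, 1.0)
cost_20 = conversation_cost(20, 1000, 1.0)
print(cost_20 / cost_10)  # ~3.8, approaching 4x as turns grow
```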


