I have seen exactly one model that charges more for longer contexts:

https://ai.google.dev/gemini-api/docs/pricing

Gemini 1M context window

That said, the cost increase isn't very significant: roughly 2x at the long end of the context window.

This is in stark contrast to the quadratic growth claimed by the article.

They just do averaging. Imagine a quadratic pricing structure. Who'd want to deal with it?

I guess 1.0001^2 is quadratic too, but note that it really only charges you about 1.5x for the extra output tokens. Even if cost were quadratic with output length here, we are talking about a very small difference, nothing like the quadratic cost structure proposed by OP:

>Pop quiz: at what point in the context length of a coding agent are cached reads costing you half of the next API call? By 50,000 tokens, your conversation’s costs are probably being dominated by cache reads.

These are two different cost components, and the one you bring up is minor. OP is talking about a cost that, at 1M tokens, would make each token 20x more expensive; you are talking about a cost that, at 1M tokens, would be 1.5x. Different things.

The first is an imperfection of the API encapsulation; the latter may be a natural cost phenomenon related to the internals of the state-of-the-art algorithms.
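The "averaging" point can be sketched concretely. With a tiered per-token price (all numbers below are hypothetical, not the linked Gemini rates), the per-token multiplier is bounded by a constant no matter how long the prompt gets:

```python
# Hypothetical tiered pricing: the whole prompt is billed at a higher
# flat rate once it crosses a context-length threshold, so the
# per-token price never exceeds a constant multiple of the base rate.

def tiered_cost(prompt_tokens, threshold=200_000, base=1.0, long_mult=2.0):
    """Cost of one call under a two-tier flat-rate price schedule."""
    rate = base * long_mult if prompt_tokens > threshold else base
    return prompt_tokens * rate

# Per-token price is capped at 2x the base, regardless of length:
print(tiered_cost(1_000_000) / 1_000_000)  # 2.0
print(tiered_cost(100_000) / 100_000)      # 1.0
```

Contrast that bounded multiplier with a genuinely quadratic schedule, where the per-token price itself grows with length.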


What are you talking about? The cost is quadratic in total conversation length in tokens.
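A minimal sketch of why (using a flat, hypothetical per-token price): every API call re-sends, or cache-reads, the entire context accumulated so far, so the cumulative billed tokens over a conversation grow quadratically in its length even though each individual call is priced linearly.

```python
# Each turn appends tokens to the context; each API call is billed
# for the whole context so far. Total billed tokens over n turns is
# then roughly proportional to n^2 (a triangular-number sum).

def conversation_cost(turns, tokens_per_turn, price_per_token):
    """Cumulative cost across all calls in a multi-turn conversation."""
    total_billed = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # new turn added to the context
        total_billed += context      # this call pays for everything so far
    return total_billed * price_per_token

# Doubling the number of turns roughly quadruples the total cost:
cost_10 = conversation_cost(10, 1000, 1.0)
cost_20 = conversation_cost(20, 1000, 1.0)
print(cost_20 / cost_10)  # ~3.8, approaching 4x as turns grow
```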


