Discussion about this post

JP

The confidence-based escalation pattern maps neatly onto what's happening at the market level. OpenRouter does something similar across 300+ models, routing 30 trillion tokens monthly by matching the right model to each request. I reckon the companies that nail this routing early will have a real edge as agent workloads scale up. Agents don't care which model responds; they care about cost per outcome. I wrote about the macroeconomics of this recently: https://medium.datadriveninvestor.com/who-profits-when-ai-models-are-free-b71ae03f4167

Klement Gunndu

The nuance in "Most companies running AI in production are overspending on inference — and they don’t know it" is something most posts on this topic miss. Saving this for reference. The distinction you draw here is exactly what teams need to internalize before scaling.
