Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

17k TPS is slow compared to other probabilistic models. It was possible to hit ~10-20 million TPS decades ago with n-gram and PDFA models, without custom silicon. A more informative KPI would be Pass@k on a downstream reasoning task - for many such benchmarks, increasing token throughput by several orders of magnitude does not even move the needle on sample efficiency.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: