Hacker News

Well, their huge GPU clusters have "insane VRAM". Once you can actually load the model without offloading, inference isn't all that computationally expensive for the most part.



