Hacker News
gundmc
8 months ago
on: Ask HN: How can ChatGPT serve 700M users when I ca...
Well, their huge GPU clusters have "insane VRAM". Once you can actually load the model without offloading, inference isn't all that computationally expensive for the most part.
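To make the "load the model without offloading" point concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something from the thread: the function names, the 70B-parameter model, the 2 bytes per parameter (fp16), the GPU sizes, and the 20% allowance for KV cache and activations. It only checks whether the weights plus that allowance fit in the combined VRAM, which is the condition the comment is describing.

```python
def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits_without_offload(
    params_billion: float,
    bytes_per_param: float,
    gpus: int,
    vram_gib_per_gpu: float,
    overhead_fraction: float = 0.2,  # assumed headroom for KV cache, activations
) -> bool:
    """Return True if weights plus assumed overhead fit in total VRAM.

    This ignores interconnect bandwidth and how the model is sharded;
    it is a capacity check only.
    """
    needed = weights_gib(params_billion, bytes_per_param) * (1 + overhead_fraction)
    return needed <= gpus * vram_gib_per_gpu


# Hypothetical 70B-parameter model stored in fp16 (2 bytes per parameter).
single_consumer_gpu = fits_without_offload(70, 2, gpus=1, vram_gib_per_gpu=24)
eight_gpu_server = fits_without_offload(70, 2, gpus=8, vram_gib_per_gpu=80)
```

Under these assumed numbers the single 24 GiB card fails the check and has to offload, while the eight-GPU node holds the whole model in VRAM. That is roughly the gap between a home setup and a datacenter cluster that the comment points at.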