
I don't think this is true; llama.cpp hobbyists think about this a lot, and there have been multiple independent community experiments, including crowd-sourced blind tests. I doubt it holds across models and modalities, but in llama.cpp's context, Q5 is effectively indistinguishable from F32.

However, this seems to be model-size dependent; e.g., Llama 3.1 405B is reported to degrade much more quickly under quantization.
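To make "blind tests" concrete, here's a rough sketch of the kind of A/B harness such community experiments tend to use: show a rater paired completions for the same prompts from the Q5 and F32 models in random order, and check whether the preference rate stays near 50% (i.e. the rater can't tell them apart). The file names (f32_outputs.json, q5_outputs.json) and the harness itself are illustrative assumptions, not llama.cpp tooling.

    #!/usr/bin/env python3
    """Minimal blind A/B test sketch (illustrative only, not llama.cpp tooling).

    Assumes you already generated paired completions for the same prompts with
    two models (e.g. a Q5 quant and an F32 baseline) and saved each set as a
    JSON list of strings. File names below are hypothetical.
    """
    import json
    import random

    def run_blind_trial(outputs_a, outputs_b):
        """Show each pair in random order; return how often the rater picks A."""
        picked_a = 0
        for text_a, text_b in zip(outputs_a, outputs_b):
            pair = [("A", text_a), ("B", text_b)]
            random.shuffle(pair)  # hide which model produced which completion
            print("\n--- Option 1 ---\n" + pair[0][1])
            print("\n--- Option 2 ---\n" + pair[1][1])
            choice = input("Which reads better? [1/2] ").strip()
            label = pair[0][0] if choice == "1" else pair[1][0]
            if label == "A":
                picked_a += 1
        return picked_a

    if __name__ == "__main__":
        # Hypothetical files holding completions for the same prompts.
        with open("f32_outputs.json") as f:
            f32_outputs = json.load(f)
        with open("q5_outputs.json") as f:
            q5_outputs = json.load(f)

        wins = run_blind_trial(f32_outputs, q5_outputs)
        n = len(f32_outputs)
        # A preference rate near 50% means the rater cannot tell the models apart.
        print(f"F32 preferred in {wins}/{n} trials ({100.0 * wins / n:.1f}%)")

With enough raters and prompts, a preference rate statistically indistinguishable from chance is what people mean when they say Q5 is "unnoticeably different" from F32.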


