Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That was then, this is now. The new models are scarily good. If you're skeptical, just take an hour to replicate the strategy the article references. Point Claude at any open-source codebase you find interesting and instruct it to find exploitable vulnerabilities. Give it a well-defined endpoint if you want (e.g., "You must develop a Python script that triggers memory corruption via a crafted request") and see how well it does.


> That was then, this is now.

No, what we were seeing with curl was script kiddies. It wasn't about the quality of the models at all. They were not filtering their results for validity.


It was definitely partially about model quality. The frontier models are capable of producing valid findings with (reasonably) complex exploit chains on the first pass (or with limited nudging) and are much less prone to making up the kinds of nonsensical reports that were submitted to curl. Compared to now, the old models essentially didn't work for security.

If those script kiddies had been using today's models instead and _still_ didn't do any filtering, a lot more of those bugs would have been true positives.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: