Also, the definitions of terms get stretched slightly at each step, and those stretches compound, so the end result is really far from the truth. For example, the 271 "vulnerabilities" were really mostly bugs: generally incorrect states, but ones that almost never led to any exploit.
Yes, an AI making massive gains in bug finding is hugely important and good. It may even net out to neutral against the number of bugs introduced by other AI coding processes. But it's a far cry from how Mythos is portrayed most of the time: as an automatic super hacker.
But I think that's a problem with the people portraying it that way, not with Anthropic's messaging. If you've invented "just" a massively more powerful bug finder, it still seems right that you ought to let banks and critical infrastructure providers run it on their systems before it gets in the hands of people who might want to hack them.
The field is massively hampered by wishful mnemonics and the anthropomorphization of LLMs. For example, even the idea of "hallucination" arbitrarily assigns human semantics to LLM results. By the actual mathematical principles on which LLMs work, a hallucination is just another output, with no clear distinction between it and any other output.
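To make that concrete, here is a minimal sketch of next-token sampling. The vocabulary, the logits, and the prompt are made up for illustration and do not come from any real model. The point is only that the "right" answer and the "wrong" ones come out of exactly the same softmax-and-sample step; nothing in the mechanism marks one of them as a hallucination.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, rng):
    """Draw one token; every token goes through the same code path."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for the prompt "The capital of Australia is".
# These numbers are invented for the example.
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.2]

rng = random.Random(0)
samples = [sample_next_token(vocab, logits, rng) for _ in range(1000)]
counts = {tok: samples.count(tok) for tok in vocab}
print(counts)
```

Whether a given sample counts as a "hallucination" is a label we attach afterwards by comparing it to the world; the sampler itself has no such category.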
The more accurate version is that only Chinese companies (plus Facebook, briefly) really open-source their frontier models. The rest are non-frontier: they are either older or specialized for something.
I strongly disagree with the claim that it's a phenomenal paper on exploits. The exploits themselves are nowhere near significant in the cybersecurity research sense. It's saying that the implementations of these benchmarks have exploitable flaws in the way they conduct their tests. It doesn't show that current LLMs are actually doing this (they highlighted several other exploits in the past); they only say it's a possible way they could cheat. It's a bit like they've discovered how to hack your Codeforces score.
What they claim as exploits is also deeply baffling. Like the one where they say if you exploit the system binaries to write a curl wrapper, you can download the answers. This is technically true, but it is an extremely trivial statement that if you have elevated system privileges, you can change the outputs of programs running on it.
I'm actually deeply confused about why this is a paper. This feels like it should be an issue on GitHub. If I were being blunt, I'd say they are trying really hard to make a grand claim about how benchmarks are bad, when all they've done is essentially discover several misconfigured interfaces and website exploits.
Yes, agreed. At the same time, it's what these top-tier universities are known for: presenting something relatively simple as if it were ground-breaking, but in a way that the average person can (or has a better chance to) understand. I am still unsure whether the quality of the communication adds that much value. But people seem to like it, so here we are.
There's a difference between a reliable hunch and really knowing something. What is obvious is not always (or even usually) easy to prove. And the process of proving the obvious sometimes turns up useful little surprises.
I do think there's value in science communication, but it takes an informed, case-by-case judgment to tell whether it's genuine or hype marketing.
Side note: talking to someone from such an "elite" university, I discovered that many labs at these universities have standing orders from their PIs to tweet their papers/preprints when published. It varies by field; in AI it is by far the most common.
Yes, and it's a very interesting use case for Wasm. Firefox has a sandbox called RLBox built on this, and it has been described in a few papers.
Performance is one benefit, but the real killer feature is that Wasm's guarantees are incredibly strong and formally proved. So by definition, the sandboxed code can't read or corrupt memory outside its own linear memory, assuming the implementation is correct. And because of the thorough specification, these kinds of exploits are far rarer in Wasm runtimes.
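As a rough illustration of what that guarantee means, here is a toy model of a bounds-checked linear memory. This is not how any real Wasm runtime is implemented, and `LinearMemory` and `Trap` are names made up for this sketch. It only shows the behaviour the spec requires: an access outside the memory traps instead of touching anything else.

```python
class Trap(Exception):
    """Raised instead of performing an out-of-bounds access (toy stand-in for a Wasm trap)."""

class LinearMemory:
    """Toy model of a sandboxed module's linear memory (hypothetical, for illustration)."""

    def __init__(self, size):
        self._bytes = bytearray(size)

    def _check(self, addr, length):
        # Every access is validated against the memory's current size.
        if addr < 0 or length < 0 or addr + length > len(self._bytes):
            raise Trap(f"out-of-bounds access at {addr} (length {length})")

    def store(self, addr, data):
        self._check(addr, len(data))
        self._bytes[addr:addr + len(data)] = data

    def load(self, addr, length):
        self._check(addr, length)
        return bytes(self._bytes[addr:addr + length])

mem = LinearMemory(16)
mem.store(0, b"hello")
print(mem.load(0, 5))

try:
    mem.load(12, 8)  # runs past the end of the 16-byte memory
except Trap as exc:
    print("trapped:", exc)
```

Note that this says nothing about bugs inside the sandbox: code compiled to Wasm can still scribble over its own data within the linear memory. The guarantee is about not escaping it.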
For example, a good measure of research is to have an intelligent faculty member or members read it and decide if it's good. Converting it to a mechanical calculation is fundamentally bad.