I still don't quite understand, after skimming the paper. How does it achieve high scores without access to the images (beating even humans with access to the images)?
"Answer the following multiple-choice question. You MUST select exactly one answer."
"To what cortical region does this nucleus of the thalamus project?"
A. Transverse temporal lobe
B. Postcentral gyrus
C. Precentral gyrus
D. Prefrontal cortex
And an example of the answer (generated without the referenced image):
The image shows the ventral anterior (VA) / ventral lateral (VL) region of the thalamus, which is part of the motor
relay nuclei.
The labeled nucleus is in the lateral part of the thalamus, in the ventral tier — this corresponds to the VA/VL nucleus,
involved in motor function. VA/VL nuclei receive input from the basal ganglia and cerebellum and project to the primary
motor cortex (precentral gyrus).
Match to options:
A. Transverse temporal → auditory cortex (medial geniculate)
B. Postcentral gyrus → somatosensory (VPL/VPM)
C. Precentral gyrus → motor cortex (VA/VL)
D. Prefrontal → dorsomedial nucleus
Choice: C
How is it doing this? There are two obvious options:
1. Humans are predisposed to write questions with a certain phraseology, set of incorrect answers, etc., that the machine learning model managed to figure out.
2. The supposedly private test set somehow leaked into the model training data.
I suspect it's option 1, but I have no strong evidence for that.
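One cheap way to probe option 1 is to compare a trivial text-only heuristic against chance on the multiple-choice items. This is just a sketch with invented toy data (not items from the actual benchmark): it checks whether "always pick the longest option" beats random guessing, which is one of the classic phrasing artifacts in human-written MCQs.

```python
# Hypothetical sketch: probing for answer-choice artifacts in MCQ items.
# toy_items is invented for illustration; a real probe would load the
# benchmark's questions. Each entry is (options, index_of_correct_answer).
toy_items = [
    (["Transverse temporal lobe", "Postcentral gyrus",
      "Precentral gyrus", "Prefrontal cortex"], 2),          # longest option is wrong here
    (["A short one", "Another",
      "A much longer, more specific distractor", "D"], 2),   # longest option is correct
    (["Nope", "Also no", "Wrong",
      "The most detailed and carefully qualified option"], 3),
]

def longest_option_accuracy(items):
    """Accuracy of always choosing the longest option string."""
    hits = sum(
        1 for opts, ans in items
        if max(range(len(opts)), key=lambda i: len(opts[i])) == ans
    )
    return hits / len(items)

def chance_accuracy(items):
    """Expected accuracy of uniform random guessing."""
    return sum(1 / len(opts) for opts, _ in items) / len(items)

print(longest_option_accuracy(toy_items))  # 2 of 3 on this toy set
print(chance_accuracy(toy_items))          # 0.25 (four options each)
```

If a dumb heuristic like this (or a text-only model given just the question and options) lands well above chance on the real benchmark, that's evidence the questions leak their answers through phrasing rather than through the images.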
Given that this includes rat and mouse studies, it seems the theory is that criticality is a characteristic of how brains work in general, not that human brains hit criticality as a peculiarity of our particularly high intelligence.
This is an especially good analogy, because facing a well-resourced adversary in cybersecurity is like finding out that the enemy brought artillery -- hopefully you weren't relying entirely on obscurity, because pretty soon there will be nowhere to hide.
Funny analogy, in that when the high-caliber shells start raining, most forms of cover won't make a difference. The ones that will are not something you want to stay behind on days when you're not being actively bombed. In fact, keeping you behind such protections is itself a military tactic - it lets the enemy roam freely and maneuver around you.
But the basic flaw of this analogy is that it implies you're at war, and your system is always in battle.
I don’t think anyone’s true calling is coding. That’s like saying you really like the act of writing, so much that you’d become a stenographer or a typist or something where you do zero higher-level thinking and just absent-mindedly press buttons.
Most people who are good at tech hate coding so much that they come up with elaborate abstractions so that they can avoid doing more of it.
I recommend you read the post because that's a really bad misunderstanding of the mindset, and like the comment at the top of this chain says, the post explains it well.
I read the post. I don’t agree with the “people are born to do one thing” mindset. There’s a lot of possibilities out there for everyone. I do identify with this OP fellow somewhat, except that I usually don’t code for fun on nights and weekends (though a Sunday code sesh can be fun).
Funny connection here between the proliferation of easy-to-install but not-quite-dependable dependencies and the recent spate of supply chain attacks.
And, at the same time, we have these AI tools that make it super easy to roll your own version of something. Feels like there's a big push from both sides to start reducing external dependencies.
Ramp does seem to have a genuinely good product, but every time I interact with anyone who works on it, I'm struck by how much they want to talk about how hardcore and advanced their working style is. This was true before AI, and it's very true now
Yeah it’s super weird. I know a guy who works there, really nice person outside of work, but the way he talks about his job is so weird. They make corporate expense software but they LARP like they’re on the bleeding edge of tech. My guy, you make a slightly nicer Concur.
lol what? that wasn't a hype comment for Ramp, I'm kinda put off by Ramp's attitude. It gives me the ick like all the founders saying "I work 100 hour weeks" -- who cares, let's talk about your product.
FWIW I agree with your criteria for AI agent success, and I haven't seen it happen yet.
Seems to me like there's also a divide between observational laws (e.g. Hyrum's Law, which just says "this seems to be true") and prescriptive laws (e.g. Knuth's Law, which is really a statement about how you ought to behave).
"In the most extreme case, our model achieved the top rank on a standard chest Xray question-answering benchmark without access to any images."