It took almost two centuries after the Industrial Revolution for broad middle-class living standards to become common for large populations, and that happened after an intense fight for rights and a fair share of the economic gains.
So, sure, AGI might raise equality - but that's only if we fight for it.
Hahaha this hits home too hard. Back in the early 2000s people would moan all the time whenever they spotted a hint of autotune; in 2026 it's the industry standard.
I think it really speaks to people's incredible ability to stay stuck in the past, rather than to new technology being "bad".
This is an amazing comment. I'm old. I was born in the 70s, grew up in the 80s and 90s, and miss those times so much. But that is because I was young and immortal, and the world was mine to discover.
In 20 years people will be missing the 2020s too. It is just human nature to complain.
That's just not accurate. I haven't studied SWE Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices I have to intervene to fix.
TL;DR it's very effective, as it directly tests models on REAL codebases: "The benchmark is constructed from GPL-style copyleft repositories and private proprietary codebases". The use case is very real.
It doesn't sound to me like this benchmark is attempting to measure architecture design. As far as I can see in the paper, they do not evaluate the architectural quality of a task completion, only whether the model is capable of completing it at all.
Could you elaborate? I hear some people say a big model should be driving a smaller model, and I hear other people say a small model should be driving a bigger model.
When I have an expensive task that is clearly defined, I will get Opus to write an LLM workflow for it, and then I will execute it with a smaller model. (Starting with the smallest one, and then upgrading if the task fails.)
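For what it's worth, the "start small, upgrade on failure" part could be sketched roughly like this. Everything here is made up for illustration: the tier names, `run_workflow`, and the success check are stand-ins I'm assuming, not any real API, and a real version would call an actual model where the fake one is.

```python
from typing import Callable, Optional, Sequence, Tuple


def run_with_escalation(
    task: str,
    models: Sequence[str],
    run_workflow: Callable[[str, str], str],
    is_success: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Try each model from cheapest to most expensive; stop at the first success."""
    for model in models:
        result = run_workflow(model, task)
        if is_success(result):
            return model, result
    return None  # every tier failed


# Stand-in for a real LLM call (assumption): only the larger tiers "succeed" here.
def fake_run_workflow(model: str, task: str) -> str:
    return "FAILED" if model == "small" else f"{model} finished: {task}"


LADDER = ["small", "medium", "large"]  # hypothetical tier names, cheapest first

outcome = run_with_escalation(
    "summarize logs", LADDER, fake_run_workflow, lambda r: r != "FAILED"
)
print(outcome)
```

The hard part in practice is the success check: this only works if the task is defined well enough that failure is cheap to detect.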
But this is a single, well-defined task, designed by me and Opus in concert. If I need ongoing agentic work, Opus would be too expensive. I'm not sure if Haiku is big enough to be the driver yet. And Sonnet is probably too big! Haha.
(Grok looks promising, optics aside... Grok 4 Fast was almost there but not quite. Great for interactive / realtime (steered) work though.)
But I'm thinking you need a smallish model which can delegate both up and down. I'm not exactly sure what that looks like though. Cause the model needs to be big enough to know that it's struggling... Instead of pattern matching to something stupid and getting stuck in a loop trying to solve it the wrong way.
All of the major models' memory is handled by smaller, more specific models.
I do not know about the future, but I believe that, like the human brain (the amygdala + cerebral cortex), AGI will have smaller but more specific submodels running in parallel to craft a compelling heuristic.
But I think that, with every revolution, hierarchies have historically fallen only for the former serfs to rise.
The Industrial Revolution, the Renaissance: all were marked by a massive shift in socioeconomic status and the rise of the middle class.
I think AGI, when it happens, will only raise equality. I may be wrong.