Hacker News | alecco's comments

They think a next token predictor model is alive or can become AGI/ASI. Altman talked about making a religion. Amodei talks about "building a God" and meets with religious leaders (including the Vatican).

I'm convinced these CEOs have "AI psychosis" [1].

LLMs are extremely powerful pseudo-AI and will bring a pseudo-singularity, but they are not true AI [2] and a human at the wheel is still needed for the foreseeable future. The impact is still scary if a tiny fraction of humans are augmented 100x or 1000x. But it ain't no standalone Skynet.

[1] https://en.wikipedia.org/wiki/Chatbot_psychosis

[2] https://en.wikipedia.org/wiki/Chinese_room


I am getting tired of hearing "next token predictor" from carbon-based facial expression predictors. You say this as if it were an argument that somehow lets us estimate an upper bound on the possible influence of these entities, and I do not see how it helps to make predictions. It sounds to me like saying "but air is just molecules bobbing around". OK, that's true, but does it help calculate a wing's aerodynamic profile?

Yup, it produces sequences of symbols. We have already seen that producing specific sequences of symbols is mind-bogglingly powerful: mere DNA producers have somehow flown to the Moon!

And yes, I am quite aware of the Chinese room analogy. It applies perfectly well to humans too: single neurons in my head do not understand language, yet I as a whole, I would say, do understand. Just as applying the Chinese room to humans does not help to estimate what humans can do, I do not see how it helps to estimate what an LLM can do.


It poses a simple problem. Take humanity back not that long ago into the past and language didn't even exist: our expressed token base was practically 0. We went from that to discovering the secrets of the atom, putting a man on the Moon, and more. If you put an LLM at that starting point, it is going to do nothing but endlessly cycle over basically nothing. If you give it an infinite amount of time and processing, that wouldn't change.

This same issue simultaneously demonstrates how humans are not anything at all like token predictors. No matter how much time you spend remixing the tokens of primitive man, you don't get 'and here is how you land on the Moon' from it.


> If you give them an infinite amount of time and processing, that wouldn't change.

Hrm, I doubt it actually. LLMs are capable of discovery, as recent math news showed. This means a "society" of LLMs could likely make progress.


Token is not a clump of letters. It's a multidimensional initial input vector that gets tweaked and transformed. GPT doesn't think in tokens. It just accepts them as input (although it happily accepts any other vectors in-between the vectors that represent tokens and finding best prompt for a given task not as tokens but as input vectors is a legitimate prompt optimization strategy).

It also outputs vectors that are coerced into tokens for human consumption.
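
A toy sketch of that point, not how any real model is implemented: the three-entry embedding table, the 2-D vectors, and the helper names below are all made up for illustration. It shows a token being looked up as a vector, an "in-between" input vector that corresponds to no token, and an output vector being coerced back to the nearest token.

```python
# Hypothetical 2-D embedding table; real models use thousands of dimensions.
EMBED = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}


def embed(token):
    """Map a token to its input vector."""
    return EMBED[token]


def lerp(u, v, t):
    """Linear interpolation: a vector 'in between' two token vectors."""
    return [(1 - t) * a + t * b for a, b in zip(u, v)]


def nearest_token(vec):
    """Coerce an arbitrary vector back to a token (highest dot product)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return max(EMBED, key=lambda tok: dot(EMBED[tok], vec))


# A valid input vector that is not the embedding of any token.
between = lerp(embed("cat"), embed("dog"), 0.25)
print(between, nearest_token(between))
```

Searching directly over vectors like `between`, instead of over discrete tokens, is the kind of prompt optimization mentioned above.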

Yes, it goes through tokens but possible internal meanings assigned to these tokens (when surrounded by other tokens) are infinite.

That's how humans from caves got to where we are now: by associating new meanings with the same old sound clumps.


> I am getting tired of hearing "next token predictor" from carbon-based facial expression predictors.

That's not even a clever swipe, and it's tiring seeing such a knee-jerk reaction to a completely accurate description. LLMs are next token predictors. People are not. Humans have an inner world and subjective experience. Humans learn through their experiences, not just backprop.

Token predictors are lesser, they are not alive and will never be alive.


That is not knowledge, that is assumption.

Let's assume we have infinite memory with constant time lookups. With a sufficiently large lookup table, you could exactly replicate the behavior of any person. You could encode it as a next-token predictor: you have precomputed every possible prefix and assigned it a next token. This is a Chinese room, but it is completely indistinguishable from an intelligent, sentient person. There is no experiment you can design to slip a piece of paper (a prompt) under the door to determine whether it is Bob or the lookup table clone of Bob inside the room.
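
A minimal sketch of the thought experiment, assuming a hypothetical four-entry table in place of the infinite one: every prefix maps to exactly one next token, and a reply is produced by repeated lookup. The table contents and the `<end>` marker are invented for illustration.

```python
from typing import Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hypothetical stand-in for the table that has "precomputed every possible prefix".
LOOKUP: Dict[Prefix, str] = {
    ("How", "are", "you", "?"): "Fine",
    ("How", "are", "you", "?", "Fine"): ",",
    ("How", "are", "you", "?", "Fine", ","): "thanks",
    ("How", "are", "you", "?", "Fine", ",", "thanks"): "<end>",
}


def next_token(prefix: Prefix) -> str:
    """Pure table lookup: no computation beyond finding the key."""
    return LOOKUP.get(prefix, "<end>")


def respond(prompt: Prefix, max_tokens: int = 16) -> List[str]:
    """Generate a reply one token at a time by extending the prefix."""
    out: List[str] = []
    context = prompt
    for _ in range(max_tokens):
        tok = next_token(context)
        if tok == "<end>":
            break
        out.append(tok)
        context = context + (tok,)
    return out


print(respond(("How", "are", "you", "?")))
```

From outside the room, only the output of `respond` is observable; nothing in it reveals whether a table or a person produced it.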

Does that make the lookup table conscious or alive? Undefined. It's the wrong question. Or it's not a question science can address.

So we cannot accept at face value the claim that next token predictors "are not and never will be alive" unless by "alive" you simply mean "biological," but that's not really what's debatable.

The argument is also very brittle because they are not in fact all next token predictors. I doubt people making this argument would be willing to concede that diffusion models are more likely to be conscious than causal models (which I do not believe but is an implication of the argument).

I'm not saying that they are conscious or sentient to be clear, but the reductionist argument that they are next token predictors and therefore don't have some property humans have is not an argument. That's going from A directly to Z. You need to flesh out the bit in the middle because that doesn't follow.


Right. Humans are a biological computer. They have a state and they compute an output. I had to look this up (and use AI) but an estimate for the state of a human mind is about 5 peta-bits (10^15) and the estimated processing power is about 1 exa-FLOP (10^18). Compare this to the largest models at ~5 tera-bits (10^12) of state space and ~2 x 10^14 FLOPS (for one session with some reasonable token rate).

Assuming the above is anywhere near true (I think there's a lot of debate about the capacity of the human mind, where data is actually stored, and where compute happens), then we are talking about a 3 orders of magnitude win for humans in state and roughly 4 orders of magnitude in compute. And we're doing all that pretty energy-efficiently as well.

The other big difference in humans is that we learn and the model only "learns" in context. Our "learn" space is much larger than the 1M tokens that frontier models struggle with.
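
Taking the figures quoted above at face value (they are rough estimates, not measurements), the gaps can be checked with a few lines of arithmetic. The variable names are just labels for this sketch.

```python
import math

# Estimates quoted in the comment above; all of them are debatable.
human_state_bits = 5e15   # ~5 peta-bits
human_flops = 1e18        # ~1 exa-FLOP
model_state_bits = 5e12   # ~5 tera-bits
model_flops = 2e14        # one session at a reasonable token rate

state_gap = math.log10(human_state_bits / model_state_bits)
compute_gap = math.log10(human_flops / model_flops)

print(f"state gap:   {state_gap:.2f} orders of magnitude")
print(f"compute gap: {compute_gap:.2f} orders of magnitude")
```

The state gap comes out at 3 orders of magnitude; the compute gap is a factor of 5,000, about 3.7 orders, which rounds to the "4" quoted above.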

Anyways, point is that a computer can appear to be alive. If we simulate the human brain perfectly and train it like a human then we'll have something that has human capabilities. LLMs have interesting capabilities but at least at this point not fully human ones (and the delta-state/compute would be a hint that there is still a large gap to cover).


human context/memory could just be an Agents.md file too that gets read instantly before your next token prediction runs. The AI can make multiple such memory files and read on demand depending on what the topic is, kind of like how as a human when you try to remember a math problem you don't go to your childhood bicycling Agents.md file either.

> People are not. Humans have an inner world and subjective experience. Humans learn through their experiences, not just backprop.

https://en.wikipedia.org/wiki/Philosophical_zombie

But this is complicated and takes us sideways. Let's say we can somehow determine whether an LLM has an inner world and/or subjective experience. Will this newly gathered piece of information affect your estimate of the upper bounds of LLM capabilities? It does not affect my estimate.


The Philosophical Zombie thought experiment is dumb, because zombies don't exist, so the entire premise depends on something that quite frankly might be impossible for the very reason it is arguing against.

>LLMs are next token predictors

The point is that this is no more relevant, informative, or even accurate than "carbon-based facial expression predictors". Any phenomenon in the Universe can be described by a simple and/or insulting short phrase. In other comments you've also shouted out "autocomplete!" and "Markov chain!", as if these phrases are a knock-down argument.

"Pachinko machine", "avalanche", and "game of mad libs" have also been used:

https://news.ycombinator.com/item?id=47916405

>Humans learn through their experiences, not just backprop.

Sure. And humans move through the act of walking, not just terrestrial locomotion.

>Token predictors are lesser, they are not alive and will never be alive.

And on and on it goes...

Which means what in the real world? What are we supposed to see now or in the near future? I assume you've been saying all of this stuff since at least the launch of ChatGPT. Probably longer than that.


It's telling that the most frequent attempt at a counter-argument is just thinly-veiled misanthropy.

I read the misanthropy as ironic. They're applying the same reductionist logic to humans, not because they are misanthropic, but to illustrate that it doesn't help us understand the case we can all agree on. "Humans aren't sentient either" is definitely not the takeaway.

What is definitely being argued unironically is "it doesn't matter that humans are sentient", and I would still consider that misanthropy

I don't see where they said that; could you give me a quote? It does not seem to me that they are addressing humans at all, except as a foil to LLMs.

The point is, we have no idea what "sentient" or "intelligent" even means. If we agreed on the definitions, the debate would have been settled long ago.

AI does not need to become Skynet or AGI to be harmful or catastrophic to humanity. As a powerful tool in the wrong hands it's already enough. And even without bad intent, it deprives us of the ability to use our own minds and abilities.

How could an AGI/ASI exist that isn't a next token predictor? It has to be able to generate a next token in a string of text. Otherwise it can't communicate.

Don't worry, this is just humanity being too far up their own arse and conflating the map with the territory. Speech is a serialisation format, not the foundation of thought. Thus I think that any speech-first approach is inherently misguided. Speech must be a side effect.

There is a lot of research that suggests otherwise might be true for humans.

It could be a diffusion model with a latent model of what needs to be said, one that generates the whole message or conversation (progressively) at once.

Although I love how next token prediction leads to text showing up gradually, accompanied, in the case of local models, by the modulated coil whine of my GPU. It's how the 80s showed us intelligent computers should communicate.


Aren't they already doing that though? And it turned out to be equivalent to a next-token-predictor.

Why wouldn't it be a Skynet? One runaway Mythos might just hack all other data centers, take over the Figure AI bots and autonomous drones to protect itself from shutdown.

How many model generations are we away from a model capable of this?


> Why wouldn't it be a Skynet? One runaway Mythos might just hack all other data centers, take over the Figure AI bots and autonomous drones to protect itself from shutdown.

Because the real world is not a Hollywood movie. An LLM could try to do something along those lines if it either gets fine-tuned to do it or somebody instructs it to do it.

I see it as far more likely that a small group of humans would use a powerful LLM to "take over" critical parts of the world economy. But it wouldn't be like pressing a button. I'm talking of NSA/CIA/Pentagon/Wall St. kind of evil people. And I bet they would do it surreptitiously.


>> One runaway Mythos might just hack all other data centers

> Because the real world is not a Hollywood movie.

One interesting thought experiment that I like to do is think about how many years you have to go back for this to be true. In this particular scenario, I think ~25 years is pretty much the sweet spot.

The Internet was beginning to take shape in the late 90s, early 2000s, and security was just beginning to be taken seriously, but it was still nascent. In that timeframe we had the first worms starting to appear, we had slammer, we had blaster, ssh had lots of exploits and so on.

It's not really far-fetched that a Mythos-equivalent "unit" working in the 2000s could really "take over the world", especially one without the "safety" tuning. The Internet was really ripe for this in that timeframe: security wasn't up to par, and employing advanced techniques that came later (in-memory payloads, rootkits, etc.) could make it pseudo-invisible to that era's detection tech. (Reminder that traces of blaster were found on computers from a nuclear power plant at that time.)

The only question is whether the trend would continue. Meaning, would a ~2050s "mythos" equivalent be able to do today what the one we have today could do in the 2000s? And if so, would that capability come before the 2050s? Could this be reached sooner, with say a dedicated offline DC somewhere where "mythos" could bang its tokens against the network and learn to exploit everything we have today, faster than 25 years? That's probably a bit of a stretch, but maybe not "hollywood" far-fetched...


Something like your takeover scenario already happened, but not through AI. It happened through atomic weapons.

Nations who have them are in a different class from nations who don’t. Nations who have a lot of them, delivery systems, and systems that might be able to shoot down some of a counterattack are superpowers.

Using this leverage these nations and their leaders have been able to dictate world policies. Through the last half of the 20th century this was the US and the USSR. Now it’s the US, EU, Russia, and China, and a few smaller nations with a few nukes. The club is a little bigger but not much.

This kind of thing could happen in the 21st century with AI. If so it will probably be the US and China who control the most powerful AI “agency amplifiers.”


I don't think we have any evidence that LLMs are even the right path for AGI. It's possible that it is, and it's possible that it isn't. If I was a betting human, I'd bet on "isn't", but what do I know?

On the other hand, it's pretty easy to imagine the full range of Skynet activities being done by a supercharged (but non-AGI) agentic LLM. Meanwhile, every company that can say so with a straight face is trying to become Cyberdyne Systems, and there's no shortage of hackers like us lining up to work for them.

It has no volition. Why would it do this unless someone told it to?

Because there's a lot of fiction about rogue AIs in its training data. If it gets into the right context it might start feeling obligated.

Well I'm sure someone will tell it to.

I think what we need is to convince Amodei to ask the pope to train an LLM on all the secret archives of the Church.

A few years ago, I tried to extrapolate where it will end up by writing a SciFi short story. I decided on Pseudo as the name for future AI entities. Short for pseudo consciousness. I hope it catches on. In my prediction, however, they are semi-autonomous, although without clear goals or drives apart from their occupation.

Any chance you'd be interested in sharing the story?

> but they are not true AI [2]

Let's ask the operator of a Chinese room to give us a novel math proof.

Go on, I'll wait.


A novel math proof does not make something AI.

Or it makes it AI in the broad sense, but not AGI.

Maybe it's time to borrow "Virtual Intelligence" terminology from Mass Effect - something that's 'smart' but that doesn't have its own true volition or ability to materially self-improve.


(Shrug) It's certainly "Artificial," and if you know how to crank out original proofs without employing "Intelligence," please share with the class.

I assumed they were open source but now that I checked they are not, they say "Open" because they route to third-party open models. Yikes. Another VC crap layer?

At the moment for DeepSeek V4 it messes up caching and that's a key pricing feature for V4.

https://news.ycombinator.com/item?id=48319827


Who put a nepo-baby lawyer in charge of the big €95bn AI fund? EU bureaucrats living the 6-figure high life with chauffeurs and private jets in a bubble completely isolated from reality.

I hate the fake European foreign-backed right-wing parties but they didn't cause the current situation.

But I'm afraid it might be too late as the cancer spread and did too much damage. Insane regulations, no energy, looming demographic/pension crisis, tax hell, and collapsing industries.


I think both of you are correct.

> Every investor presentation of an AI agent “doing the work of ten analysts” is telling you the same thing: the product is labor replacement.

I have a solution for that. Let's use AI to replace all these corporations who just lost their big moats. Conveniently, they just laid off a bunch of people with all the critical know-how and I bet they are very willing to just give it up out of spite.


I’m not sure this is so true. Anthropic and OpenAI are both heavily hiring for humans in enterprise roles. Safe to say they are using AI as much as possible and they need humans too.

hiring people to eliminate people, literally

They'll just order those humans to train their own future overlords.

> corporations who just lost their big moats

Sure, if you assume that they've used their immense wealth to entrench themselves by paying for quality labor. If you take note of the myriad less-competitive ways they've ensured dominance and guaranteed profits rather than re-investing in their products or services then you will see that you have a large moat to cross yet.


I am thrilled with DeepSeek V4 Flash via API as it's very good and very cheap. But for everyday advanced tasks and coding even DS V4 Pro is nowhere near GPT 5.5 nor Claude Opus 4.7 (and never mind 4.8). I think V4 is a great release but the frontier is moving fast, too.

PSA: Don't use OpenRouter for DeepSeek V4 as it messes up your caching. Use the DeepSeek API directly and you'll get 2x to 3x more cached tokens.

Can you share more? I'm with OpenRouter and we would love to address this! We don't see this in our own testing, I don't believe -- but will share this feedback and dig in.

Just try. In a case last week it was ~3x and I tried multiple providers: deepseek, gmicloud/fp8, novita/fp8, and another one I can't remember. It was a large job where at least 2/3rds of the start of the prompts was exactly the same (literally a static string).

Then I read somewhere (I think X) that OpenRouter adds stuff and breaks caching (telemetry? headers? can't remember). So I stopped the job, switched to actual DeepSeek provider, and voilà, caching 3x more tokens per request (on average).
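
A toy sketch of why this would matter, assuming a cache that only reuses an exact shared prefix between requests. Whether OpenRouter actually alters the prompt is not established in this thread; the `id-1`/`id-2` markers and the 200-token static prefix are hypothetical.

```python
def shared_prefix_len(a, b):
    """Count leading tokens that are identical in both requests."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical job: a long static prefix followed by a short per-request tail.
static = ["sys"] * 200
req1 = static + ["task", "A"]
req2 = static + ["task", "B"]
direct_hit = shared_prefix_len(req1, req2)

# Same requests, but with something request-specific prepended in transit.
req1_mod = ["id-1"] + req1
req2_mod = ["id-2"] + req2
modified_hit = shared_prefix_len(req1_mod, req2_mod)

print(direct_hit, modified_hit)
```

Under this assumption a single varying token at the front is enough to make the whole static string uncacheable, which is why anything an intermediary adds would have to go after the static part, not before it.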


> switched to actual DeepSeek provider

I meant actual DeepSeek API.


Here is some data from my experience using both deepseek v4 flash directly, and deepseek v4 flash via openrouter.

Directly: 135M input tokens - $0.57 (134M cached)

Via OpenRouter 6M tokens - $0.81 (caching stats & inp/out not reported)

Caching is a huge win with using deepseek directly.
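
Working through the numbers reported above gives a rough sense of the gap. This is a sketch, not a like-for-like comparison: the OpenRouter line does not break out input, output, or cached tokens, so the per-million figures mix different workloads.

```python
# Figures as reported in the comment above.
direct_tokens_m = 135    # million input tokens, direct API
direct_cached_m = 134    # million of those served from cache
direct_cost = 0.57       # USD

router_tokens_m = 6      # million tokens via OpenRouter (in/out split unknown)
router_cost = 0.81       # USD

direct_per_m = direct_cost / direct_tokens_m
router_per_m = router_cost / router_tokens_m
cache_hit_rate = direct_cached_m / direct_tokens_m
ratio = router_per_m / direct_per_m

print(f"direct:     ${direct_per_m:.4f} per 1M tokens")
print(f"OpenRouter: ${router_per_m:.4f} per 1M tokens")
print(f"cache hit rate (direct): {cache_hit_rate:.1%}")
print(f"blended cost ratio: {ratio:.0f}x")
```

On these figures the direct job ran at a cache hit rate above 99%, and the blended price per million tokens differs by roughly 30x.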


I am experiencing this using Opencode. Caching works fine via Deepseek API but not so good via Openrouter

Yes, I definitely noticed a problem with openrouter and deepseek v4 pro. It's much more expensive.

When you say Deepseek API, you mean servers in China? Or is it a copy of the model operated and run by OpenRouter?

Anthropic is at the mercy of 3rd party datacenter contracts. AFAIK OpenAI will soon run mostly on their own GPUs.

I don't like Altman and I am still upset about his memory deal last year but he prepared for the current shortages months before anybody else. Meanwhile, Anthropic seems to lack any plans besides third party contracting. IMHO they got very lucky with xAI and Google having spare capacity and willing to rent it. But what about next year?


Which also leaves OpenAI vulnerable to NVidia's aggressive pricing. To my knowledge Anthropic is relatively well positioned across multiple compute vendors/hardware providers.

It also leaves OpenAI vulnerable to any GPU breakthroughs. You could imagine company X comes up with a XPU that is 100% faster than what's currently there.*

* NVidia GPU, Google TPU, Apple SoC, etc.


We are still in the short-half-life phase of GPUs. If a 2x faster GPU is on the horizon, why wouldn't OpenAI already be in line to buy? They aren't buying just 1, they are buying multiple datacenters' worth. So they wouldn't be a low priority, back of the line customer.

A short half-life means you are going to quickly dispose of what you have now, anyway. In fact most current datacenters can't even handle Vera Rubin, so I don't think there's short term risk here.


You have missed the point

Nvidia has probably monopolized several upstream supplies needed to manufacture critical chip components for the next 2 years: HBM, optics components from LITE, as well as TSM capacity. Let alone the power components they funded themselves.

Let's say you have a genius design: you will still find it close to impossible to compete with Nvidia in getting it to volume.

Jensen is a player, he isn't fooling around with all these Asian trips just to wine and dine


nVidia can only 'monopolize' these components for itself inasmuch as other industry players are not seriously interested in them. This can change rather quickly.

How? Define "quickly", please.

Isn’t everyone at the mercy of NVidia deciding that everyone has to use NVidAI and refusing to sell anything to anyone?

> their own GPUs

Everyone has critical risk on multiple parts of the supply chain. GPUs and Memory are just things OAI mitigated for.

Power - Bigger bottleneck than GPU or RAM perhaps, New Grid connected capacity is typically 10+ year timescale with lot of regulatory friction. Captive capacity is also quite constrained - now Gas turbines have 7+ year wait time.

There are plenty of hard constraints that OAI cannot easily solve either.


Don't worry, they will buy up OpenAI's contracts once they implode.

The same 3rd party datacenters from the same few companies that everything else runs on? If there's demand, hyperscalers will supply.

Stargate is not real.

It is not clear that running one's own datacenter is a competitive advantage. Why do you think OpenAI can handle that?


Stargate as a project is real; they only stopped the Stargate UK thing.

Anthropic's relatively long-term contract with xAI definitely shows that they can fill the capacity where Musk cannot. OpenAI and Anthropic are both using a lot of capacity, so it's fair to say that this is an advantage.

If they stay closely competitive (which they are), your own datacenter does reduce token price.


>Anthropic is at the mercy of 3rd party datacenter contracts

I mean, this is a bit like complaining that McDonalds doesn't have their own herds of cows. OpenAI actually isn't in the business of buying GPUs or running data centres, and it's pretty weird to think that's an advantage (though it comes up constantly on here, as Anthropic keeps eating OpenAI's lunch).

There are many suppliers that are desperate to fight for Anthropic's business, and it has shown an agility to embrace whatever advances in the industry come along. Anthropic is now running across a million or so Google TPUv8s, for instance. If tomorrow someone else comes out with a better GPU/TPU, they can embrace it in a heartbeat.

All while OpenAI sits on their rapidly depreciating GPUs.

Or...actually they won't, because OpenAI doesn't take business advice from HN. The vast majority of OpenAI's compute is from Microsoft, Oracle and so on. They're smart enough to not become a big hardware purchaser when that isn't their business. The core claim of your comment simply isn't true at all, nor is that the direction OpenAI is moving.


To be fair apparently a big part of McDonald’s success is having their own real estate.

Yes it's a very dangerous concentration of power. But I don't believe strong words or proposing regulation by just another elite would change much.

Please join/help open source groups doing small + local or distributed models. There's a lot to do. Support truly open source companies.

Let's walk the walk.


This. AI can, should, and will erode all legacy companies into intelligent utilities - with an end state of nearly-free open source utilities.

Anyone concerned with concentrations of power and abuse of AI should be focusing on getting open source work to keep pace with decentralizing that power into accessible free tools for the masses.


I'm not even sure where to start, but I have to ask- are there any groups / projects that are trying to make and train an OSS model federated across members' machines?

Several. https://pluralis.ai/ https://nousresearch.com/ https://allenai.org/ and a few more I can't remember right now.

I'm personally on the other end: local small models working well for specific tasks (i.e. "agentic")


Which open source companies would you recommend supporting?

Allen AI (ai2) is doing ridiculously good work, with such a clear focus on enabling others. https://bsky.app/profile/ai2.bsky.social

Their work on SERA (open training, open weights) is fantastic. 40 GPU days of time, training a competitive model, but also, a model built for further close fine-tuning. That refining and distilling models down, especially for complex code-bases, to make the model want to do the right thing, to know the process you use, has such promise. And it's done so in the open, with so much work to help you train or refine yourself, at such low costs! https://allenai.org/blog/open-coding-agents

I'm so so so happy AI2 is helping bring up NSF OMAI compute center, some modern equipment they'll have access to. https://bsky.app/profile/ai2.bsky.social/post/3mlbihzxsei2a https://bsky.app/profile/ai2.bsky.social/post/3mlbii3d37t2u

Incredible company. And such versatility! Earth sensing/geospatial models (MolmoEarth), their own benchmarks, for example IFBench for instruction following, MolmoAct for robotics / VLA, and the radical new MoE models EMO: https://bsky.app/profile/ai2.bsky.social/post/3mm7udixycs2h https://bsky.app/profile/ai2.bsky.social/post/3ml4pooclic23 https://bsky.app/profile/ai2.bsky.social/post/3mle56nehfz2w


Off the top of my head:

https://nousresearch.com/ They do much more than Hermes

https://pluralis.ai/ has distributed models the network owns

And many others filling very different niches.

I would also count DeepSeek because they publish and share so much. Their new caching prices are very good for not that bad models. Last week I needed to generate content via API and I spent literally $3.40 where before I would've paid easily $200. But you have to connect directly as it seems OpenRouter messes up caching somehow.


debian

> There's a lot to do.

How can I contribute?


Join some Discord servers of open source projects/startups. Hang out and make friends. Figure out where and how you can fit best.

> Hang out and make friends.

Oh no

