100% agree here. The actual practical bottleneck is harness and agentic abilities for most tasks.
It's the biggest thing that stuck out to me using local AI with open source projects vs Claude's client. The model itself is good enough I think - Gemma 4 would be fine if it could be used with something as capable as Claude.
And that's gonna stay locked down, unfortunately, especially on mobile and in cars - it needs access to APIs to do that stuff, and not just regular APIs built for traditional invocation.
The same way that websites are getting llm.txts I think APIs will also evolve.
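For illustration, a hypothetical agent-facing llms.txt for an API might look something like this - every name, endpoint, and URL here is made up, and this is just a sketch of the llms.txt convention applied to an API rather than a website:

```
# example.com API - llms.txt (hypothetical)
> Short, plain-language map of this API for LLM agents.

## Endpoints
- [Create order](https://api.example.com/docs/orders.md): POST /v1/orders, idempotent via Idempotency-Key header
- [Check status](https://api.example.com/docs/status.md): GET /v1/orders/{id}

## Notes
- Rate limit: 60 req/min per token; back off on HTTP 429.
```

The point is less the exact format than having a machine-readable summary of intent and constraints, which traditional API docs bury in prose.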
It's not worse, Anthropic simply has no equivalent model (if you don't consider Mythos) of GPT 5.4 Pro. Google does though: Gemini 3.1 Deep Think.
GPT 5.4 Pro is extremely slow but thorough, so it's not meant for the usual agentic work, rather for research or solving hard bugs/math problems when you provide it all the context.
I'm genuinely asking, when you say Gemini 3.1 DT is an equivalent model of GPT 5.4 Pro, is there a specific benchmark/comparison you're referring to or is this more anecdotal?
And do you mean to say that you don't really use GPT 5.4 Pro unless it's for a hard bug? Curious which models you use for system design/architecture/planning vs execution of a plan/design.
TIA! I'm still trying to figure out an optimal system for leveraging all of the LLMs available to us as I've just been throwing 100% of my work at Claude Code in recent months but would like to branch out.
I haven't seen anybody else post it in this thread, but this is running on 8GB of RAM. It's not the full Gemma 4 32B model. It's a completely different thing from the full Gemma 4 experience if you were running the flagship model, almost to the point of being misleading.
It's their E2B and E4B variants (so effectively 2B and 4B parameters, and quantized on top of that).
The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense, this is essentially what they're designed for.
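To put rough numbers on why the E-class variants are the right fit for a phone, here's a back-of-envelope weight-memory estimate. The parameter counts and quantization widths below are assumptions for illustration, not published specs:

```python
# Rough weight-memory footprint for on-device models.
# bits_per_param of 4 assumes an aggressive (e.g. int4-style) quantization.

def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB for a given model size."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(weight_gb(2, 4))   # ~1.0 GB for a 2B-effective model at 4-bit
print(weight_gb(4, 4))   # ~2.0 GB for a 4B-effective model at 4-bit
print(weight_gb(32, 4))  # ~16 GB for a flagship 32B - far beyond a phone
```

Even before power is considered, the flagship simply doesn't fit next to the OS and apps, while the E-variants leave plenty of headroom.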
Depends on the phone, I have trouble fitting models into memory on my iPhone 13 before iOS kills the app. I imagine newer phones with more RAM don’t have this issue especially with some new flagship phones having 16+ GB of memory
Between the GPU, NPU and big.LITTLE cores, many phones have no fewer than 4 different power profiles they can run inference at. It's about as solved as it will get without an architectural overhaul.
If people are finding new ways to use AI, they should change how they bill. Banning third party harnesses is bad for a lot of reasons - it looks like they're trying to force people to use their software. Strategically it might make sense - gives them a tiny moat if their models ever slip - but it discourages the breakneck pace of innovation and the long term effect is that their customers (largely highly skilled with computers and building software) will look to decouple themselves. Claude is good but it's not so far better than anything else that they can pull shit like this and people will just deal with it.
They already have the regular subscription plans (Pro, Max) and a separate billing process for direct API usage. They could absolutely introduce another type of plan optimized for this kind of usage, or just accept that it's a dumb pipe that is being paid for; these random, arbitrary limitations just make things more confusing and are a bad plan for the future.
They already have the way that you're supposed to bill for usages like this, the API usage. The purpose of the subscription plan is strictly for the cases where you are using few enough tokens on average that it's not a money pit for them.
They have subscription plans for their software, and a separate billing process for the API. There's nothing to change. 'Accepting that it's a dumb pipe' would just mean removing the Pro & Max plans as options.
Clawdbot was clearly against the Consumer Terms of Use the whole time, they’ve just started actively detecting and blocking it.
> Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, [it is forbidden] to access the Services through automated or non-human means, whether through a bot, script, or otherwise.
Claude Code is a subscription tier explicitly designed for agentic, automated, heavy usage. So the 'subscriptions are for human use, API is for automation' line is already blurry by their own offerings.
If the actual concern is use pattern, enforce that directly. What we have instead is metered usage + behavioral restrictions + product fragmentation across three separate offerings.
That's not a clean billing philosophy, it's layers of control stacked on top of each other with no coherent logic tying them together.
If subscriptions are for humans and API is for automation, fine. But then don't meter the human product arbitrarily and don't sell a subscription tier for automation while also restricting automation. Pick a lane.
> Claude Code is a subscription tier explicitly designed for agentic, automated, heavy usage
Except it's not. It's a desktop, web, mobile, and CLI subscription product built on top of a usage-based API with a generous token allowance bundled with it. That generous allowance comes with the restriction that those tokens can only be spent through Claude product surfaces. Why would Anthropic offer their API at a loss and subsidize the profits and growth of other businesses?
I feel like Anthropic is going down a bad path here with billing things this way. Especially as local LLM continues to develop so fast.
I downgraded from my $200 a month plan to my $20 plan and hit limits constantly. I try to use the API access I purchased separately, and it doesn't work with Claude Code (something about the 1 million context requiring extra usage), so I have to use it with Continue. Then I get instantly rate limited when it's trying to read 1-2 files.
It just sucks. This whole landscape is still emerging, but if this is what it's like now, pre-enshittification, when these companies have shitloads of money - it's going to be so much worse when they start to tighten the screws.
Right now my own incentive is to stop being dependent on Claude for as much as I can as quickly as I can.
This is how free drink refills, airplane tickets, Internet service, unlimited data plans, insurance, flat rate shipping, monthly transit passes, Netflix, Apple Music, gym memberships, museum memberships, car wash plans, amusement park passes, all you can eat buffets, news subscriptions, and many more work.
Either you get a flat rate fee based on certain allowed usage patterns or everyone has to be billed à la carte.
This is a different case - those all have limitations based on human behavior (it's not necessary or possible to constantly be washing your car the entire month when you pay for unlimited washes) - that doesn't exist here. The types of plans available should reflect that reality. If gyms faced a situation where people would go and spend 18 hours working out every day for a month, they would probably change how they billed things.
Your comparisons are all also "unlimited" situations to Claude's very much limited situation. You can't buy a plan for Claude that is marketed as being unlimited. They're already selling people metered usage. They're just also adding restrictions on top of that.
They sell metered usage with the implied expectation that most users won't use it fully. Power users and users of stuff like OpenClaw don't match that idea.
So they further restricted the metered caps, which were priced on the assumption that most users would never reach them.
Because a big part of Anthropic's story is that they build based on how people actually use AI. Power users aren't just annoying edge cases, they're signal. Throttling them and calling it done is inconsistent with that.
> Power users aren't just annoying edge cases, they're signal.
Not all power users. Some re-invent the wheel and/or do things inefficiently, and in most cases there's no business incentive to adapt the service to fit the usage patterns of those users, or of other users that deviate from the norm in regards to resource usage.
Sorry to tell you but generally any company's "story" is all marketing and PR, if it interferes with their making money, which it does in this case, that company will not hesitate to leave it behind.
Oh, the billion-dollar, VC-backed, pre-IPO company's story was this? OMG, and they somehow are not delivering up to your standards? Damn, they better get their act together lest people like you whine on Twitter about them losing their way.
I didn't write anything about pricing. I just claim that people would love an offering without the discussed restriction, and because there is clear evidence of such a demand, it would make sense for Anthropic to prepare such an offering.
Yes, and that's exactly the problem I'm pointing at.
Your claim "that people would love an offering without the discussed restriction" ignores the pricing burden of such an offering, which is exactly why Anthropic doesn't just offer it.
"Unlimited" has always been a lie. There is no free lunch. There are always limits.
I've had to unwind "unlimited" within startups that oversold. I've been bit by ISPs, storage providers, music streamers, fuckin _Ubers_, now AI subscription services, that all dealt in "unlimited". None of them delivered in the long run.
I'd be mad at Anthropic if it weren't for the fact that experience lets me see this sort of thing coming from a mile away. There are a lot of folks, even on HN, who haven't been around as long. I understand the outrage. I've been there. But these computers cost money to run, and companies don't operate at a loss in the fullness of time.
Once you know that unlimited trends towards limited, the real question is whether we're equipped as a society to deal with the fact that the capital-L Labor input to the economic equation is about to be replaced with a Capital input for which only a handful of companies have a non-zero value.
On your 1.5Mbps link, you could theoretically download 500GB per month. A huge amount, but I believe it was often genuinely allowed, because their uplinks could cope with it. Unlimited could genuinely be unlimited.
But now you might get things like “unlimited” 1Gbps… which reverts to 10Mbps (1% speed) or worse after 3.6TB (eight hours). And so your new theoretical maximum is about 6.8TB per month rather than 330TB.
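The comment's figures can be checked with a quick back-of-envelope calculation (a 30-day month is assumed, and everything is rounded the way the comment rounds it):

```python
# Back-of-envelope check of the throttled-"unlimited" bandwidth math above.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s in a 30-day month

def monthly_tb(mbps: float, seconds: float) -> float:
    """Data moved at `mbps` megabits/s over `seconds`, in terabytes."""
    return mbps * 1e6 / 8 * seconds / 1e12

# Old 1.5 Mbps link, flat out all month:
dsl = monthly_tb(1.5, SECONDS_PER_MONTH)            # ~0.49 TB (~500 GB)

# "Unlimited" 1 Gbps, never throttled:
gig_uncapped = monthly_tb(1000, SECONDS_PER_MONTH)  # ~324 TB (~330 TB)

# Same link throttled to 10 Mbps after a 3.6 TB cap:
cap_tb = 3.6
secs_at_full = cap_tb * 1e12 * 8 / (1000 * 1e6)     # 28,800 s = 8 hours
throttled = cap_tb + monthly_tb(10, SECONDS_PER_MONTH - secs_at_full)

print(round(dsl, 2), round(gig_uncapped), round(throttled, 1))
```

So the cap is hit after eight hours at line rate, and the "unlimited" plan's real monthly ceiling is roughly 6.8 TB, about 2% of the nominal 330 TB.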
>If gyms faced a situation where people would go and spend 18 hours working out every day for a month, they would probably change how they billed things.
Not the best example. The upkeep cost of a gym is pretty flat regardless of how much people use the facilities. Two people can't use a single machine at the same time and make it wear out twice as fast. The price of memberships is not correlated to usage; it's inversely correlated to the number of memberships sold.
That two people can't use a machine at the same time is exactly the issue. If you have 50 machines and 200 customers, all of whom want to be in the gym 18 hours per day, that's quickly going to lead to cancelled subscriptions. Now you need more space and machines, or some other way to balance things.
Agreed, but it's an indirect causal link, not a direct one. If the demand far outstrips the possibly supply the demand will have to go down, and it can either go down by people accepting that they can't be in the gym as much time as they would like, or as you say by memberships being cancelled (in which case the price may go up or something else might change).
>Two people can't use a single machine at the same time make it wear out twice as fast
The machine doesn't care about the number of people using it. If it's constantly being used, it will wear out faster. You are conflating "we price based on expected under-utilization" with "costs don't scale with usage." Those are different things.
The inverse correlation you talk about isn't relevant here - People buy gym memberships intending to go, feel good about the intention, and then don't follow through. The business model is built on that gap. That's pretty specific to fitness and a handful of similar industries where aspiration drives purchase.
Anthropic doesn't sell based on a "golly gee I hope people dont use this" gap - they sell compute. Different business.
> Anthropic doesn't sell based on a "golly gee I hope people dont use this" gap - they sell compute. Different business.
There is nothing anywhere hinting at that.
They don’t sell compute. They sell a subscription for LLM token budgets that they hope people don’t use because the compute is vastly more expensive than what they charge or what users are ever willing to pay.
Especially with enterprise subscription plans the idea is for customers to never utilize anywhere close to their limits.
>If it's constantly being used, it will wear out faster.
Yeah, but there's an absolute limit to that, beyond which the cost doesn't keep increasing. Beyond that point, the QoS goes down (queues).
>You are conflating "we price based on expected under-utilization" with "costs don't scale with usage."
I'm not conflating anything, I'm responding to what you said:
>If gyms faced a situation where people would go and spend 18 hours working out every day for a month, they would probably change how they billed things.
Why would a gym need to change how they bill things if all their customers were aiming for maximal utilization, when their costs would barely see any change? I doubt your typical gym operates on razor-thin margins.
Gym costs absolutely scale with usage. Equipment wears faster under heavier use. Cleaning and maintenance staff hours scale with how much the facility is used. Consumables like towels, soap, and chalk go faster. HVAC runs harder. The reason gyms can offer flat-rate pricing is that they bet on under-utilization, not that costs are flat.
Setting that aside, even if we accept your argument that gym costs barely scale with usage, then that makes gyms a bad comparison case for Anthropic, whose costs directly scale with usage. You can't use the gym model to defend Anthropic's pricing decisions if the two cost structures are nothing alike.
I'm arguing that both gyms and Anthropic have costs that scale with usage, but the gym business model assumes a large margin of under-utilization and a hard cap on how much any one "power user" can consume. I don't think either extreme applies to Anthropic's situation. Under-utilizers aren't paying for AI; they have a free tier. And while there's a natural ceiling on how much any one person can use a gym, there's no equivalent constraint on API usage.
> The reason gyms can offer flat-rate pricing is that they bet on under-utilization, not that costs are flat.
Yes. In fact, I remember hearing about a gym which offered a flat-rate pricing model but explicitly excluded certain professions from it. The deal excluded police, bouncers, models, actors, and air stewardesses; they had a separate, more costly tier for those people. (I think I heard about it from the indignation the deal caused online.)
> Under-utilizers aren't paying for AI they have a free tier.
Sure they do. Free tiers suck. I may not always need to use AI, but when I need it, I don't want to immediately get hit by stupidly low quotas and rate limits, or get anything but SOTA models.
> I feel like Anthropic is going down a bad path here with billing things this way.
What do you expect them to do? You are looking at a business currently running at a loss, and complaining about their billing even though this is not a price-rise?
Unrelated, is it still possible to use $10k/m worth of tokens on their $200/plan?
> Anthropic entered 2025 with a run rate of $1 billion; the run rate for March 2026 is estimated at $19 billion.
I don't know what that means in this context.
> Internal projections show the company reaching cash-flow break-even in 2028, after stopping cash burn in 2027.
What does that have to do with them implementing restrictions on their plans because they are currently running at a loss?
Okay, let's say their internal projections[1] are accurate: were those made before or after Openclaw was released? Maybe their projections were made on the assumption that people would stop using $10k/m worth of tokens on a $200/m plan? Or that the users doing that would only be doing code? Or that plan users wouldn't be running requests at a rate of 5/minute, every minute of every hour of every day?
--------------------------------
[1] Where did you find those projections? I'm skeptical, at their current prices and current plans, that break-even at any point in the future is possible unless they shut off or severely scale down training. Running at a per-unit loss means that the more you sell, the larger your loss.
Well, I reinstalled LM Studio today, some ~10 months since I last used it, just to test Gemma 4. On my PC with 32GB RAM and a 4070 Ti (12GB VRAM), it (Gemma 4 26B A4B Q4_K_M) loads and runs reasonably fast with no manual parameter or configuration tuning - just out of the box, on a fresh install - and delivers usable results on the level I remember expecting from SOTA cloud models 12-16 months ago. It handles image input, too. I'm quite impressed with it, TBH. It's something I can finally see myself using, and yay, it even leaves some RAM and VRAM free for doing other stuff.
Look at the current crop of local Mixture of Experts models, where it seems like they've made inroads on the O(n^2) context attention cost problem. Several folks have mentioned Qwen, but there are many more of that ilk, and several of them score really high on benchmarks. But when I mess with one of them locally by hand (I have a 3090), it feels a bit like last year's Sonnet. They don't quite make the leaps of understanding you get from Opus.
You can run SOTA local MoE models very slowly by streaming the weights in from a fast PCIe 5 SSD. Kimi 2.5 (generally considered in the ballpark of current sonnet, not opus of course) has been measured as 2 tok/s on Apple M5 hardware, which is the best-case performance unless you have niche HEDT hardware with lots of PCIe lanes to attach storage to and figure out how to use that amount of parallel transfer throughput.
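A rough model of why SSD streaming lands in the low single digits of tok/s: each generated token needs the active expert weights read from storage, so sustained read bandwidth is the ceiling. All numbers below are illustrative assumptions (SSD speed, active parameter count, quantization), not measured specs for any particular model:

```python
# Sketch: upper bound on decode speed when streaming MoE expert weights
# from an SSD. Every number here is an assumption for illustration.

ssd_bytes_per_s = 12e9       # assumed sustained read of a fast PCIe 5 SSD
active_params = 32e9         # assumed *active* parameters per token (MoE)
bytes_per_param = 0.6        # assumed ~4.8-bit quantization

bytes_per_token = active_params * bytes_per_param
tokens_per_s = ssd_bytes_per_s / bytes_per_token
print(f"~{tokens_per_s:.1f} tok/s ceiling")  # ~0.6 tok/s with these numbers
```

In practice you do better than this naive bound because shared layers and frequently-hit experts stay cached in RAM, which is how figures like 2 tok/s become reachable; but the basic shape holds: throughput scales with storage bandwidth divided by active bytes per token.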
A ~$5000 USD Macbook can run open source models that are competitive with GPT 3.5 or Sonnet 3. So on nice consumer hardware you can have the original groundbreaking ChatGPT experience that runs locally.
We can hope that they optimize the models. I still think it's going to be very hard for them to charge $100 or $200 a month at scale from many people, especially with AI "taking jobs" - to the extent that happens, most of those people won't find replacement income.
My home charger was like $500 ($300 with the credit I got from electric company) and install was like 250. No upgrade needed.
I've also owned a house that had old electricity - knob and tube (this was before I had an electric car) - and paid less than $10k to get the entire electrical system upgraded to something modern. I don't think your 5k-10k figure is accurate for the vast majority of houses.
I've also found that they have gotten a lot more restrictive with new card offers, likely because of the massive influencer industry encouraging churning, etc.
I try to keep it simple: I have the $95/year Chase Preferred. I definitely get way more than $95 every year in rewards. It's also helpful for auto converting currency when abroad (the fees add up like crazy if you let individual services do this for you.)
I'm currently in Paris on a flight I booked with points but I use this card basically exclusively for all my expenses (except ones that have to be paid with cash/bank acct)
I recently tried to get one of these higher-tier cards like the Reserve just for the lounge access, and even though I have near-perfect credit and am a longtime customer, they wouldn't give me the joining bonus. It's not worth it overall, especially with these changes.
Idk I switched to Firefox earlier this year and it's honestly been really painless. Not sure why a CAPTCHA would trigger based on browser ID when those are so easily spoofed. Why would someone be running a bot on a less popular browser? I have not noticed any change.
The one thing I do notice is that on some very poorly built websites there will be a bug and it's because they haven't checked in Firefox or because I am blocking things that are no longer blockable on Chrome, but this is rare.
For me it's those horrible cloudflare and recaptcha things. I get them soooo much. And also that stupid cloudflare "We're checking if you're human" page.
I am on Linux, though. Perhaps Firefox on Windows or Mac fares better. But these problems are from the last year or two, and they don't happen in Chromium, also on Linux.
Seriously. When I look at the modern state of front-end development, it's actually fucking bonkers to me. Stuff like Lighthouse has caused people to reach for optimizations that are completely absurd.
This might make an arbitrary number go up in test suites, at the cost of massively increasing build complexity and making the project harder to work on, all for minimal if any improvement for the hypothetical end user (who will be subject to much greater forces outside the developer's control, like their network speed).
I see so much stuff like this, then regularly see websites that are riddled with what I would consider to be very basic user interface and state management errors. It's absolutely infuriating.
Yup. Give people a number or stat to obsess over and they'll obsess over it (while ignoring the more meaningful work like stability and fixing real, user-facing bugs).
Over-obsession with KPIs/arbitrary numbers is one of the side-effects of managerial culture that badly needs to die.
It's just a few meaningful numbers: 0 accessibility errors, A+ on securityheaders, a flawless result on webkolls 5july net, plus below 1 second loading time on PageSpeed mobile. Once that has been achieved, obsessing over stabilizing a flaky bloat pudding while patching over bugs (aka features that annoy any user) will have died.
I know... To be fair, I did test this for my use cases on older phones with throttled connections, and it did improve the UX. But I get what you're saying; it also depends on your target audience. Who cares if your site is poorly graded by Lighthouse if your user base has high-end devices in places with great internet? Not even Google cares, since the Core Web Vitals show up green.