
CjHuber · 2026-05-17T12:29:43 1779020983

I‘m glad you didn’t write value

CjHuber · 2026-04-29T07:10:04 1777446604

Are you… explaining the effects of being acquired to Hashimoto?

nyeah · 2026-04-29T17:34:52 1777484092

But keep in mind the demographics of chat. 80% of the commenters are much smarter and more knowledgeable than the other 80%.

CjHuber · 2026-04-24T01:37:22 1776994642

It does have a built-in documentation subagent it can invoke, but that doesn’t help much if they don’t document their shenanigans

CjHuber · 2026-04-23T19:46:11 1776973571

I think it’s crazy that they do this, especially without any notice. I would not have renewed my subscription if I knew that they started doing this.

Especially in the analysis part of my work I don‘t care about the actual text output itself most of the time but try to make the model „understand“ the topic.

In the first phase the actual text output itself is worthless; it just serves as an indicator that the context was processed correctly and that the future analysis work can depend on it. And they're… just throwing most of the relevant stuff out without any notice when I resume my session after a few days?

This is insane, Claude literally became useless to me and I didn’t even know it until now, wasting a lot of my time building up good session context.

There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them… make it an env variable (one that is announced, not a secretly introduced one to opt out of something new!) or at least write it in a change log if they really don’t want to allow people to use it like before, so there'd be a chance to cancel the subscription in time instead of wasting tons of time on work patterns that no longer work

munk-a · 2026-04-23T20:01:53 1776974513

Pointing at their terms of service will definitely be the instantly summoned defense (as it would be for most modern companies), but the fact that SaaS can so suddenly shift the quality of the product being delivered for their subscription without clear notification or explicit re-enrollment is definitely a legal oversight right now, and Italy actually did recently clamp down on Netflix doing this[1]. It's hard to define what user expectations of a continuous product are and how companies may have violated them, and for a long time social constructs kept this pretty much in check. As obviously inactive and forgotten-about subscriptions have become a more significant revenue source for services, that agreement has been eroded, though, and the legal system has yet to catch up.

1. Specifically, this suit was about price increases without clear consideration for both parties - but the same justifications apply to service restrictions without corresponding price decreases.

https://fortune.com/2026/04/20/italian-court-netflix-refunds...

kiratp · 2026-04-24T01:42:14 1776994934

OpenAI does this for all API calls

> Our systems will smartly ignore any reasoning items that aren’t relevant to your functions, and only retain those in context that are relevant. You can pass reasoning items from previous responses either using the previous_response_id parameter, or by manually passing in all the output items from a past response into the input of a new one.

https://developers.openai.com/api/docs/guides/reasoning

Disclosure - work on AI@msft

jetbalsa · 2026-04-23T19:54:36 1776974076

So to defend a little: it's a cache, it has to go somewhere. It's a save state of the model's inner workings at the time of the last message, so if it expires, it has to process the whole thing again. Most people don't understand that with every message the ENTIRE history of the conversation is processed again and again without that cache. That conversation might have hit several gigs' worth of model weights, and are you expecting them to keep that around for /all/ of the conversations you have had with it in separate sessions?

3836293648 · 2026-04-23T20:00:22 1776974422

No? It's not because it's a cache, it's because they're scared of letting you see the thinking trace. If you got the trace you could just send it back in full when it got evicted from the cache. This is how open weight models work.

mpyne · 2026-04-23T22:12:58 1776982378

The trace goes back fine, that's not the issue.

The issue is that if they send the full trace back, it will have to be processed from the start if the cache expired, and doing that will cause a huge one-time hit against your token limit if the session has grown large.

So what Boris talked about is stripping things out of the trace that goes back to regenerate the session if the cache expires. Doing this would help avert burning up the token limit, but it is technically a different conversation, so if CC chooses poorly on stripping parts of the context then it would lead to Claude getting all scatter-brained.
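To make the trade-off concrete, here is a rough sketch of the arithmetic. The `Turn` type, the role names, and every token count are made up for illustration; this is not how Claude Code actually stores or prunes a session.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str    # "user", "assistant", or "thinking" (hypothetical labels)
    tokens: int  # illustrative token count for this turn


def replay_cost(history: list[Turn], strip_thinking: bool) -> int:
    """Tokens that must be reprocessed to rebuild a session after cache expiry."""
    return sum(
        turn.tokens
        for turn in history
        if not (strip_thinking and turn.role == "thinking")
    )


# A tiny made-up session where thinking dominates the token count.
history = [
    Turn("user", 200),
    Turn("thinking", 3000),
    Turn("assistant", 500),
    Turn("user", 100),
    Turn("thinking", 5000),
    Turn("assistant", 700),
]

full = replay_cost(history, strip_thinking=False)
pruned = replay_cost(history, strip_thinking=True)
print(f"full replay: {full} tokens, pruned replay: {pruned} tokens")
```

The pruned replay is much cheaper, but the model then resumes from a different conversation than the one it originally produced, which is the scatter-brained risk described above.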

charcircuit · 2026-04-24T06:55:08 1777013708

>and doing that will cause a huge one-time hit against your token limit if the session has grown large.

Anthropic already profited from generating those tokens. They can afford to subsidize reloading context.

pixl97 · 2026-04-24T19:04:16 1777057456

No they can't, that's what you don't seem to get.

Reloading those tokens takes around the same effort as processing them in the first place.

It's ok to be ignorant of how the infrastructure for LLMs works, just don't be proud of it.

charcircuit · 2026-04-25T00:12:38 1777075958

They literally can. They could make the API free to use if they wanted. There is no law that states that the price has to equal the cost it takes to process the request.

eknkc · 2026-04-23T20:09:42 1776974982

I’m not familiar with the Claude API but OpenAI has an encrypted thinking messages option. You get something that you can send back but it is encrypted. Not available on Anthropic?

reactordev · 2026-04-23T20:15:32 1776975332

They are sending it back to the cache, the part you are missing is they were charging you for it.

eknkc · 2026-04-23T20:24:06 1776975846

The blog post says they prune them now not to charge you. That’s the change they implemented.

reactordev · 2026-04-23T20:53:23 1776977603

right. they were charging you for it, now they aren't because they are just dropping your conversation history.

CjHuber · 2026-04-23T20:55:54 1776977754

No of course it’s unrealistic for them to hold the cache indefinitely and that’s not the point. You are keeping the session data yourself so you can continue even after cache expiry. The point I‘m making is that it made me very angry that without any announcement they changed behavior to strip the old thinking even when you have it in your session file. There is absolutely no reason to not ask the user about if they want this

And it’s part of a larger problem of unannounced changes it‘s just like when they introduced adaptive thinking to 4.6 a few weeks ago without notice.

Also they seem to be completely unaware that some users might only use Claude code because they are used to it not stripping thinking in contrast to codex.

Anyway I‘m happy that they saw it as a valid refund reason

rsfern · 2026-04-23T20:39:47 1776976787

It seems like an opportunity for a hierarchical cache. Instead of just nuking all context on eviction, couldn’t there be an L2 cache with a longer eviction time so task switching for an hour doesn’t require a full session replay?
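A toy sketch of what I mean, assuming two made-up TTLs and modelling the tiers purely by entry age. The class name, the `get`/`put` API, and the idea that an L2 hit is promoted back to L1 are all my assumptions, not anything a provider has described.

```python
import time
from typing import Callable, Optional


class TwoTierCache:
    """Toy model: entries younger than l1_ttl count as hot (L1),
    entries younger than l2_ttl count as warm (L2), older ones are evicted."""

    def __init__(self, l1_ttl: float, l2_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if l2_ttl < l1_ttl:
            raise ValueError("l2_ttl must be at least l1_ttl")
        self.l1_ttl = l1_ttl
        self.l2_ttl = l2_ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, object]] = {}  # key -> (last_used, value)

    def put(self, key: str, value: object) -> None:
        self._entries[key] = (self.clock(), value)

    def get(self, key: str) -> tuple[str, Optional[object]]:
        """Return (tier, value), where tier is "l1", "l2", or "miss"."""
        entry = self._entries.get(key)
        if entry is None:
            return ("miss", None)
        last_used, value = entry
        age = self.clock() - last_used
        if age > self.l2_ttl:
            del self._entries[key]  # too old even for the slow tier
            return ("miss", None)
        tier = "l1" if age <= self.l1_ttl else "l2"
        self._entries[key] = (self.clock(), value)  # any hit makes the entry hot again
        return (tier, value)
```

With something like `TwoTierCache(l1_ttl=300, l2_ttl=3600)`, coming back after half an hour would be an L2 hit (slower to restore, but no full replay), and only a multi-hour gap would force reprocessing the whole session.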

sfink · 2026-04-24T16:34:55 1777048495

Living where? If it's in the GPU, then it's still taking up precious space that could be used for serving other sessions. If it's not in the GPU, then it doesn't help.

cyanydeez · 2026-04-23T21:51:32 1776981092

what matters isn't that it's a cache; what matters is it's cached _in the GPU/NPU_ memory and taking up space from another user's active session; to keep that cache in the GPU is a nonstarter for an oversold product. Even putting it into cold storage means they still have to load it at the cost of the compute, generally speaking, because again it takes up space from an oversold product.

FireBeyond · 2026-04-24T00:47:14 1776991634

> There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them

The irony is that Claude Design does this. I did a big test building a design system, and when I came back to it, it had in the chat window "Do you need all this history for your next block of work? Save 120K tokens and start a new chat. Claude will still be able to use the design system." Or words to that effect.

CjHuber · 2026-04-24T01:23:27 1776993807

This is exactly what also confused me. I had the exact same prompt in Claude code as well, and the no option implies you can also keep the whole history. But clicking keep apparently only ever kept the user and assistant messages not the whole actual thinking parts of the conversation

CjHuber · 2026-04-07T12:09:00 1775563740

I‘d much rather have it go slooower and check more. Why faster it’s way too fast

CjHuber · 2026-04-07T04:25:24 1775535924

I honestly am very disappointed with this. I've only learned about CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING and showThinkingSummaries: true from this post. I've been wondering for a while where the summaries went and am always hoping, like at roulette, that it thinks a lot. No wonder, if there suddenly is an "adaptive thinking" mode. I would have opted out 2 months ago if it was documented or communicated in any way publicly. Why change behavior without notice or any new user-facing settings?

I just googled "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING" and it seems like many people don't know about it.

And ULTRATHINK sets the effort to high, but then there is also /effort max?

triage8004 · 2026-04-07T07:04:23 1775545463

I'm now confused because I used to use ultrathink, went away as well as the chain of reasoning prompts, recently changed to high or extra thinking, now this is back?

CjHuber · 2026-02-25T07:25:39 1772004339

It's terrible that people think like that, especially in Georgia where they are still not tied to the debt fueled pyramid scheme that is the EU.

They still think of Europe as how it was 20 years+ ago, they always only look at the surface and never if the whole concept really works out long term.

victorbjorklund · 2026-02-25T07:43:30 1772005410

Russia is a tiny tiny economy built on corruption. Their whole economy is about selling energy. and right now their economy is failing. They're exporting less and less and less. They even have to import fuel because they can't produce enough for their own economy.

lukan · 2026-02-25T07:44:16 1772005456

"They still think of Europe as how it was 20 years+ ago, they always only look at the surface and never if the whole concept really works out long term."

Poland today seems to be in a way better spot than it was 20 years ago, so it seems it worked out for them. Likewise all the other eastern EU members where I travelled around. As soon as I left EU territory, things looked way worse.

misja111 · 2026-02-25T07:50:24 1772005824

And the alternative is ... Russia? A corrupt dictatorship whose economy is kept alive with government war spending?

CjHuber · 2026-02-21T02:38:38 1771641518

I honestly still don't see the point of compaction. I mean it would be great if it did work, but I do my best to minimize any potential for hallucination and a lossy summary is the most counterproductive thing for that.

If you have it write down every important piece of information and every finding along a plan that it keeps updated, why would you even want compaction and not just start a blank session by reading that md?

I'm kind of surprised that anyone even thinks that compaction is currently in any way useful at all. I'm working on something which tries to achieve lossless compaction but that is incredibly expensive and the process needs around 5 to 10 times as many tokens to compact as the conversation it is compacting.

martinald · 2026-02-21T02:57:44 1771642664

Well a few things.

Firstly, it's very useful to have your (or at least some) previous messages in. There's often a lot of nuance it can pick up. This is probably the main benefit - there's often tiny tidbits in your prompts that don't get written to plans.

Secondly, it can keep eg long running background bash commands "going" and know what they are. This is very useful when diagnosing problems with a lot of tedious log prepping/debugging (no real reason these couldn't be moved to a new session tho).

I think better models are much better at joining the dots after compaction. I'd have agreed with you a few months ago that compaction is nearly always useless but lately I've actually found it pretty good (I'm sure harness changes have helped as well).

Obviously if you have a total fresh task to do then start a new session. But I do find it helpful to use on a task that is just about finished but ran out of space, OR it's preferable to a new task if you've got some hellish bug to find and it requires a bunch of detective work.

CjHuber · 2026-02-21T03:06:19 1771643179

I mean I agree the last couple of messages in a rolling window are good to include, but that is not really most of what happens in compaction, right?

> there's often tiny tidbits in your prompts that don't get written to plans.

Then the prompt of what should be written down is not good enough, I don't see any way how those tidbits would survive any compaction attempts if the llm won't even write them down when prompted.

>Secondly, it can keep eg long running background bash commands "going" and know what they are. This is very useful when diagnosing problems with a lot of tedious log prepping/debugging (no real reason these couldn't be moved to a new session tho).

I cannot really say anything about that, because I never had the issue of having to debug background commands that exhaust the context window when started in a fresh one.

I agree they are better now, probably because they have been trained on continuing after compaction, but still I wonder if I'm the only one who does not like compaction at all. It's just so much easier for an LLM to hallucinate stuff when it does have some lossy information instead of no information at all

martinald · 2026-02-21T16:21:01 1771690861

AFAIK Claude Code includes _all_ messages you sent to the LLM in compaction (or it used to). So it should catch those bits of nuance. There is so much nuance in language that it picks up on that is lost when writing it to a plan.

Anyway, that's just my experience.

CjHuber · 2026-02-22T05:50:37 1771739437

I think your point doesn't really hold up. Telling an LLM to summarize something losslessly will lose so much more nuance than updating the plan directly every time some useful information is gained.

That file is not even a plan but effectively a compaction as well, just better as it's done on the fly, only processing the last message(s) rather than expecting an LLM to catch all nuances at once over a 100-200k+ conversation.
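Roughly what that workflow looks like if you script it, as a minimal sketch. The file name `notes.md`, both helper functions, and the example findings are hypothetical; in practice the model writes the file itself when prompted.

```python
import tempfile
from pathlib import Path


def record_finding(notes: Path, heading: str, finding: str) -> None:
    """Append one finding as soon as it is learned, instead of summarizing later."""
    with notes.open("a", encoding="utf-8") as f:
        f.write(f"## {heading}\n- {finding}\n\n")


def fresh_session_prompt(notes: Path) -> str:
    """Build the opening prompt for a blank session from the notes file."""
    return "Read these notes before continuing:\n\n" + notes.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.md"  # hypothetical file name
    record_finding(notes, "Parser", "Dates arrive as ISO 8601 strings with a Z suffix.")
    record_finding(notes, "Tests", "The integration suite needs the fixture server running.")
    prompt = fresh_session_prompt(notes)

print(prompt)
```

Each write only has to look at the last message or two, which is the whole point: nothing ever asks the model to reconstruct a 100-200k conversation in one pass.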

peacebeard · 2026-02-21T03:07:53 1771643273

Works fine for me in sessions that use a lot of context. My workflow is to keep an eye on the % that shows how soon it will auto compact. And either /clear and start over, or manually compact at a convenient place where I know it'll be effective.

grimgrin · 2026-02-21T03:47:12 1771645632

i use https://github.com/sirmalloc/ccstatusline and when im around 100k tokens im already thinking about summarizing where we're at in the work so i can start fresh with it

it is pretty rare for me to compact, even if i let it run to 160k

--

just realized how i wouldn't think about using ccstatusline based on a quick glance at its README's images. looks like this for me:

https://i.imgur.com/wykNldY.png

Aditya_Garg · 2026-02-21T06:34:33 1771655673

You just described the ralph loop, and it's incredibly effective. Compaction is on the way out

rstuart4133 · 2026-02-22T12:08:28 1771762108

> I honestly still don't see the point of compaction.

Currently my mental model is that every token Claude generates gets added to the context window. When it fills up there is no way forward. If you are going to get a meaningful amount of work done before the next compaction they have to delete most of the tokens in the context window. I agree that after compaction it's like dealing with something that's developed a bad case of dementia, but once you've run out, what is the alternative?

> why would you even want compaction and not just start a blank sessions by reading that md?

If you look at "how to use Claude" instructions (even those from Anthropic), that's pretty much what they do. Subagents, for example, are Claude instances that start with a set of instructions and a clean context window to play with. The "art of using Claude" seems to be the "art of dividing a project into tasks, so every task gets done without it overflowing the context window".

This gives me an almost overwhelming sense of déjà vu. I've spent my entire life writing my code with some restriction in mind - like registers, RAM, lines of code in a function, size of PR's, functions in a API. Now the restriction is the size of the bloody context window.

> I'm working on something which tries to achieve lossless compaction but that is incredibly expensive and the process needs around 5 to 10 times as many tokens to compact as the conversation it is compacting.

I took a slightly different approach. I wanted a feel for what the limit was.

I was using Claude to do a clean room implementation of existing code. This entails asking Claude to read an existing code base, and produce a detailed specification of all of its externally observable behaviours. Then using that specification only (ie, without reference to the existing program, or a global CLAUDE.md, or any other prompts), it had to reliably produce a working version of the original in another language. Thus the specification had to include all the steps that are needed to do that - like unit tests, integration tests, coding standards, instructions on running the compiler, and so on, that might normally come from elsewhere.

Before proceeding, I wanted to ensure Claude could actually do the task without overflowing its context window - so I asked Claude for some conservative limits. The answer was: a 10,000 word specification that generated 10,000 lines of code would be a comfortable fit. My task happened to fit, but it's tiny really.

When working with even a moderate code base, where you have CLAUDE.md, and a global CLAUDE.md for coding standards and what not and are using multiple modules in that code base so it has to read many lines of code, you run into that 10,000 words of prompt, 10,000 lines of code it has to read or write very quickly - within a couple of hours for me. And then the battle starts to split up the tasks, create sub-agents, yada-yada. In the end, they are all hacks for working around the limited size of the context window - because, as you say, compaction is about as successful for managing the context window as the OOM killer is for managing RAM.
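For a feel of the numbers, here is a back-of-envelope sketch. All three constants are my own rough assumptions (tokens per word, tokens per line of code, and a 200k window), not figures Claude gave me or anything measured.

```python
# All three constants are rough assumptions for illustration only.
TOKENS_PER_WORD = 1.3       # assumed average for English prose
TOKENS_PER_CODE_LINE = 10   # assumed average for a line of source code
CONTEXT_WINDOW = 200_000    # assumed context window size in tokens


def estimated_tokens(words: int, code_lines: int) -> int:
    """Very rough token estimate for a prose spec plus the code read or written."""
    return round(words * TOKENS_PER_WORD + code_lines * TOKENS_PER_CODE_LINE)


used = estimated_tokens(words=10_000, code_lines=10_000)
fraction = used / CONTEXT_WINDOW
print(f"~{used} tokens, about {fraction:.0%} of the assumed window")
```

Under those assumptions the "comfortable fit" already eats more than half the window before any tool output, thinking, or back-and-forth is counted, which matches how quickly I hit the wall on a real code base.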

CjHuber · 2026-02-12T00:03:36 1770854616

And how to get to the old verbose mode then...?

bcherny · 2026-02-12T00:33:47 1770856427

Hit ctrl+o

bostonvaulter2 · 2026-02-12T00:59:08 1770857948

Wait so when the UI for Claude Code says “ctrl + o for verbose output” that isn’t verbose mode?

bcherny · 2026-02-12T02:47:04 1770864424

That is more verbose — under the hood, it’s now an enum (think: debug, warn, error logging)
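As a generic illustration of the logging-style enum idea (the level names and the `should_show` helper are hypothetical, not the actual Claude Code internals):

```python
from enum import IntEnum


class Verbosity(IntEnum):
    """Hypothetical output levels, ordered from least to most detail."""
    QUIET = 0
    NORMAL = 1
    VERBOSE = 2


def should_show(message_level: Verbosity, current: Verbosity) -> bool:
    """A message is shown when the current setting is at least as detailed as it."""
    return message_level <= current


print(should_show(Verbosity.VERBOSE, Verbosity.NORMAL))   # a verbose-only message at the normal setting
print(should_show(Verbosity.NORMAL, Verbosity.VERBOSE))   # a normal message at the verbose setting
```

An ordered enum like this lets each piece of output declare the minimum level it needs, rather than checking a single on/off verbose flag.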

kstenerud · 2026-02-12T06:13:35 1770876815

Considering the ragefusion you're getting over the naming, maybe calling it something like --talkative would be less controversial? ;-)

x4132 · 2026-02-12T18:46:39 1770921999

ctrl + o isn't live - that's not what users want, what users want is the OPTION to choose what we want to see.

CjHuber · 2026-02-03T07:23:22 1770103402

Does it not use prompt caching?