
binyu · 2026-06-01T14:32:21 1780324341

> Wait, wait, wait: browsers allow websites to store junk on my drive?

Technically even a cookie is junk on your drive

> Without even asking whether the site can use local storage?

Would it be practical to ask permission for every site you visit? It would be better to periodically check the size of your home folder (where browser profiles normally reside).
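A minimal sketch of that periodic check, assuming Python is available. The helper names `dir_size_bytes` and `human_readable` are made up for illustration, and the actual profile location varies by browser and operating system.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files can vanish or be unreadable while a browser is running.
                pass
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

Something like `human_readable(dir_size_bytes(os.path.expanduser("~")))` gives the total for the home folder; pointing it at the browser profile directory instead narrows it down to what sites have stored.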

binyu · 2026-05-31T20:46:25 1780260385

The V100 and the 4090 are based on vastly different architectures: the former uses the older Volta, while the latter uses Ada. Last I checked you cannot meaningfully combine them. The 3090 is better than the V100, just get two 3090s and an NVLink bridge.

tymscar · 2026-06-01T00:12:27 1780272747

Well, I did in fact meaningfully combine them without an issue, that was the whole point of the blog post.

cthalupa · 2026-06-01T00:19:05 1780273145

You can split tensors across an AMD GPU and Nvidia GPU - different architectures are not an issue. People run LLMs across some pretty crazy setups.

binyu · 2026-05-30T05:29:49 1780118989

MCP is what XML dreamed of becoming.

binyu · 2026-05-20T01:31:44 1779240704

Truly makes me positive about the future. Thanks Andrej

binyu · 2026-05-12T14:28:52 1778596132

> It makes individual agents untestable because their inputs and outputs are strings

Strings can't be valid test vectors? Large language models are highly non-deterministic by design, no matter what.
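As a rough illustration of strings serving as test vectors, here is a sketch that checks properties of an agent's string output instead of comparing exact text. `fake_agent` is a hypothetical stand-in for a real LLM-backed agent, and the JSON shape with `action` and `query` fields is an assumption made only for this example.

```python
import json


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for an LLM-backed agent; returns a JSON string."""
    return json.dumps({"action": "search", "query": prompt.strip().lower()})


def check_agent_output(raw: str) -> dict:
    """Validate the structure of a string output rather than its exact wording."""
    data = json.loads(raw)  # raises ValueError if the string is not JSON
    assert isinstance(data, dict), "output must be a JSON object"
    assert data.get("action") in {"search", "answer"}, "unknown action"
    assert isinstance(data.get("query"), str) and data["query"], "empty query"
    return data
```

Because the checks are on structure, they keep passing when the wording of a non-deterministic output changes between runs.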

binyu · 2026-05-11T02:21:16 1778466076

They all still definitely fall short of Opus 4.6, though. They are good but fail on extremely complex tasks, in contrast with a frontier model that will keep on trying until it succeeds or exhausts the solution space.

julianlam · 2026-05-11T03:48:50 1778471330

Not by much, and moving goalposts makes for a bad comparison. Local open weight models are already more powerful than frontier models from only a year back.

If you believe what you read here, the gap is closing fast.

segmondy · 2026-05-11T20:22:29 1778530949

frontier models don't keep trying until they succeed. that's a harness problem and best believe it, the best harnesses are private and not public.

binyu · 2026-05-11T21:57:39 1778536659

It is much more of a context window size and model capabilities problem. Local models are not even remotely close to solving complex problems, even when used with the same harness.

binyu · 2026-05-11T02:18:35 1778465915

DeepSeek V4 with a 1 million token context window is pretty powerful, although still not there. There's hope that Opus 4.5 level performance locally is not that far away.

Aurornis · 2026-05-11T02:36:53 1778467013

Running DeepSeek V4 without extreme quantization locally requires a lot of hardware.

The IQ2 quants that fit into 128GB machines are very degraded.

binyu · 2026-05-11T02:40:45 1778467245

That is true, it is a 1.6T-parameter model, so it requires a great deal of memory. I also heard there's a 2-bit quantization that works well on Apple Metal.

tuananh · 2026-05-11T02:34:53 1778466893

From what I read, ds v4 is very close to Opus 4.6 performance.

DeathArrow · 2026-05-11T05:40:28 1778478028

The full model is, not the quantized versions.

tuananh · 2026-05-11T06:30:37 1778481037

yeah that goes without saying. how can an open-weight, quantized version beat SOTA :)

array_key_first · 2026-05-11T16:38:19 1778517499

Well it depends on the task. For agentic coding, more is more, but for tasks that normal consumers use them for there really is a ceiling. OCR, text to speech, that type of thing doesn't really improve when going to a SOTA model, so you'd just be wasting your money. I think local LLMs have more value than software engineers give them credit for.

tuananh · 2026-05-12T03:03:04 1778554984

totally agree with that. local llm doesn't need to match SOTA performance in order to be useful.

binyu · 2026-05-11T01:52:52 1778464372

> I'm rewriting k10s in Rust. Not because Rust is better but, because it's the language I can steer. I've written enough of it to feel when something's wrong before I can articulate why. That instinct is the one thing vibe-coding can't replace. The AI hands you plausible-looking code. You need a nose for when it's garbage.

Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.

> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.

This post is a good way to grasp the difference between "vibe-coding" and using the AI to help with design and architectural choices made by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.

dropbox_miner · 2026-05-11T02:03:13 1778464993

Go reads fine whether the architecture is good or bad, and I couldn't tell the difference until I was in trouble. Rust is harder to read but harder to misuse. The borrow checker would have caught that data race at compile time. I've also just written more Rust. That familiarity matters separately.

+1 on Opus 4.7 involving the user a lot more. Rn I'm trying to get to a state where I can codify my design + decision preferences as agent personas and push myself out of the dev loop.

binyu · 2026-05-11T02:07:32 1778465252

Gotcha, that implies you are going to read the code that the AI produces anyways.

> Go reads fine whether the architecture is good or bad

Were you reading the Golang code all along and got fooled or did you review it after it failed? Sorry I admit I didn't read the whole article.

williamstein · 2026-05-11T02:24:17 1778466257

He was NOT reading the code: "For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote."

binyu · 2026-05-11T02:28:07 1778466487

Right, thank you. Personally I think reading all the code that the AI produces is impossible and kind of defeats the purpose of using it. The key is to devise a structured way to interact with it (skills and similar) and use extensive testing along the way to verify the work at all steps.

ok_dad · 2026-05-11T06:32:55 1778481175

Buddy that k10s code was never good. Go vs Rust is not the issue here, it’s the fact the project was vibe coded without reading anything. It’s hilarious to even think that a god model was caused by anything other than someone who let the bot choose too much.

Good architecture in any language is obvious to someone who is experienced and cares.

Go is actually great for bots to write if you’re actually thinking.

cortesoft · 2026-05-11T03:54:35 1778471675

> Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically

It sounds like the author knows Rust, and might not be as familiar with Go.

A language that you are proficient in is always going to be easier to read than one you aren't, even if the latter is an objectively easier language to read in general.

travisgriggs · 2026-05-11T05:44:52 1778478292

In a world where juniors (or seniors in new territories) are incentivized to publish or perish, how will any of us gain proficiency any more? I can see the agent assisted journey accelerating some familiarity, but not proficiency.

I’ve used AI tools to do i18n translations to Spanish and Portuguese (somewhat ashamed to admit this). I’ve grown more familiar with the structure of these languages, and come to recognize some of the common vocabulary for our agtech domain. If anything, I feel more clueless about both languages now than I did before, when it comes to any sort of proficiency.

binyu · 2026-05-08T23:00:12 1778281212

Yeah, that part is probably not done by Claude.

binyu · 2026-05-08T19:25:41 1778268341

They forgot to add timing attacks on image load times, which can be used to tell whether you visited a given website.

https://www.ieee-security.org/TC/SP2011/PAPERS/2011/paper010...

lights0123 · 2026-05-08T19:52:12 1778269932

Not since browsers started partitioning caches in 2020: https://developer.chrome.com/blog/http-cache-partitioning/
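A toy model of why partitioning blunts this kind of probe, assuming a cache keyed only by top-level site and resource URL. This is a simplification for illustration (the real cache key has more components), and the `HttpCache` class and the example hostnames are invented.

```python
class HttpCache:
    """Toy HTTP cache that can be keyed by URL alone or by (site, URL)."""

    def __init__(self, partitioned):
        self.partitioned = partitioned
        self._entries = set()

    def _key(self, top_level_site, resource_url):
        # Partitioned: the same resource is cached separately per top-level site.
        if self.partitioned:
            return (top_level_site, resource_url)
        return (resource_url,)

    def fetch(self, top_level_site, resource_url):
        """Return True on a cache hit; on a miss, store the entry and return False."""
        key = self._key(top_level_site, resource_url)
        hit = key in self._entries
        self._entries.add(key)
        return hit


LOGO = "https://bank.example/logo.png"

shared = HttpCache(partitioned=False)
shared.fetch("bank.example", LOGO)                     # the user visits the bank
leak_shared = shared.fetch("attacker.example", LOGO)   # the attacker's probe hits

split = HttpCache(partitioned=True)
split.fetch("bank.example", LOGO)
leak_split = split.fetch("attacker.example", LOGO)     # the attacker's probe misses
```

In the shared cache the probe from the attacker's page is a hit, which is the fast load a timing attack measures. With the partitioned key the probe is a miss regardless of browsing history.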

binyu · 2026-05-08T21:54:21 1778277261

I don't think this completely protects against side-channel/timing attacks applied to image load times.

Edit: Reading more thoroughly, probably it does to a great extent after all.