Agreed. Reviewing code takes so much longer and is far more exhausting than writing it, and you still don’t understand the logic as well or as intuitively as you would if you had written it yourself.
Code reviews should be done by someone other than the author, though, so the only thing that changes with AI-generated code in that respect is the amount of it.
Before: One person writes the code (and likely understands it thoroughly), another person reviews the code to spot obvious mistakes or shortcomings. Now: AI writes the code, a person reviews it to spot obvious mistakes or shortcomings.
In the before case, you have a person with a deeper understanding of the code; in the AI case, you don’t. Instead, you have even more code to review.
When a competent programmer is writing the code, the human-written code tends to be higher quality too. So it’s not just about review quantity but the quality of the code being reviewed. Some people claim the AI writes great code, but that just hasn’t been my experience yet (at least with the models I’ve tried, including Opus). They still make ridiculously bad decisions regularly.
> When a competent programmer is writing the code, the human-written code tends to be higher quality too
It’s a nice idea, but on average it’s deeply untrue. Far and away most programmers today write significantly worse code than LLMs. LLMs are also fantastic at generating high-level summaries and comments in code.
> Far and away most programmers today write significantly worse code than LLMs
Your experience with LLMs does not match my own. Not to say that I haven’t encountered terrible human-written code where I’ve wondered what the author could possibly have been thinking, but overall, I still find LLM-written code to be on the poor side.
Like, the code itself is OK, but the bigger-picture reasoning and abstractions are bad. It also makes really dumb decisions far too often, or doggedly shoehorns its first idea in no matter how badly it fits.
> eventually, you'll want to add a feature that clashes with that invariant
I find this to be a big problem with spec-driven development: no spec survives the real world. Some invariant that was in the spec will inevitably turn out to be wrong, no matter how much time you spend researching and designing it.
When I as a human hit this during development, I can take a step back, think it through, and decide: yes, the invariant is wrong, needs to be thought through again, and the impact of changing it needs to be assessed. Then I can design around it. Sometimes that means a substantial change in design, sometimes not, but in all cases the resulting software is better for it: an unknown has been uncovered, something new has been learned.
When this happens to AI, it keeps churning on it until it manages to hack a solution together, under the potentially wrong assumptions, design, or invariant. It doesn’t have the insight to step back and holistically reevaluate.
At least, that’s been my experience working with AI. I think we can improve its ability to handle these situations through good workflows and verification, but it’s not something that comes naturally to AI, nor something Claude Code or whatever supports out of the box, and it’s got its limits.
> which people find they need to do for LLMs to work at all anyway
Everything we have to do for AI to function well would help humans function better too.
If you take the things we do for AI, but do them for humans instead, that human will easily 2x or more, and someone will actually understand the code that gets written.
> If you take the things we do for AI, but do them for humans instead, that human will easily 2x or more, and someone will actually understand the code that gets written
This only works on high-trust teams and organizations. A lot of AI productivity gains come from SWEs putting in the extra effort because the results will be attributed to them. Being a force multiplier for others isn't always recognized; instead, your performance will likely be judged solely on the metrics directly attributed to you. I learned this lesson the hard way by being idealistic and overestimating the level of trust that had been built after joining a new team. Companies pay lip service to software quality; no one gives a shit if your code has the lowest SEV rates.
Ah… that’s a reasonable point. Yes, the difference between a high-trust team and what you described is night and day. I suppose for those situations there’s a much bigger incentive to just throw AI at it, which explains why the big corporates love AI.
Lack of imagination doesn't mean this isn't innovation.
It's the ability to convey more information in less space.
Top-of-my-head notion: The cursor spins (or changes in another way) to reflect CPU use, or bandwidth use, instead of taking up space elsewhere on the screen.
The same was said about Compiz, and it turned out to be a passing gimmick that looked flashy but didn’t really add anything. Sure, you can always make up reasons why it’s useful (I remember the same arguments about Compiz), but… is it really? I could be proven wrong, of course, but it hasn’t been demonstrated yet.
It’s a solution in search of a problem. OP should have presented it with a real use case or benefit, not just flashy graphics, if it’s meant to be anything other than a fun oddity (which, to be fair, is perfectly fine).
Most people edit documents in Microsoft Word, though, so it didn’t seem too far-fetched that LLM content would be edited similarly, especially as more and more non-programmers use it.
I've been building a workflow engine for agent orchestration and the workflows are just data for the engine to execute. While I haven't experimented with it yet, I envision that an LLM would be rather good at generating the workflows based on a description of your needs (and context about how best to utilise the workflow engine).
LLMs are pretty good at reasoning about workflows; it's just that when they have to apply them directly, the workflow context gets muddled with your actual task's context. That's why using an orchestration agent that delegates work to worker agents works so much better.
I still think there's a huge amount of value in having the workflow executed in a deterministic way (as code, or by a workflow engine) because it saves tokens, eliminates any possibility of not following it, and unlocks other cool things, like being able to give each step in the workflow its own focused, task-specific context, splitting plans into individual actions and feeding them through a workflow one by one, and having workflow-step-specific verification.
But that workflow absolutely CAN be created by an LLM, it just shouldn't be executed by one.
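To make that concrete, here's roughly what I mean, as a minimal TypeScript sketch using the Vercel AI SDK (the workflow schema, prompts, and model id are all made up for illustration): the LLM authors the workflow as plain data, and a dumb loop in ordinary code executes it.

    import { generateObject, generateText } from 'ai';
    import { anthropic } from '@ai-sdk/anthropic';
    import { z } from 'zod';

    // Hypothetical workflow shape: an ordered list of steps,
    // each carrying its own instructions.
    const Workflow = z.object({
      steps: z.array(z.object({ name: z.string(), instructions: z.string() })),
    });

    const model = anthropic('claude-sonnet-4-5'); // illustrative model id

    // 1. The LLM *creates* the workflow, as data...
    const { object: workflow } = await generateObject({
      model,
      schema: Workflow,
      prompt: 'Break "add rate limiting to the API" into ordered implementation steps.',
    });

    // 2. ...but plain code *executes* it: no skipped or hallucinated
    //    steps, and each step runs with its own focused context.
    let carry = '';
    for (const step of workflow.steps) {
      const { text } = await generateText({
        model,
        system: `You are executing a single step of a fixed workflow: ${step.name}.`,
        prompt: `${step.instructions}\n\nRelevant output from the previous step:\n${carry}`,
      });
      carry = text; // forward only what the next step needs, not the whole history
    }

Because the loop, not the model, owns the sequencing, the workflow instructions never even enter the model's context, which is where the token savings come from.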
Oh, yes! As someone who has dabbled in card tricks: this, so much. People don't understand how it's done and can't imagine or conceive of a way it possibly could be done, so they attribute it to literal magic or demons or whatever. Like, no, I just distracted you for a split second and used sleight of hand.
Technology is no different: someone has never even considered that this thing could be possible, and now they see it with their own eyes? Incredible! They don't realise that it's mundane and has been possible (in much cheaper ways) for a long time. It's like a few years ago, when some journalist posted an animation showing how Horizon Zero Dawn does frustum culling and all the non-tech people were all "wow! This game unloads the game world when it's not in view! Incredible!", like... yeah? That's how games have worked since the advent of 3D?
This is something I realised late last year while using Claude Code. The LLM shouldn't be the one in control of the workflow, because the LLM can make mistakes, skip steps, hallucinate steps, etc. It's also wasteful of tokens.
I'm a firm believer that a "thin harness" is the wrong approach for this reason, and that workflows should be enforced in code. Doing that ensures the workflow is always followed and reduces token use, since the LLM no longer has to consider the workflow or read the workflow instructions. But it also enables more interesting things: you can split plans into steps and feed them through a workflow one by one (so the model no longer needs such strong multi-step instruction following); you can give each workflow stage its own context or prompts; you can add workflow-stage-specific verification.
Based on my experience with Claude Code and Kilo Code, I've been building a workflow engine for this exact purpose: it lets you define sequences, branches, and loops in a configuration file that it then steps through. I've opted for passing JSON data between stages and using the `jq` language for logic and data extraction. The engine itself is written in hand-coded Rust (the recent Claude Code bugs taught me that the core has to be solid), while the actual LLM calls are done in a subprocess (currently my own TypeScript + Vercel AI SDK based harness, but the plan is to support third-party ones like the Claude Code CLI, Codex CLI, etc. too, in order to be able to use their subscriptions).
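To give a flavour of what I mean (purely illustrative, not the engine's actual syntax), a couple of stages in such a config might look something like this, with `jq` expressions pulling data out of earlier stages' JSON output:

    {
      "stages": [
        { "name": "plan",
          "prompt": "Write a step-by-step plan for: {task}" },
        { "name": "implement",
          "for_each": ".plan.steps[]",
          "prompt": "Implement exactly this step: {item}",
          "verify": "cargo test" },
        { "name": "review",
          "when": ".implement.failures | length == 0",
          "prompt": "Review the combined diff against the plan" }
      ]
    }

The engine, not the model, evaluates the `for_each` and `when` expressions, so branching and looping happen deterministically and cost no tokens.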
I'm not quite ready to share it just yet, but I thought it was interesting to mention since it aims to solve the exact problem that OP is talking about.
For simple workflows or one-off workflows, that's a good approach.
For long-running repeatable workflows (e.g. you want to leave your agent running overnight, run the same workflows over and over in different projects, or build more autonomous Devin-like workflows), or when you want audit trails/observability, or vetted workflows (i.e. not have the LLM write them, or have the LLM write them and review them yourself) without having to read through scripts, or you have more complex requirements like different models/providers for different workflow stages or the things I mentioned previously (context, plans, verification, etc), or more complex workflow needs (swarms or fork/join, parallel pipelines, routing/branching, error recovery, etc), then a robust dedicated workflow engine is needed, in my personal opinion.
I think for most users using Claude/Codex for themselves on smallish projects it's unnecessary, but as you scale up, I feel that more powerful tools are needed. Also, for corporate use, where you need repeatable workflows, audit trails, artefact management, and job-queue-based task management start becoming more important too.
I also feel that using a workflow engine as an internal, behind-the-scenes system in a GUI-centric vibe coding tool might help raise the ceiling compared to the existing tools, but I've yet to test that hypothesis. Simply because it takes the mistakes out of the user's hands: the engine will follow proven workflows, whether you ask it to or not, keeping skills for context/knowledge, not for orchestration.
Something else I've been experimenting with a little, but not enough yet to have an opinion, is small language models running locally for orchestration, and frontier models for doing work.
People have so many verbal tics and filler words too. Anthropic’s Dario says “you know” after every third word, for example.
Or they meander around unrelated/unimportant details.