Hacker News | CharlieDigital's comments

MCP is more than tools. Tools are one of three major features, with prompts[0] and resources[1] being the other two.

Prompts are effectively "server delivered skills", which are quite powerful because they solve a distribution and synchronization problem. They also allow server materialization and dynamic construction of skills.

MCP also has a few other underutilized mechanisms: elicitation[2] on the client side and completions on the server side[3]. It is an API of sorts, but specialized for agent harness <-> server interactions.

[0] https://modelcontextprotocol.info/docs/concepts/prompts/

[1] https://modelcontextprotocol.info/docs/concepts/resources/

[2] https://modelcontextprotocol.io/specification/2025-11-25/cli...

[3] https://modelcontextprotocol.io/specification/2025-11-25/ser...


This is bad. Anyone doing any cursory work with agents will realize how brittle <<just managing your own prompts>> can be. Adding an extra layer of indirection isn’t helpful; it’s a gigantic hindrance that gives you a moving eval target. Being an MCP developer means you have a moving target of model optimization. It is a win for nobody.

The tools we need to solve this problem exist and they are boring. Types, jsonschema, openapi, all of it is a better integration point than MCP.


It keeps people employed, yes?

And with people I guess I might actually mean not people but tokens everybody has to spend on keeping their environment self-adapting...


That's because you're not thinking about how teams and enterprises work. You're thinking about how individuals work.

An enterprise has 20 services that each have a secret key (Datadog, Snowflake, etc). I want my team to have access to those services via coding agents. How do I guard those keys from both developer and agent? Put it behind MCP; neither dev nor agent ever sees the key. If developer leaves, revoke one OAuth cred.

I want to add access to internal and external services from one entry point without developers across hundreds of teams having to sync or update their workspace. Put it behind one MCP interface.

I have enterprise skills and resources that I want to standardize and deliver to every team. But it has to vary in 10-15% of the skill body. Think same heuristics, but different specifics. MCP delivered prompts and resources can do that by dynamically templating them.
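To make the "same heuristics, different specifics" idea concrete, here is a minimal sketch of server-side templating. It deliberately uses only the Python standard library and no MCP SDK; the template text, team names, and `render_skill` helper are all hypothetical, and a real server would return the rendered text from its prompt handler.

```python
from string import Template

# Hypothetical shared skill body. The ${...} slots are the 10-15% that varies.
SKILL_TEMPLATE = Template(
    "Review the change for correctness and test coverage.\n"
    "Primary language: ${language}.\n"
    "Escalate incidents to: ${oncall_channel}.\n"
)

# Hypothetical per-team specifics, e.g. looked up from the caller's identity.
TEAM_SETTINGS = {
    "payments": {"language": "C#", "oncall_channel": "#payments-oncall"},
    "search": {"language": "Python", "oncall_channel": "#search-oncall"},
}


def render_skill(team: str) -> str:
    """Render the shared skill body with one team's specifics filled in."""
    return SKILL_TEMPLATE.substitute(TEAM_SETTINGS[team])


if __name__ == "__main__":
    print(render_skill("payments"))
```

Because the body lives on the server, changing `SKILL_TEMPLATE` once updates what every team receives on the next request, with nothing to sync locally.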

I want telemetry and data on how skills and tools are being used, and I want to capture them using standard tooling like OTEL regardless of agent harness, because I don't want to have to rebuild a solution on hooks if I change vendors. MCP does that because I can capture all of the telemetry there.

    > jsonschema, openapi, all of it is a better integration point than MCP.
MCP is schema + interaction model. If MCP were built on OpenAPI, it would still need another layer to describe interaction. It is effectively JSON schema + interaction flow + standard surface area.
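A rough sketch of the two halves, built as plain dictionaries rather than with any SDK. The `inputSchema` field and the `tools/list` and `tools/call` JSON-RPC methods come from the MCP spec; the `query_metrics` tool, its arguments, and the `make_request` helper are made up for illustration.

```python
import json

# Schema half: a tool advertises a JSON Schema for its arguments.
# The tool itself ("query_metrics") is hypothetical.
tool_definition = {
    "name": "query_metrics",
    "description": "Run a read-only metrics query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def make_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Interaction half: the client first discovers tools, then calls one by name.
list_request = make_request(1, "tools/list")
call_request = make_request(
    2,
    "tools/call",
    {"name": tool_definition["name"], "arguments": {"query": "avg:cpu"}},
)

if __name__ == "__main__":
    print(json.dumps(list_request))
    print(json.dumps(call_request, indent=2))
```

The schema alone says what arguments are valid; the discover-then-call sequence is the part an OpenAPI document would not give you by itself.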

Your argument feels like asking why do we need OAuth and OIDC when we already have usernames and passwords. They solve different problems. A simple service can just use a secret key or username + password. But more complex enterprise scenarios need the structure and flow of OAuth, SAML, and SCIM.


You’re not talking about how teams and enterprises work, you’re talking about how teams and enterprises don’t work.

Teams and enterprises had problems maintaining API keys long before there was MCP and they will have the same problems afterwards. The better teams and enterprises have had solutions for a long time.


I wish you would explain more of how you inferred the handlers' KPIs here

From my point of view their purpose in life is 1) hacker news highlights or 2) to restrain some patients (me) from getting off the (Freudian) couch and mouthing off at "the folks in the waiting room"


can these not be surfaced in an api and accessed using curl, with instructions in a SKILLS.md?

Sure. It would be great if they were portable as well.

To make them interoperable so that the APIs have similar surface areas and can just be used without special skills, we could even come up with a standard API surface area and create a...protocol.

If you squint, the SKILL.md and the context that it takes up is literally the same thing as the MCP server and tool description. They are literally the same thing except one is server delivered and one is not.

MCP is "Let's use Google Sheets and have a server-managed experience". Everyone sees the same thing on the server in real time.

Skills is "Let me download the Excel and send it back to you". Why? How is this better? Every time I update the Excel, I have to add a `.2026.final.final2.xlsx` and everyone updates their copy...how is this the superior experience?


It would be really, really great if Codex could support MCP Prompts[0]

This would allow us to deliver standard prompts across the team without having to sync manually or with scripts; keep everyone up to date. Even allow per-user customization of "skills" via server rendering of the prompts.

AFAIK, Codex is the only major harness to not support this.

[0] https://github.com/openai/codex/issues/5059#issuecomment-453...



Yes, exactly that one! thanks

https://github.com/dotnet/Open-XML-SDK

First party from Microsoft; feels like it would be the way to go.


That is the source of DocumentFormat.OpenXml, you're talking about the same package, https://github.com/dotnet/Open-XML-SDK#packages

Microsoft are incapable of:

    1. naming things well
    2. keeping those names stable

It's a shame because to guide a coding agent, you need to have the right grammar and vocabulary to describe what you want and how you want it to be built. Junior devs should read not because they need to know how to write the code, but because they need to know the vocabulary and the grammar to guide the agents.

At work we had a dispute over whether AI should be allowed in the technical interview. We resolved it by running both an AI-allowed and an AI-not-allowed interview. Something interesting we found is that every candidate either passed or failed both. People who could not program manually without AI were not able to get the agent to complete the tasks either.

I've seen people type questions into the LLM and get the answer they asked for but not the one they needed/wanted because they didn't have the correct terminology.


Junior devs should still read to learn how to write the code.

Surely the desired state isn't that nobody knows how to write code any more right?


    > Surely the desired state isn't that nobody knows how to write code any more right?
Shaping up like that in my org. At least one mid-career dev says he no longer looks at code.

I still look at code and find that agents work best when I write the foundation and then vibe on top of my hand-written code. Works extremely well because agent picks up my style accurately.


Hopefully your management is trying to answer the following question: is said middle-career dev outproducing their past self, and others who still look at code, with:

1) submitted changes that don't need any more revision than their previous human-written ones when it comes to code review?

2) no increase in bug incidents

3) no slow-down in peer work or future work caused by humans-or-agents having to fight increasingly overly-complicated, poorly-factored, copypasta-style code or god methods? (this might not be evident yet)

(Another question is how well is this person doing their job as a reviewer, making sure to keep the product quality bar high, without looking at code?)

Anyone in an org with coworkers no longer writing code needs to be making sure their managers have a pulse on the long-term health of the product to see who's doing it well (lots of test coverage, shipping only super-high-quality, refined-from-multiple angles stuff) or just being lazy (shipping first drafts that continually add debt to various files and methods).


Do you know how to operate a punch card?

Yes, and IBM has current documentation, updated in 2026, if you need it: https://www.ibm.com/docs/en/zos/3.2.0?topic=considerations-u...

It's essentially just an encoding of what amounts to binary machine code. If you are doing it manually, you write assembly and then act as the deterministic compiler that translates it to machine code.

LLMs aren't a deterministic process and human languages aren't as clear as machine code and assembly.


O.M.G.

I last used a card punch in circa 1980 or thereabouts...


> Do you know how to operate a punch card?

I remember! You created a control card, with tab stops and other controls, wrapped it around a control drum, and then had an easy time punching your source FORTRAN!

I just looked and found my old control drum, in the back of my junk drawer. But I can't find an old punch card machine in there, must have lost it somehow.


Yes. But Python isn't punch cards behind the scenes so it's not the same thing at all.

Besides. You're not asking <AGENT OF THE WEEK> to produce punch cards to jam into the PDP.


someone needs to try this

If I transported you to the 1960s and gave you a wizard that could punch cards for you with a chance of making a mistake, would you still bother to learn how to operate a punch card?

What would you do if the wizard gets stuck? Coerce the wizard into making the black box work through somebody else's direct perspective on the problem?


I don't think this is comparable.

It's more like a restaurant. You give an order and a little while later, a finished dish appears.

The difference between a Chipotle and a Michelin starred establishment is that Chipotle is just assembling a mass produced good. A Michelin chef knows their ingredients inside and out; knows the science of how those ingredients work; knows varied techniques to extract flavors, create textures, etc.

Anyone can work in a Chipotle; few can achieve a Michelin star.


I've never programmed before good compilers existed, but I still know some assembly. For what I currently do it's used rarely, but it's still quite valuable on occasion. I don't see any reason LLM-assisted programming wouldn't be like that; the various C compilers certainly seem like they're trying just as hard to produce results you don't want.

Do you let your Jenkins re-inference your entire program from markdown files on each push?

Do you maintain a system in which punch cards play a critical role?

I was wondering about this myself, but given everything I know about AI, won't the vocabulary slowly and subtly change as common people try to develop software, not knowing the jargon? Won't the AI systems learn from the prompts and adjust their understanding of what's trying to be accomplished?

It will regress to the mean.

The default output of any agent is going to be Transaction Script, for example. It will never on its own accord start writing Domain Model.

As it produces more Transaction Script and sees its own Transaction Script, it will regress to this paradigm.

You can get it to write Domain Model, but that is not its default.

I liken it to examples in docs: they are always intentionally the simplest case.
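For anyone unfamiliar with the two terms, a minimal sketch of the difference, using a made-up withdrawal rule. Both versions enforce the same checks; what differs is where the rules live. The names (`withdraw_script`, `Account`) are illustrative only.

```python
from dataclasses import dataclass


# Transaction Script: one procedure holds the rules and mutates plain data.
def withdraw_script(account: dict, amount: int) -> None:
    if amount <= 0:
        raise ValueError("amount must be positive")
    if account["balance"] < amount:
        raise ValueError("insufficient funds")
    account["balance"] -= amount


# Domain Model: the object owns its state and the rules that guard it.
@dataclass
class Account:
    balance: int

    def withdraw(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance < amount:
            raise ValueError("insufficient funds")
        self.balance -= amount


if __name__ == "__main__":
    row = {"balance": 100}
    withdraw_script(row, 30)

    account = Account(balance=100)
    account.withdraw(30)

    print(row["balance"], account.balance)
```

With the script style, every new procedure that touches `balance` has to remember to repeat the checks; with the model style, there is one place to get them right.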


And to operate a self-driving car safely you need to keep your attention on the road so you can take over quickly when needed.

But that's not how human nature works. Most people take the path of least resistance. Especially when the primary purpose of the invention is to offer convenience.


C#'s massive standard library and first-party libraries mean far fewer external dependencies, and these libraries are managed by a team of paid, professional engineers.

Highly, highly underrated.


I'm going to offer a contrarian view here:

First is that despite a lot of waste, some innovation will arise from an enterprising employee finding some interesting use case. A lot of the tokenmaxxing is just waste, but out of that waste may arise a small number of genuinely powerful use cases.

Second is that many workers will be entrenched in their ways. If your executive goal is to achieve the above (find innovative ways of using AI), then you need to move everyone to use it. Most will just waste tokens, but someone may find a novel and useful way of using it that benefits the organization. It is difficult to achieve these without forcing people to act since their default is to follow the well-worn grooves.

So mandates like these are a top-down forcing function like a slime mold feeling out different paths to find resources.

Some devs in my org have fully embraced AI; some would not even use AI if not for leadership mandates and linking usage to performance reviews (I know, I think this is stupid, too). I can see why mandates could be useful since some folks definitely won't be inclined to use AI.


> but out of that waste may arise a small number of genuinely powerful use cases.

Imagine you employ me as a hotel manager, and I come to you and say: "sure I spent all our food budget internationally in three months, and sure I have nothing really to show for it, but for those three months, we had a lot of food fights"

Your manager then goes on to explain they not only need more money to cover the food budget, but also they need to quintuple the cleaning budget too.

Oh and the service level has dropped, because not all clients liked being in the middle of a food fight.

However "we might have some innovation in the food delivery system of our hotel chain"


    > we might have some innovation in the food delivery system of our hotel chain
This is really relative to the size of that innovation, isn't it?

    > Imagine you employ me as a hotel manager, and I come to you and say: "sure I spent all our food budget internationally in three months, and sure I have nothing really to show for it, but for those three months, we had a lot of food fights"
This is exactly how startups and VC funding work, isn't it? You have an idea, and investors give you cash to burn to prove the idea and business model. Many teams and ideas fail. But some small number of unicorns produce outsized returns to keep the whole thing going.

It's how it does work, often.

It's not how it should work, because food fights are stupid and have no upside.

Even if everyone else is having them.

It's not a fair analogy because AI isn't completely stupid, and there are situations where it does provide a benefit.

But a rational business would ask if the upside is worth the cost, if the pipeline can be restructured to concentrate and amplify the benefits, if some elements are better being done the old way, if there are strategic threats if tokens become much more expensive, and so on.

Instead we're getting a wave of "Cut workers, cut costs, derp" and that's as far as the "thinking" goes.

The worst thing about AI is that it shows how shallow and stupid current C-suites are.

The US used to have real tech visionaries. Now it has tech cargo cultists, all chasing an IPO cash out and hoping the music doesn't stop before they get their bag.


Imagine you employ me as a hotel manager, and I come to you and say: "sure I spent all our food budget internationally in three months, but we invented this new dish and now our restaurant is the hottest in town. Sure 95% of the food was wasted but now we can stop the waste and keep the popular dish."

Ok, but was that your intention in the first place? Or was it to have food fights?

That's the problem here. The idea is that we can build more stuff, quickly.

However, in Uber's case, they just burnt loads of money to push a metric that wasn't really related to productivity.


The intention was to force everyone to experiment with the new ingredient Monsanto recently GMO'd. Of course a lot of our employees suck, so food fights were expected, but luckily some of the employees created something great.

> some innovation will arise

Absolutely, but most managers are not leaders. The moment someone pushes the idea to stack rank based on token usage, it gets approved and some genuine people will be impacted.

The post-ZIRP era proved there are very few strong leaders; before that, everyone was behaving like they were the most amazing leader because they had read some books and raised $10M.


Sure, indiscriminate tokenmaxxing is a gamble that can pay off sometimes. However, I think that the decision to take any gamble should be made by someone who will bear responsibility for the downside as well as the upside. I would prefer to search for new usages in a more strategic way. I agree that experimentation is a great way to learn if done intelligently and with limits. Full “Monte Carlo” makes sense when ops are cheap enough. It seems some orgs don’t think tokens are cheap enough yet.

    >  I would prefer to search for new usages in a more strategic way
I think this is very, very hard for orgs to do.

Looking back at the Internet, who would have thought that it would eventually create a Netflix, Amazon, Shopify, Spotify, Google Maps, etc.? It's just wild, the things that ended up coming out of pushing bits over a wire with a few simple protocols.

In an ideal world, you make strategic bets, but I can also see the case for the opposite this early in the lifecycle of a technology. You just don't know until you try.

Mid/late 2023, it wasn't at all obvious that it would take over coding that fast.


People talked about streaming years before Netflix. Online maps apps date back to the 1990s. E-commerce as well.

I definitely get the impression that many people thought it would eventually create shopping, streaming, and mapping sites.

I think people were less likely to have predicted things like social media or YouTube, though those weren't ideas sprung from a vacuum either.


If it were that simple and obvious, Blockbuster would have beat everyone to streaming. Sears would have digitized their catalog and used their vast brick-and-mortar stores as fulfillment centers for same-day shipping.

None of these shifts were obviously the right bet and many organizations lost because they missed the opportunity. Now orgs are on the same horizon and I can see why they don't want to miss this window.


Blockbuster actually did try to beat everyone to streaming. Notably, Blockbuster and Enron [1] entered into a 20-year partnership for online video delivery.

Sears was a different story, in that they were a real estate company with a store front and retail real estate took a nosedive due to ecommerce. But that's a different discussion.

[1] https://en.wikipedia.org/wiki/Enron_scandal


> A lot of the tokenmaxxing is just waste, but out of that waste may arise a small number of genuinely powerful use cases

A lot of monkeys will also eventually type up Shakespeare?


Indeed, but that's not a bad thing. If monkeys can produce the next Shakespeare, that will be wildly popular and profitable for the company that did it, justifying the initial waste, just as VC does with companies as a whole.

> Some devs in my org have fully embraced AI; some would not even use AI

So if the people who embrace AI are more successful then the others will follow. Just like every other new tech. Why does AI have to be forced? What's the hurry? Especially when there's no clear example of a company jumping ahead because of their use of it.

It's idiots being driven by FUD. That's the reason.


    > What's the hurry?
There are definitely key windows here for innovation driven by competition.

There's also a need to quickly adopt and understand the technology; take the Internet for example. If we were talking about the Internet, forcing teams to build and publish web pages would be one valid way to get teams comfortable with the tech, the workflow, the shift in how to propagate and convey information to an audience.

Without a mandate, many teams won't adopt the Internet as a medium of information exchange because their processes work just fine and have worked for the last 20 years.

I think it's fair to put AI in a similar light. Unless teams adopt it and use it, it's hard for an org to understand how to get value out of this technology and how it affects existing processes and assumptions.


> There are definitely key windows here for innovation driven by competition.

Those were always there, and will always be there. The type of time frames people are getting anxious about now rarely work in the real world, though, where potential customers don’t just switch products/service providers unless they’re facing catastrophic outcomes if they don’t.

And AI is not making the difference there that people think. I worked on a product that entered the market as a newcomer, wooed plenty of customers, and even though everyone _wanted_ it, only customers _urgently_ looking for a solution got on board quick (within <6 months).

Ironically enough, the product pivoting to Agentic AI hard killed a ton of momentum and interest from customers, despite exciting investors.


I was programming desktop applications when the web came along. I don't remember anyone ever saying they had been mandated to program for it.

The web took off all by itself because it had a clear value proposition for some use cases.


    > The web took off all by itself because it had a clear value proposition for some use cases.
Many enterprises became legacy because of the web, many enterprises failed because they didn't understand the impact of the tech.

Sears was the OG Amazon. Imagine if Sears had seen it as the new digital catalog.

Blockbuster missed on streaming until it was too late.

Many, many legacy companies did not understand the web and did not understand the impact of the Internet to their business model.


I genuinely think you don't actually know the history and timeline of the companies you talk about. Mostly because it did not happen how you imagine, on the quick timeline you imagine, or even for the reasons you imagine.

And even more importantly, the companies who went all in early and spent too much money on it too early without good reason went boom. You had to have an actual business reason for it to be a success.


And you think forcing Blockbuster's software teams to use the Web would have changed that? You don't think they were using the web for all their corporate communications systems? I very much think they were, and getting blindsided by streaming probably had nothing to do with Blockbuster's existing engineering teams not understanding the Internet. Their product teams didn't understand it, but they wouldn't be the ones being "forced to write webpages" either.

    > And you think forcing blockbuster's software teams to use the Web would have changed that?
Yes; non-zero chance that had they been more aggressive in pushing the web, someone would have landed on the right answer.

Seriously. No mandates at my company. In 2023 and 2024 I had access to Claude, but frankly it wasn't until 2025 that I found the models useful enough; now I use them every day. Nobody forced me. Had they forced me, I'd probably have quit. Once the tools were sufficiently mature and verifiably helpful, people like me all over the company naturally picked up the tools too.

It is not worth switching to Pi except as a hobbyist.

Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).

Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

In this era of software when you can build almost anything you can imagine, why spend that time building plugins for a harness?


Hard disagree.

Pi has optimizations as well, and development is quite active.

We are literally months into this new frontier. Mainstream harnesses are not far off from a minimal + extensible open alternative.

You don’t have to build your own plugins, as you can simply install an existing plugin that does what the mainstream harnesses do. Folks are already making the same functionality, but with more control to the user.

If you are a builder, like many reading this thread, pi is the way to go. Pi already gives you the tools to leverage LLMs to assist with building plugins, if that’s the way you want to go.


That's like arguing that you should spend your time tuning your IDE. How does that relate to end-user value created?

Yes, you built yourself a nice little utility.

Meanwhile, you wasted those tokens and time that could have been spent building actual, useful software instead of hobby tinkering your harness.

It's like thinking your sneaker tread design is going to make the difference between you and someone who just goes out there and runs every day. The person that just runs is going to win the race every time while you 3D print the perfect tread design optimized for your running style...and don't actually run.

If you want to produce better results at running, you just run and optimize the externalities (gear) later. Same here: you have a magical software production factory and the only thing you want to use it for is your hobby tweaking of your perfect harness instead of...just making useful software.

:clap: :clap: I guess.


Why would taking the more open, minimalist, configurable and ultimately diligent route mean you won't be working on anything else?? Not to mention that pi has other advantages over Claude and Codex, read up on it. Also, improvements to the agent itself will pay more dividends the earlier they are applied. The tone of this message is waaaay off.

    > Why would taking the more open, minimalist, configurable and ultimately diligent route means you won't be working on anything else??
You're using the same finite pool of time and tokens. Why waste your time with the perfect gear instead of focusing on just getting really good at running? Just go run and when you've pushed the limits and the gear becomes the difference, then optimize the gear to get to the next level.

While you're busy trying to optimize your harness, others are just building and shipping with the magical software factory.


What are these "others" shipping, slopware? Agents are not a "magical software factory", they are a tool with a lot of limitations, but which can speed up development in a sustainable way, when used wisely. And that includes configuring it in a way that complements the other tools in our toolkit.

Everyone's waking up to this simple truth: vibe coding like there's no tomorrow accumulates conceptual and technical debt at an unsustainable rate. Then when the "magical factory" gets mired in its own mess, it's back to the drawing board. This is also what the makers of pi have discovered, if you listen to their talks about how pi came about. I don't believe there is any justification for the assumptions you make about their approach, nor am I seeing you presenting any either. As it is, your take just feels peevish and unfair, to be honest.


A story to share: friend vibe coded absolute slop with Replit starting late 2024 (!!). Absolute trash code. Hacked multiple times because his login code exposed the full user list on the FE (!!!). Hacker found a way to exploit his account confirmation email because it was all front-end and sent an email to every customer telling them he was hacked. One time called me up in a panic asking why his web page was randomly refreshing (turns out, he was serving it in dev mode via Vite with HMR). It was mistake after mistake after mistake.

But he started to get customers. First a handful, then a dozen, then enough to get legal threats from other vendors, and this year, his first "enterprise" deal providing software in a space that was long dominated by a duopoly of legacy providers.

Guess what he did? Just rewrote it with the latest models and hired one engineer to ensure agents followed better practices. It's a legit business now built by a tiny team using a magical software factory to produce absolute trash code, but in shipping it, he found a market and customers willing to pay him for an alternative to the duopoly.

See, at the end of the day, it's cute that you have the perfectly tuned harness, but that also means whatever time you spent tuning your harness, reading up on Pi, spending tokens on your custom plugins -- all of that time and resources could have been used just building something useful.


People use Replit to build websites too, and some of them might scratch enough of a need to make money this way. So what? Is this what I should be mightily impressed with? That some random dude vibe coded some slopware which he was able to convince some random others to pay him for? I'm personally more interested and impressed by brilliant technical achievements, even if less monetizable, than some hustle or another in some industry niche which only ever attracted the interest of two legacy players. This is Hacker News, not Hustler News after all.

> Something that is overlooked: the mainstream harnesses have a huge advantage in telemetry and datapoints to use to improve the harness. They have internal teams building the tooling. They have tight integration built-in with their own backends (e.g. optimizing for caching).

> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

Do I want to become completely dependent on the pricy pay-as-you-go tool? In the long run that will make me powerless.


You'll be dependent on it whether or not you use the main harnesses. You pay for the model. The frontier models will likely always be better than the open source ones.

> The frontier models will likely always be better than the open source ones.

Their lead is only a few months, and shrinking.

Local is the future.


> Are you tinkering? Or trying to build something useful? If you're trying to build something useful, use a tool.

I don't think that you really get what this new era of software is about; otherwise you would understand why the experienced are spending time tinkering on the so-called harness (like OpenClaw did).


OpenClaw is far from useful. Aside from the creator trading the fame for a job at OpenAI, it's hard to see how it's transformed anything.

> It is not worth switching to Pi except as a hobbyist.

Permit me to paraphrase slightly. "It is not worth switching to Linux except as a hobbyist. Something that is overlooked: the mainstream OSs have a huge advantage ....".

You are in good company. In 1999, Bill Gates confidently dismissed Linux as a threat, arguing it lacked the central control, features, and graphical interface needed to compete in the commercial market.

Back to the article, quoting:

> Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering.

Please don't call it software engineering. I've been programming for 40 years, and most of that time had to put up with the derision from the other engineering disciplines: "If civil engineering built things like software engineers, the first woodpecker that came along would destroy civilisation". It hurt because it was true. It's still often true for things like web pages, but for the things I use like Linux and vim, it hasn't been true for a long, long while. We have finally mastered how to repeatedly build solid, reliable software.

Which is why I'm an Anthropic refugee. Opus is definitely the best for coding, but claude-cli + bun is the most unreliable piece of crap I've had the misfortune to come across in a while. Sadly I can't afford their API pricing, so either my principles or Opus had to give. I went to pi and an open-source model. The difference between the top open-source models and Opus is noticeable, but not drastic, unlike the difference between pi and claude-cli.

pi has proved to be solid, fast, have a transparent design, and be customisable in the old Linux way ("do one thing, and do it well"). I pray that will never change.


And yet Pi has done a few things that were quite transformational. A lot of recent agentic libraries explicitly credit Pi for design ideas.

We’re so early in this technology phase that now is the time to tinker and explore. At some point that window will close.


Which design ideas are those? (Asking out of curiosity, happy pi user here!)

One example: earlier versions of my mlx-code's harness layer were largely a Python port/adaptation of Pi.

I mean, have you tried Pi? It's really good out of the box.

It's a mixed bag.

I've been working with Apache AGE (openCypher in Postgres) recently and found that, left to its own devices, the agent wrote terribly inefficient queries, even when given a test harness and instructions to examine the result of the query plan.

It just didn't seem to understand the graph traversal, even when given the graph schema and small snippets.

I ended up hand-writing the structural "skeleton" of the main query that I performance tuned to a certain extent and then handed it over to Codex to finish off. Once it had this skeleton to start from, it was able to do a much, much better job of writing this query.


> Slowly your brain just gets trained to mid thinking like an LLM
Regression to the mean.

I am doing a lot of the code reviews on my team and I can see that LLMs have a hard time with OOP (or are perhaps specifically guided to avoid it) and write a lot of `private static` utility functions. A lot of duplicated small utilities that can end up becoming a maintenance nightmare should the behavior need to be normalized/fixed. String key formatting, for example. JSON serialization behavior, another very common one. At a higher level, they need very active guidance to search for existing code and re-use interface contracts via DI consistently (we have instructions and skills for this, but it's hit or miss on usage and adherence).

It generates very repetitive code and doesn't have the wits to refactor it in a way that is reusable, even in simple cases (basic JSON serialization).

It really dislikes creating object and type hierarchies on its own (e.g. moving the repetitive serialization to a base class) and prefers to write one-offs. Works, but not very elegant; lots of duplication and touch points for regressions.
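To make the contrast concrete, here is a minimal Python sketch (not our actual code; every class and method name below is hypothetical). The first half shows the one-off style, where each class carries its own private static helper and the behavior quietly drifts. The second half moves the serialization rules into a shared base class, so a fix lands in one place.

```python
import json
from dataclasses import asdict, dataclass


# One-off style: each class gets its own private static helper.
# All names here are hypothetical and only illustrate the pattern.
class UserExporter:
    @staticmethod
    def _to_json(data: dict) -> str:
        return json.dumps(data)


class OrderExporter:
    @staticmethod
    def _to_json(data: dict) -> str:
        # Same job as above, but the behavior has drifted (keys are sorted here).
        return json.dumps(data, sort_keys=True)


# Shared style: the JSON conventions are defined once in a base class.
class JsonSerializable:
    def to_json(self) -> str:
        # Sorted keys and compact separators are illustrative choices,
        # not a recommendation; the point is that they live in one place.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


@dataclass
class User(JsonSerializable):
    name: str
    email: str


@dataclass
class Order(JsonSerializable):
    order_id: int
    total: float


if __name__ == "__main__":
    print(UserExporter._to_json({"b": 1, "a": 2}))
    print(OrderExporter._to_json({"b": 1, "a": 2}))
    print(User("Ada", "ada@example.com").to_json())
    print(Order(7, 9.5).to_json())
```

The two exporters give different output for the same input, which is exactly the kind of regression touch point that is easy to miss in review. With the base class, `User` and `Order` cannot disagree about formatting.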

It also has a tendency to write more "verbose" solutions where sometimes simpler ones will work.


Cleaning up tech debt is a must with AI for a lot of these orphaned utilities and anti-patterns.

My hope is that eventually open source models get far enough along that we can just train the models on company-specific code needs.

Until then it is a lot of whack-a-mole with skills, hooks, system prompts, and interweaving deterministic needs.


can you say which llms you're using? have you tried different ones and how were they?

Team is a mix; I'm personally using Codex, gpt-5.5 high fast + Claude Opus 4.6 (occasionally Sonnet 4.6).

Mix of CLI (Codex) and GH Copilot (if I want active line selection).

We have a set of custom skills and knowledge base as well.


Predictable, yeah. We're far better at finding the right overarching narrative of the architecture and the necessary intermediate layers of abstraction. LLMs will invent verbal structs that sound okay but do not have the elegance of a senior OOP developer's work. Although once the good foundation is there, agents can be great at extending and maintaining features on it.

This sounds very much like a problem of the context, that should be solvable by having a file with instructions on how to do generic utilities somewhere in the code (e.g. AGENTS.md)

We have; but it's also not easily practical because there is some judgement involved and it's not really feasible to point out all of the edge cases (bloating the context).

What is not clear to me is whether this is inherently desirable behavior on the part of the agent or not. Why? Because for the agent, the code is more isolated and its immediate changes have a lower blast radius when it internalizes some behavior (`private static`) instead of touching a shared method or hierarchy.

I can see why the underlying models may be steered this way, but it creates a different kind of problem when things really should be shared.

