I've been building AI agent pipelines for the past year and I'm curious how others handle it. Specifically:
- Do you use a framework (LangChain, CrewAI) or roll your own?
- How do you handle agent-to-agent data passing?
- What does your observability look like for agent runs?
- Are you running agents on cron/webhooks or manual-only?
Interested in hearing what's working and what's painful.
The naive approach is stateless. Each reply gets processed independently. This breaks down fast when a prospect says "as I mentioned before" and the agent has no memory of what they mentioned before.
What has worked better: treating the entire conversation thread as the context window, not just the latest message. Every reply, every prior message, the research done on the prospect at the start, all of it gets passed through. The agent always knows where it is in the conversation and what has already been said.
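A minimal sketch of that idea: keep one thread object per prospect, and serialize the initial research plus every message into the prompt on each turn. All names here (`Thread`, `build_context`) are illustrative, not from any particular framework.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """Full conversation state for one prospect (names are illustrative)."""
    research_notes: str
    messages: list = field(default_factory=list)  # (role, text) pairs

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))

    def build_context(self) -> str:
        """Serialize everything the agent needs: research + full history."""
        lines = [f"Prospect research:\n{self.research_notes}", "Conversation so far:"]
        lines += [f"{role}: {text}" for role, text in self.messages]
        return "\n".join(lines)

thread = Thread(research_notes="VP Eng at Acme, evaluating CI tooling.")
thread.add("agent", "Hi, saw you're scaling your CI pipeline...")
thread.add("prospect", "As I mentioned before, budget approval is in Q3.")
prompt = thread.build_context()  # the agent always sees the whole thread
```

The trade-off is token cost: long threads eventually need summarization, but for typical sales threads the full history usually fits.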
The second problem is confidence calibration. Multi-agent systems in production need to know when to act autonomously and when to surface something for human review. In sales specifically, the cost of an agent saying something wrong to a real prospect is high. We err toward flagging ambiguous situations rather than guessing.
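One way to encode that "flag rather than guess" policy, assuming the classifier returns a label plus a confidence score (the threshold and labels below are hypothetical):

```python
def route_action(classification: str, confidence: float,
                 threshold: float = 0.8) -> str:
    """Act autonomously only when the classifier is confident AND the
    label itself is unambiguous; otherwise surface to a human."""
    if classification == "ambiguous":
        return "human_review"          # never guess on unclear replies
    if confidence < threshold:
        return "human_review"          # low confidence: err toward review
    return "autonomous"

# Confident, clear label -> agent acts on its own
assert route_action("positive_reply", 0.95) == "autonomous"
# Low confidence -> flagged for a human, even with a clear label
assert route_action("objection", 0.55) == "human_review"
```

In practice the threshold is tuned per label: the asymmetric cost of a bad autonomous send means thresholds for outbound actions sit higher than for internal ones.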
The pattern that has held up: agents own clearly bounded tasks end to end (research, draft, send, parse reply), with a thin orchestration layer that routes based on reply classification. Classification is the hardest part to get right and the most important to get right.
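The orchestration layer described above can be sketched as a dispatch table keyed on the reply classification, with an explicit human-review fallback for anything unrecognized. The labels and handlers are illustrative, not a real taxonomy.

```python
# Each handler is an agent owning one bounded task end to end.
def handle_interested(reply: str):
    return ("send", f"Drafting a meeting proposal in response to: {reply}")

def handle_objection(reply: str):
    return ("human_review", f"Objection surfaced for review: {reply}")

def handle_unsubscribe(reply: str):
    return ("stop", "Prospect removed from sequence")

ROUTES = {
    "interested": handle_interested,
    "objection": handle_objection,
    "unsubscribe": handle_unsubscribe,
}

def orchestrate(classification: str, reply: str):
    """Thin router: dispatch on classification, never guess on unknowns."""
    handler = ROUTES.get(classification)
    if handler is None:
        return ("human_review", reply)  # unknown class -> surface to a human
    return handler(reply)

action, payload = orchestrate("interested", "Yes, let's talk next week")
# action == "send"
```

Keeping the router this thin means misclassifications are the only systemic failure mode, which is why classification quality dominates everything else.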