
The six-layer agentic stack we deploy for SMBs

Reasoning at the bottom, business outcomes at the top. The architecture inside every deployment, and why each layer earns its place.

Apr 29, 2026 3 min read
agentic, architecture, claude, langgraph

A lot of what gets called "an AI agent" in 2026 is a CRUD app with a prompt taped on the side. It answers questions and forgets them. It doesn't reason across steps. It has no memory. It can't recover from a tool failure. When a customer asks something the prompt didn't anticipate, it invents.

Every system we put into production runs through the same six layers, with nothing glued on. Reasoning sits at the bottom, business outcomes at the top. We use the same architecture for pre-built deployments and bespoke builds.

01 — Reasoning

Frontier models with structured output and grounding. Claude Sonnet 4.6 for the production tier. Claude Opus 4.6 for tasks where the cost of being wrong dominates the cost of tokens. GPT-4o and Gemini for ensemble work where we want a second opinion. We fine-tune smaller models (Llama 3 8B, Phi-4) for high-volume narrow tasks that frontier models don't need to touch.

The rule here: never run on a single model. Production agents fail in model-specific ways, and a mono-model deployment is one outage away from a customer escalation.
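
A minimal sketch of what "never run on a single model" can look like in code, assuming each provider is wrapped in a plain function that takes a prompt and returns text. The names `ModelRoute`, `ModelError`, and `complete_with_fallback` are hypothetical, and the two clients are stand-ins so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ModelError(Exception):
    """Raised by a model client when a call fails (timeout, outage, bad output)."""


@dataclass
class ModelRoute:
    name: str                    # label only, e.g. "primary"
    call: Callable[[str], str]   # hypothetical client wrapper: prompt -> text


def complete_with_fallback(prompt: str, routes: Sequence[ModelRoute]) -> tuple[str, str]:
    """Try each route in order and return (route_name, text) from the first success."""
    errors = []
    for route in routes:
        try:
            return route.name, route.call(prompt)
        except ModelError as exc:
            errors.append(f"{route.name}: {exc}")
    raise ModelError("all routes failed: " + "; ".join(errors))


# Stand-in clients (assumptions, not real provider SDK calls).
def flaky_primary(prompt: str) -> str:
    raise ModelError("provider outage")


def steady_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


routes = [
    ModelRoute("primary", flaky_primary),
    ModelRoute("secondary", steady_secondary),
]
used, text = complete_with_fallback("ping", routes)
```

Here the primary route raises, so the call is served by the secondary route. In a real deployment each route would wrap a different provider, which is the point of the rule.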

02 — Tool use

Function calling, code execution, browser, MCP servers. This is where most of the actual work happens. The agent isn't smart because it knows things; it's smart because it can do things. CRM tools, search tools, computer use, custom MCP servers we write per client.

The pattern that wins: every tool returns structured output the agent can reason about, plus a one-line natural-language summary the agent can quote back. Agents that have to parse JSON in their head waste tokens and lose context.
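
One way to sketch that pattern, assuming a simple result envelope: structured data for the agent to reason over, plus a one-line summary it can quote. `ToolResult` and `lookup_customer` are hypothetical names, and the CRM record is hard-coded for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ToolResult:
    ok: bool
    data: dict[str, Any]   # structured payload the agent can reason about
    summary: str           # one-line summary the agent can quote back

    def to_message(self) -> str:
        """Serialize the whole envelope for the model's tool-result message."""
        return json.dumps(asdict(self), sort_keys=True)


def lookup_customer(customer_id: str) -> ToolResult:
    """Hypothetical CRM tool; a real one would query the CRM instead."""
    record = {"id": customer_id, "plan": "pro", "open_tickets": 2}
    summary = (
        f"Customer {record['id']} is on the {record['plan']} plan "
        f"with {record['open_tickets']} open tickets."
    )
    return ToolResult(ok=True, data=record, summary=summary)


result = lookup_customer("c_123")
message = result.to_message()
```

The summary is built from the same record as the structured data, so the two cannot drift apart.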

03 — Memory

Vector and episodic memory. RAG, semantic cache, audit log. We use pgvector inside the deployment Postgres for per-tenant scoped memory, Pinecone for high-volume cross-tenant retrieval, Redis for the semantic cache that catches 30 to 40% of repeated queries before they hit a model.
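
The semantic cache idea can be sketched in a few lines. This is a toy, in-memory version under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the 0.8 similarity threshold is arbitrary, and `SemanticCache` is a hypothetical class, not the Redis setup described above.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real deployment would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold   # assumed cut-off for "same question"
        self.entries: list[tuple[Counter, str]] = []

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        """Return the cached answer for the most similar query, or None on a miss."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("what are your opening hours", "9am to 5pm, Monday to Friday")
hit = cache.get("what are your opening hours today")   # near-duplicate query
miss = cache.get("how do I reset my password")         # unrelated query
```

A hit returns the stored answer without a model call; a miss falls through to the model, and the new answer can then be stored with `put`.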

The five-layer memory model we use internally — raw transcripts, compressed summaries, extracted entities, learned facts, distilled wisdom — is the part that takes an agent from "answers questions" to "knows your business."
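
As a rough illustration only, the five layers could be modelled as an ordered enum with each memory item tagged by tenant and layer. The layer names come from the list above; everything else here (`MemoryItem`, `source_ids`, `items_at_or_above`) is an assumption for the sketch, not a description of the internal implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class MemoryLayer(IntEnum):
    RAW_TRANSCRIPT = 1
    COMPRESSED_SUMMARY = 2
    EXTRACTED_ENTITY = 3
    LEARNED_FACT = 4
    DISTILLED_WISDOM = 5


@dataclass
class MemoryItem:
    tenant_id: str
    layer: MemoryLayer
    content: str
    # Hypothetical provenance field: ids of lower-layer items this was derived from.
    source_ids: list[str] = field(default_factory=list)


def items_at_or_above(
    items: list[MemoryItem], tenant_id: str, layer: MemoryLayer
) -> list[MemoryItem]:
    """Return one tenant's items at the given layer or any higher layer."""
    return [m for m in items if m.tenant_id == tenant_id and m.layer >= layer]


store = [
    MemoryItem("acme", MemoryLayer.RAW_TRANSCRIPT, "caller asked about Saturday hours"),
    MemoryItem("acme", MemoryLayer.LEARNED_FACT, "Acme is closed on Saturdays", ["t1"]),
    MemoryItem("globex", MemoryLayer.LEARNED_FACT, "Globex ships within 2 days"),
]
acme_facts = items_at_or_above(store, "acme", MemoryLayer.LEARNED_FACT)
```

Scoping every query by `tenant_id` mirrors the per-tenant memory described earlier in this section.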

04 — Orchestration

Multi-agent graphs, retries, fallbacks, hand-off. LangGraph for stateful graph flows, including the loops that retries need. AutoGen patterns for free-form negotiation. n8n as the workflow surface where ops can see what's running and intervene without writing code.

This is the layer that makes the system robust. A single agent calling a single tool fails 3 to 5% of the time in production: the model hallucinates, the tool times out, or the input is malformed. Without orchestration that adds retries, fallbacks, and human hand-off, those failures become customer complaints. With it, they become invisible.
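
The retry, fallback, hand-off sequence can be sketched as a small wrapper. This is a simplified illustration, not the LangGraph or n8n setup itself: `run_with_recovery` and `ToolError` are hypothetical names, and the retry count and backoff delay are arbitrary defaults.

```python
import time
from typing import Any, Callable, Optional


class ToolError(Exception):
    """Raised when a tool call fails (timeout, malformed input, bad response)."""


def run_with_recovery(
    primary: Callable[[], Any],
    fallback: Optional[Callable[[], Any]] = None,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry the primary call with exponential backoff, then try the fallback,
    then return a hand-off record for a human instead of raising."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return {"status": "ok", "value": primary(), "attempts": attempt + 1}
        except ToolError as exc:
            last_error = exc
            if attempt < retries:
                sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        try:
            return {"status": "fallback", "value": fallback(), "attempts": retries + 1}
        except ToolError as exc:
            last_error = exc
    return {"status": "handoff", "value": None, "reason": str(last_error)}


# Stand-in tools so the sketch runs without any external service.
calls = {"n": 0}


def flaky_tool() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ToolError("timeout")
    return "booked"


def always_down() -> str:
    raise ToolError("service unavailable")


recovered = run_with_recovery(flaky_tool, base_delay=0.0)
escalated = run_with_recovery(always_down, base_delay=0.0)
```

The flaky tool succeeds on its third attempt, so the caller never sees the two timeouts. The tool that never recovers produces a hand-off record that a human queue can pick up, rather than an exception in front of the customer.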

05 — Observability

Traces, evals, prompt versioning, cost telemetry. Every agent run produces a trace, every trace runs through evals, every prompt version is tagged and rollback-able. LangSmith and Helicone for the standard work, Arize for the harder evaluation problems, custom evals for anything client-specific.

The rule we don't break: no agent goes to production without an eval suite. Not "we'll add it later." Day zero.
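
A day-zero eval suite does not need to be elaborate. Here is a minimal sketch of a release gate, assuming the agent can be called as a function from prompt to text. `EvalCase`, `run_evals`, `release_gate`, the 0.9 pass-rate threshold, and the stub agent are all hypothetical; tools such as LangSmith provide far richer versions of the same idea.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]   # returns True if the agent's output is acceptable


def run_evals(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Return the fraction of eval cases the agent passes."""
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    return passed / len(cases)


def release_gate(
    agent: Callable[[str], str],
    cases: Sequence[EvalCase],
    min_pass_rate: float = 0.9,
) -> bool:
    """Allow a release only if the pass rate meets the threshold."""
    return run_evals(agent, cases) >= min_pass_rate


# Stub agent standing in for a real deployment.
def stub_agent(prompt: str) -> str:
    answers = {"refund window?": "30 days", "support email?": "help@example.com"}
    return answers.get(prompt, "I don't know")


cases = [
    EvalCase("refund window?", lambda out: "30 days" in out),
    EvalCase("support email?", lambda out: "@" in out),
    # The agent should decline rather than invent an answer.
    EvalCase("ceo's home address?", lambda out: "don't know" in out.lower()),
]
pass_rate = run_evals(stub_agent, cases)
can_ship = release_gate(stub_agent, cases)
```

Wiring `release_gate` into the deploy pipeline is what turns "we'll add it later" into a hard stop.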

06 — Cloud infrastructure

Edge-native serverless. Vercel for the public surfaces, Cloudflare Workers for low-latency edge compute, AWS for the heavier orchestration, Supabase for Postgres, auth, and RLS. Edge functions, queues, vector stores, durable execution.

Why edge-native? Latency. A typical agent run touches three or four tools and ends with a model call. If every hop costs 100ms of network round-trip, the user feels it. Edge brings the orchestration close to the model, the cache close to the user, and the data close to where it's served.
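
The arithmetic is worth spelling out. This back-of-envelope sketch assumes the hops run one after another and counts only network round-trips, not model or tool execution time. The 100ms figure is the one used above; the 20ms edge figure is a hypothetical comparison point, not a measurement.

```python
def network_overhead_ms(tool_calls: int, rtt_ms: float, model_calls: int = 1) -> float:
    """Total round-trip overhead for sequential hops: one per tool call and model call."""
    return (tool_calls + model_calls) * rtt_ms


far = network_overhead_ms(tool_calls=4, rtt_ms=100)   # 5 hops * 100ms = 500
near = network_overhead_ms(tool_calls=4, rtt_ms=20)   # 5 hops * 20ms = 100 (assumed edge RTT)
```

Four tools plus the final model call is half a second of pure network time at 100ms per hop, before any real work is done.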

What it does in practice

When we deploy this for an SMB, every one of the eight pre-built agents runs through this stack. Same reasoning layer, same memory model, same orchestration patterns, same observability surface. We don't rebuild the substrate for every engagement. We build new agents on the substrate that already works, and they inherit the reliability of the layers beneath them.

If you're standing up an agent and you can only invest in one layer this week, invest in observability. Without traces and evals, you don't know if your agent is getting better or worse. You'll just feel it when complaints arrive. Once you can see what your agent is doing wrong, every other layer becomes easier to fix.
