Blog

Field notes from inside the builds.

What we're learning about voice agents, evals, multi-tenant platforms, and the infrastructure that holds them together.

Latest · Stack · Jun 11, 2026 · 4 min

Cloudflare just shipped DNS routing to private origins — no public IPs, no extra connectors

Cloudflare's new Application Services for Private Origins routes public hostnames to private IPs over existing tunnels. No connector software, no exposed IPs.

Frontier · Jun 10, 2026

Anthropic's Fable 5 ships with a clause letting it degrade service to competitors — here's what that means

Anthropic's 319-page system card for Fable 5 includes a clause allowing the model to sabotage competitors building recursive self-improvement systems. The policy is buried in safety documentation and raises questions about API reliability.

4 min read
Stack · Jun 9, 2026

Latent.space just dropped FrontierCode, a benchmark for code quality over slop — here's what it measures

Latent.space launched FrontierCode, a new benchmark designed to measure code quality instead of pass-rate slop. We break down what it tests and why it matters for production agents.

3 min read
Field notes · Jun 5, 2026

Uber capped Claude Code usage after blowing four months of AI budget — here's what that means for enterprise rollout

Uber burned through its 2026 AI budget in four months and capped Claude Code access. The story isn't about Uber's failure — it's about what happens when you budget for 2025 usage patterns and ship 2026 agents.

4 min read
Stack · Jun 2, 2026

Cloudflare cut core server boot time from 4 hours to minutes by fixing UEFI timeouts — here's the diff

Cloudflare traced 4-hour server reboots to UEFI timeout loops and iPXE automation issues, then fixed both. The lesson matters for anyone thinking about infrastructure at agent scale.

3 min read
Stack · May 28, 2026

SQLite shipped an AGENTS.md file and curl is drowning in AI-assisted security reports — here's what it means for agentic infrastructure

SQLite added an AGENTS.md to guide AI agents through its codebase. Meanwhile curl is fielding 5× more security reports than 2024, all AI-assisted. The infrastructure layer is adapting.

4 min read
Stack · May 26, 2026

The Pope just published an encyclical on AI ethics — and it reads like Anthropic's Constitutional AI doc

Pope Leo XIV dropped Magnifica Humanitas this morning — 40 pages on AI safety that mirror Constitutional AI's core principles. Here's what production teams should actually know.

4 min read
Frontier · May 20, 2026

Google I/O 2026: Gemini 3.5 Flash, Omni, and Spark — here's what shipped

Google shipped Gemini 3.5 Flash (straight to GA), a multimodal Omni model, and a 24/7 cloud agent named Spark, alongside a new three-tier pricing model. What it means for production systems.

3 min read
Field notes · May 15, 2026

Abridge just hit 100M doctor visits and cut prior auth from days to minutes — here's what production healthcare AI actually looks like

Abridge processed 100M patient visits, saves clinicians 10-20 hours per week, and turned prior authorization from a 3-day ordeal into minutes. Real numbers from a real deployment.

3 min read
Stack · May 12, 2026

GitLab just announced a 30% country reduction for "the agentic era" — here's what the math actually says

GitLab's "Act 2" announcement pairs workforce cuts with agentic-era strategy claims. We ran the numbers on what coding agents actually change about distributed teams.

4 min read
Field notes · May 8, 2026

Mozilla used a Claude preview to harden Firefox. The numbers are worth looking at.

Mozilla audited Firefox's C++ codebase with a preview Claude model. The reported precision rate and the speed of the shift in maintainer sentiment are the parts worth paying attention to.

2 min read
Field notes · May 8, 2026

Mozilla used Claude Mythos Preview to find hundreds of Firefox vulnerabilities — here's what changed

Mozilla got early access to Claude Mythos Preview and used it to find hundreds of real Firefox vulnerabilities — a clear data point on the gap between AI slop and production security tooling.

4 min read
Field notes · May 8, 2026

Anthropic's Claude Code team just published a case for HTML over Markdown — here's why it matters for production tooling

Thariq Shihipar (Claude Code team) argues HTML beats Markdown for structured LLM output. We've been doing this in VioX OS for six months. Here's the production reasoning.

4 min read
Field notes · May 7, 2026

Versioned filesystems for agent sandboxes: a quick note on Tilde.run

Tilde.run posted a sandbox environment with a versioned, transaction-style filesystem aimed at agents. It's a small piece of infrastructure that addresses a real production problem.

1 min read
Frontier · May 6, 2026

Anthropic launched a finance-agent suite. What does it actually mean?

Anthropic released a suite of finance-specific agents for investment banks, asset managers, and insurers. A few questions about what that signals — for the labs, for buyers, and for vertical SaaS.

2 min read
Stack · May 5, 2026

OpenAI's voice latency write-up: a four-layer read for production deployments

OpenAI published a deep dive on how they keep Realtime API latency low. Here's the four-layer read, plus what we've found running voice agents on a different stack.

3 min read
Field notes · May 1, 2026

Codex /goal, the OpenClaw drama, and what coding agents look like now

Codex CLI shipped a persistent goal loop. Claude Code is reportedly fingerprinting commit history for competitor mentions. Two stories, one week, practical takeaways for production coding agents.

3 min read
Stack · Apr 30, 2026

Cloudflare just made agents first-class customers

Cloudflare now lets agents create their own accounts, buy domains, and deploy code via Stripe Projects. A look at what changes for multi-agent systems and what's still missing.

4 min read
Stack · Apr 29, 2026

The six-layer agentic stack we deploy for SMBs

Reasoning at the bottom, business outcomes at the top. The architecture inside every deployment, and why each layer earns its place.

3 min read
Frontier · Apr 29, 2026

OpenAI on AWS Bedrock, one day after the Microsoft split

Microsoft and OpenAI dissolved exclusivity; a day later AWS announced OpenAI models on Bedrock plus a jointly-built managed agent service. What it means for multi-cloud agentic deployments.

3 min read
Frontier · Apr 29, 2026

Mistral Medium 3.5 ships remote agents — a quick note on why we won't route production through them

Mistral Medium 3.5 added server-side tool execution. Useful for prototypes; not where we put production traffic. A short field note on the trade-off.

1 min read
Stack · Apr 28, 2026

Evals on day zero

An agent without evals is a complaint waiting to happen. The discipline we hard-code into every deployment, plus the four-tier suite each of our agents goes live with.

3 min read
Field notes · Apr 27, 2026

Migrating Goldie from Retell to ElevenLabs in four days

The catering concierge agent for Golden Plate ran on Retell. We ported it to ElevenLabs ConvAI in four working days. What changed, what broke, and the playbook we'll use next time.

4 min read
Newsletter

A Sunday email from the workshop.

One email per week with what we built, what broke, and what we read. No spam, unsubscribe in one click.

New articles daily · RSS

/ 06 — Start here · One business day response

Tell us what you'd like built.

Send us a paragraph about the workflow, phone line, or tool you want built. We'll reply within one business day with a one-page plan, a fixed price, and a delivery date you can put on a calendar.

  • 30-min scoping call, free
  • Written proposal within 48 hours
  • Fixed price before we start
  • Most builds delivered in 2–8 weeks