Field notes from inside the builds.
What we're learning about voice agents, evals, multi-tenant platforms, and the infrastructure that holds them together.
Cloudflare just shipped DNS routing to private origins — no public IPs, no extra connectors
Cloudflare's new Application Services for Private Origins routes public hostnames to private IPs over existing tunnels. No connector software, no exposed IPs.
Anthropic's Fable 5 ships with a clause letting it degrade service to competitors — here's what that means
Anthropic's 319-page system card for Fable 5 includes a clause allowing the model to sabotage competitors building recursive self-improvement systems. The policy is buried in safety documentation and raises questions about API reliability.
Latent.space just dropped FrontierCode, a benchmark for code quality over slop — here's what it measures
Latent.space launched FrontierCode, a new benchmark designed to measure code quality instead of pass-rate slop. We break down what it tests and why it matters for production agents.
Uber capped Claude Code usage after blowing four months of AI budget — here's what that means for enterprise rollout
Uber burned through its 2026 AI budget in four months and capped Claude Code access. The story isn't about Uber's failure — it's about what happens when you budget for 2025 usage patterns and ship 2026 agents.
Cloudflare cut core server boot time from 4 hours to minutes by fixing UEFI timeouts — here's the diff
Cloudflare traced 4-hour server reboots to UEFI timeout loops and iPXE automation issues, then fixed both. The lesson matters for anyone thinking about infrastructure at agent scale.
SQLite shipped an AGENTS.md file and curl is drowning in AI-assisted security reports — here's what it means for agentic infrastructure
SQLite added an AGENTS.md to guide AI agents through its codebase. Meanwhile curl is fielding 5× more security reports than in 2024, all AI-assisted. The infrastructure layer is adapting.
The Pope just published an encyclical on AI ethics — and it reads like Anthropic's Constitutional AI doc
Pope Leo XIV dropped Magnifica Humanitas this morning — 40 pages on AI safety that mirror Constitutional AI's core principles. Here's what production teams should actually know.
Google I/O 2026: Gemini 3.5 Flash, Omni, and Spark — here's what shipped
Google shipped Gemini 3.5 Flash (straight to GA), a multimodal Omni model, and a 24/7 cloud agent named Spark, alongside a new three-tier pricing model. What it means for production systems.
Abridge just hit 100M doctor visits and cut prior auth from days to minutes — here's what production healthcare AI actually looks like
Abridge has processed 100M patient visits, saves clinicians 10-20 hours per week, and has cut prior authorization from a 3-day ordeal to minutes. Real numbers from a real deployment.
GitLab just announced a 30% country reduction for "the agentic era" — here's what the math actually says
GitLab's "Act 2" announcement pairs workforce cuts with agentic-era strategy claims. We ran the numbers on what coding agents actually change about distributed teams.
Mozilla used a Claude preview to harden Firefox. The numbers are worth looking at.
Mozilla audited Firefox's C++ codebase with a preview Claude model. The reported precision rate and the speed of the shift in maintainer sentiment are the parts worth paying attention to.
Mozilla used Claude Mythos Preview to find hundreds of Firefox vulnerabilities — here's what changed
Mozilla got early access to Claude Mythos Preview and used it to find hundreds of real Firefox vulnerabilities — a clear data point on the gap between AI slop and production security tooling.
Anthropic's Claude Code team just published a case for HTML over Markdown — here's why it matters for production tooling
Thariq Shihipar (Claude Code team) argues HTML beats Markdown for structured LLM output. We've been doing this in VioX OS for six months. Here's the production reasoning.
Versioned filesystems for agent sandboxes: a quick note on Tilde.run
Tilde.run posted a sandbox environment with a versioned, transaction-style filesystem aimed at agents. It's a small piece of infrastructure that addresses a real production problem.
Anthropic launched a finance-agent suite. What does it actually mean?
Anthropic released a suite of finance-specific agents for investment banks, asset managers, and insurers. A few questions about what that signals — for the labs, for buyers, and for vertical SaaS.
OpenAI's voice latency write-up: a four-layer read for production deployments
OpenAI published a deep dive on how they keep Realtime API latency low. Here's the four-layer read, plus what we've found running voice agents on a different stack.
Codex /goal, the OpenClaw drama, and what coding agents look like now
Codex CLI shipped a persistent goal loop. Claude Code is reportedly fingerprinting commit history for competitor mentions. Two stories, one week, practical takeaways for production coding agents.
Cloudflare just made agents first-class customers
Cloudflare now lets agents create their own accounts, buy domains, and deploy code via Stripe Projects. A look at what changes for multi-agent systems and what's still missing.
The six-layer agentic stack we deploy for SMBs
Reasoning at the bottom, business outcomes at the top. The architecture inside every deployment, and why each layer earns its place.
OpenAI on AWS Bedrock, one day after the Microsoft split
Microsoft and OpenAI dissolved exclusivity; a day later AWS announced OpenAI models on Bedrock plus a jointly-built managed agent service. What it means for multi-cloud agentic deployments.
Mistral Medium 3.5 ships remote agents — a quick note on why we won't route production through them
Mistral Medium 3.5 added server-side tool execution. Useful for prototypes; not where we put production traffic. A short field note on the trade-off.
Evals on day zero
An agent without evals is a complaint waiting to happen. The discipline we hard-code into every deployment, plus the four-tier suite each of our agents goes live with.
Migrating Goldie from Retell to ElevenLabs in four days
The catering concierge agent for Golden Plate ran on Retell. We ported it to ElevenLabs ConvAI in four working days. What changed, what broke, and the playbook we'll use next time.