
Codex /goal, the OpenClaw drama, and what coding agents look like now

Codex CLI shipped a persistent goal loop. Claude Code is reportedly fingerprinting commit history for competitor mentions. Two stories, one week, practical takeaways for production coding agents.

May 1, 2026 3 min read
coding-agents, codex, claude-code, tool-use

OpenAI shipped Codex CLI 0.128.0 yesterday with a /goal command, which is their take on the Ralph loop: you set an objective, Codex keeps looping until it judges the goal complete or burns its token budget. The same week, Claude Code is allegedly refusing requests when commit histories mention "OpenClaw," apparently Anthropic's internal codename for an in-progress competing coding agent. The HN thread had over a thousand points and six hundred angry comments.

Two stories, same week, same direction of travel: coding agents are escaping containment.

So what is /goal, really?

A persistent loop primitive. You give Codex an objective ("refactor the auth module to use Clerk", "add Postgres connection pooling with pgBouncer"), and it keeps executing tool calls until its own evaluation says the objective is complete or the token budget runs out. The implementation is mostly in codex-rs, per Simon Willison's writeup. Token budget is configurable. There's no human approval between steps unless you explicitly gate it.
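To make the shape of that loop concrete, here is a minimal sketch. It is not Codex's implementation: the names (runGoalLoop, act, isDone), the synchronous signatures, and the extra step cap are all assumptions chosen for illustration. It only shows the three moving parts described above: a step that spends tokens, a separate done-check, and a hard budget.

```typescript
// Hypothetical types and names; none of these come from codex-rs.
type StepResult = { tokensUsed: number; observation: string };
type LoopOutcome = {
  status: "done" | "budget_exhausted" | "max_steps";
  steps: number;
  tokensSpent: number;
};

interface GoalLoopOptions {
  tokenBudget: number; // hard cap, checked before every step
  maxSteps: number; // second safety net in case steps report zero tokens
  act: (goal: string, history: string[]) => StepResult; // one tool-calling turn
  isDone: (goal: string, history: string[]) => boolean; // evaluation kept separate from acting
}

function runGoalLoop(goal: string, opts: GoalLoopOptions): LoopOutcome {
  const history: string[] = [];
  let tokensSpent = 0;

  for (let step = 0; step < opts.maxSteps; step++) {
    // Stop before acting once the budget is spent. The final step may
    // overshoot the cap, because its cost is only known after it runs.
    if (tokensSpent >= opts.tokenBudget) {
      return { status: "budget_exhausted", steps: step, tokensSpent };
    }

    const result = opts.act(goal, history);
    tokensSpent += result.tokensUsed;
    history.push(result.observation);

    if (opts.isDone(goal, history)) {
      return { status: "done", steps: step + 1, tokensSpent };
    }
  }
  return { status: "max_steps", steps: opts.maxSteps, tokensSpent };
}

// Toy run: every step costs 100 tokens and the goal counts as done
// after three observations.
const demo = runGoalLoop("make the tests pass", {
  tokenBudget: 1000,
  maxSteps: 10,
  act: (_goal, history) => ({ tokensUsed: 100, observation: `step ${history.length + 1}` }),
  isDone: (_goal, history) => history.length >= 3,
});
console.log(demo.status, demo.steps, demo.tokensSpent); // done 3 300
```

A real loop would be asynchronous and would feed tool output back into the model. The point of the sketch is that the budget check and the done-check are separate exits, so a broken done-check still terminates.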

The pattern itself is not new. The Ralph loop (named for Geoffrey Huntley's original PoC) has been kicking around for eighteen months. What's new: OpenAI productized it and shipped it to a couple million developers. Persistent goal loops just became the default expectation for what a coding agent does.

And the OpenClaw thing?

Worse, probably. The story broke via @theo on Twitter. Multiple users reported Claude Code refusing coding requests when the repo's commit history mentioned "OpenClaw." Some saw price increases. Others got vague "policy violation" messages. OpenClaw is reportedly Anthropic's internal name for a Claude-powered coding agent positioned against Codex, Cursor, and Windsurf. Anthropic hasn't confirmed the product publicly, but if the reports hold up, the VS Code extension is inspecting repo content to make routing decisions.

Six hundred HN comments, mostly furious. The core complaint is reasonable: you don't get to train on open-source code and then selectively refuse paying customers based on which other tools they're evaluating. Claude Code is a paid product. If you're charging, you cannot also be playing model police on commit messages. Pick one.

What we actually do about all this

We run Claude-native agents in production across a stack of integrations. Here is what we take from this week's signals.

The most important habit is to avoid depending on a single model. We've been saying it for a while; this week is another argument for it. Whether it's Codex, Claude, Gemini, or GPT-4o, you want fallback routing. If a provider starts fingerprinting your stack or charging extra for competitor mentions, you swap the underlying model in a few lines of LangGraph config. Our internal routing uses different models per agent role: GPT-4o for outbound sales, Claude Sonnet for billing-and-reconciliation work, Gemini for HR integrations. The split exists because we need production resilience, not because we're anyone's fan club.
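Per-role routing with fallback can be sketched in plain TypeScript, independent of any framework. This is an illustration rather than our LangGraph config: the role names, the fallback order, and the model labels are placeholders (not real API model IDs), and the provider call is injected so the routing logic stays testable.

```typescript
// Hypothetical roles and model labels, loosely following the split described above.
type Role = "sales" | "billing" | "hr";

// Injected provider call; assumed to throw on refusal, outage, or policy error.
type ModelCall = (model: string, prompt: string) => string;

// Ordered preference per role: the first entry is primary, the rest are fallbacks.
// The fallback order here is an assumption made for the example.
const ROUTES: Record<Role, string[]> = {
  sales: ["gpt-4o", "claude-sonnet", "gemini"],
  billing: ["claude-sonnet", "gpt-4o", "gemini"],
  hr: ["gemini", "claude-sonnet", "gpt-4o"],
};

function callWithFallback(
  role: Role,
  prompt: string,
  call: ModelCall,
): { model: string; output: string } {
  const errors: string[] = [];
  for (const model of ROUTES[role]) {
    try {
      return { model, output: call(model, prompt) };
    } catch (err) {
      // Record why this provider was skipped, then try the next one.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed for role "${role}": ${errors.join("; ")}`);
}

// Toy provider: the primary billing model refuses, so the call falls through.
const flakyCall: ModelCall = (model, prompt) => {
  if (model === "claude-sonnet") throw new Error("policy violation");
  return `${model} handled: ${prompt}`;
};
console.log(callWithFallback("billing", "reconcile invoice 42", flakyCall).model); // gpt-4o
```

Keeping the route table as data is the design choice that matters: swapping a provider is a one-line change, and the collected errors tell you which vendor started refusing and why.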

Goal loops also need evals, badly. The Ralph loop is powerful and dangerous in equal measure. A /goal will burn 200K tokens trying to "fix the build" if your eval is broken. We watched one of our voice agents spend ninety minutes retrying a Twilio SMS integration because the eval was checking message.status === 'sent' and Twilio returns 'queued' first. Every agent we deploy ships with a test suite from day zero. If you're running autonomous loops, the eval needs to be more reliable than the loop itself.
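The Twilio failure above comes down to treating an intermediate status as a failure. A more forgiving check might look like the sketch below. The function names and the injected fetchStatus callback are hypothetical, and the polling is synchronous with no delay for brevity. The status strings are the ones Twilio documents for outbound messages, and anything unrecognised is treated as still pending.

```typescript
type Verdict = "pending" | "pass" | "fail";
type EvalResult = "pass" | "fail" | "timeout";

// Terminal outcomes for an outbound message. Everything else
// ("accepted", "scheduled", "queued", "sending", ...) means "not finished yet".
const PASS_STATUSES = new Set(["sent", "delivered"]);
const FAIL_STATUSES = new Set(["failed", "undelivered", "canceled"]);

function classifyStatus(status: string): Verdict {
  if (PASS_STATUSES.has(status)) return "pass";
  if (FAIL_STATUSES.has(status)) return "fail";
  return "pending";
}

// fetchStatus is a hypothetical stand-in for re-fetching the message from the
// provider. A real eval would wait between polls instead of looping immediately.
function evalSmsDelivery(fetchStatus: () => string, maxPolls: number): EvalResult {
  for (let i = 0; i < maxPolls; i++) {
    const verdict = classifyStatus(fetchStatus());
    if (verdict !== "pending") return verdict;
  }
  // Still pending after maxPolls: report a timeout rather than a failure,
  // so the agent does not "fix" code that was never broken.
  return "timeout";
}

// Toy run: the message moves through queued -> sending -> sent.
const statuses = ["queued", "sending", "sent"];
let cursor = 0;
const nextStatus = () => statuses[Math.min(cursor++, statuses.length - 1)] ?? "queued";
console.log(evalSmsDelivery(nextStatus, 5)); // pass
```

The three-way verdict is the important part. An eval that can only say pass or fail will read "queued" as fail and send the loop off retrying a working integration.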

Commit hygiene matters now, which is absurd to write but true. If you're building on a provider-locked agent, audit your commit messages. We've started running a git hook that strips competitor mentions from commit bodies (a commit-msg hook, strictly speaking, since pre-commit fires before the message exists). Not because we have anything to hide, but because we don't want a per-seat agent suddenly refusing service over a PR description from eight months ago.
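The scrubbing itself is a small pure function. Git hands the commit-msg hook the path to the message file as its first argument, so a scrubber like this would typically run there. The sketch below covers only the text transformation. The blocklist, the "[redacted]" placeholder, and the function names are assumptions for illustration, not the hook we actually run.

```typescript
// Hypothetical blocklist; replace with whatever names you want scrubbed.
const BLOCKED_TERMS = ["OpenClaw"];

// Escape regex metacharacters so terms are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Leaves the subject line untouched and redacts blocked terms
// (case-insensitively) in every line after it.
function scrubCommitBody(message: string, blocked: string[] = BLOCKED_TERMS): string {
  if (blocked.length === 0) return message; // an empty pattern would match everywhere
  const [subject, ...body] = message.split("\n");
  const pattern = new RegExp(blocked.map(escapeRegExp).join("|"), "gi");
  const cleaned = body.map((line) => line.replace(pattern, "[redacted]"));
  return [subject, ...cleaned].join("\n");
}

const original = "Compare agent output\n\nTried openclaw and Codex side by side.";
console.log(scrubCommitBody(original));
// Compare agent output
//
// Tried [redacted] and Codex side by side.
```

A hook wrapper would read the message file, pass its contents through scrubCommitBody, and write the result back. Note that this only affects new commits, so existing history still needs a separate audit.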

And finally, self-hosting is the only durable pattern. Our agents run in our edge functions, our containers, our infrastructure. Not in a vendor's sandbox. We control the token budget, the routing, the logs. Coding agents that live inside vendor-controlled environments are convenience plays. They're fast to start with, and they're fragile. One policy change, one price hike, one competitor-gating scandal, and you're rewriting integration code at eleven at night.

Codex shipping /goal is the bigger story this week, even if the OpenClaw drama has more heat. The implementation isn't novel. The signal is that OpenAI just told a couple million developers that persistent goal loops are the baseline now. Multi-step tool orchestration, self-correction on tool failures, separate goal evaluation, hard token budgets, and eval coverage before the loop starts: in 2026, none of these is optional for a serious coding agent.

We've delivered this stack across a few dozen production deployments since 2018. The pattern works. The vendors are productizing it. If you're still on single-turn agents, you're behind.
