Mistral Medium 3.5 ships remote agents — a quick note on why we won't route production through them
Mistral Medium 3.5 added server-side tool execution. Useful for prototypes; not where we put production traffic. A short field note on the trade-off.
Mistral dropped Medium 3.5 yesterday, a mid-size model comparable on code tasks to Claude Sonnet 3.5, with one feature most coverage glossed over: remote agents that execute tool calls server-side. Send a prompt, Mistral runs the tools, you get the final answer. One API call instead of the usual prompt-tool-execute-feed-back loop. Python, web search, file access, and custom function calling, all sandboxed on Mistral's infrastructure.
For prototypes and one-off research tasks, this is great. For production traffic, we won't be using it. Three reasons.
Observability. Remote execution is a black box. When a tool fails, we want to know which payload triggered it and at what time, with full request and response. We log every tool call locally and route it through LangSmith and Helicone. Remote agents give you an error message if you're lucky.
State. Our agents share context across a single Postgres database. Remote execution means shipping that state back and forth or maintaining dual truth, neither of which is appealing. We'd rather query pgvector directly through the orchestration layer.
Tool surface. Mistral covers Python and web search. A typical deployment for us calls Twilio, ElevenLabs, Stripe, QuickBooks, HubSpot, Airtable, Make, n8n webhooks, and dozens of custom endpoints per client. Remote execution would mean Mistral supporting every SaaS we touch, or building a translation layer that defeats the purpose.
Where remote agents do make sense: one-off research tasks, edge-deployed agents in restricted runtimes (Cloudflare Workers, mobile shells) where you can't spin up Python locally.
Everything else, orchestrate locally. Remote agents make prototyping faster. They don't make production easier. They move the complexity from your code to the model provider's runtime, and when that runtime fails, you debug without logs.