
Google I/O 2026: Gemini 3.5 Flash, Omni, and Spark — here's what shipped

Google shipped Gemini 3.5 Flash (straight to GA), a multimodal model called Omni, and a 24/7 cloud agent named Spark. They also overhauled pricing to a three-tier compute model. Here's what the announcements mean for production systems.

May 20, 2026 · 3 min read
Tags: google, gemini, model-releases, cloud-agents

Google shipped three things at I/O 2026 yesterday: a new default model (Gemini 3.5 Flash), a natively multimodal model (Gemini Omni), and a 24/7 cloud agent (Gemini Spark). It also overhauled pricing for consumer subscriptions. There were no previews this time: Gemini 3.5 Flash went straight to general availability.

Here's what actually changed.

Gemini 3.5 Flash is now the default across Google products

Gemini 3.5 Flash skipped the -preview tag. Google has already rolled it out in Search (AI Mode), the Gemini app, Gmail, Docs, and Workspace, and to developers via AI Studio and Vertex AI. Simon Willison noted the pricing shift: input tokens now cost twice as much as on 3.5 Flash Preview ($0.15 vs $0.075 per 1M tokens for prompts under 128K), though the model is still cheaper than GPT-4.5 or Claude Opus 4.7.
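A minimal sketch of what that input-price change means at volume, using only the two per-1M input prices quoted above and assuming every prompt stays under 128K tokens. Output-token pricing isn't covered because those figures weren't given, and the 500M-token monthly volume is a hypothetical example.

```python
# Input prices quoted above, in USD per 1M input tokens, for prompts under 128K.
# Output-token prices are deliberately left out.
PRICE_PER_M_INPUT = {
    "3.5 Flash": 0.15,
    "3.5 Flash Preview": 0.075,
}


def monthly_input_cost(model: str, tokens_per_month: int) -> float:
    """Input-token cost in USD, assuming every prompt stays under 128K tokens."""
    return tokens_per_month / 1_000_000 * PRICE_PER_M_INPUT[model]


if __name__ == "__main__":
    volume = 500_000_000  # hypothetical: 500M input tokens per month
    for model in PRICE_PER_M_INPUT:
        print(f"{model}: ${monthly_input_cost(model, volume):,.2f}")
```

At that volume the script prints $75.00 for 3.5 Flash and $37.50 for 3.5 Flash Preview; the ratio stays 2× at any volume because both prices are flat per token.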

The model supports a context window of up to 2M tokens. Google claims quality improvements over Flash Preview but hasn't published detailed benchmarks yet. For production systems that already route through Vertex, this is a straightforward swap: same API surface, different model ID.
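If the swap really is just a model ID, keeping that ID in configuration makes it a one-line change. Below is a minimal sketch using the `google-genai` Python SDK against Vertex AI. The model ID string `gemini-3.5-flash` and the `GEMINI_MODEL_ID` environment variable are assumptions for illustration; check the exact ID in the Vertex AI model catalog before deploying.

```python
import os

# Assumption: the GA model ID is "gemini-3.5-flash"; verify it in the
# Vertex AI model catalog. GEMINI_MODEL_ID is a hypothetical config key.
DEFAULT_MODEL = "gemini-3.5-flash"


def resolve_model_id() -> str:
    """Read the model ID from config so switching models needs no code change."""
    return os.environ.get("GEMINI_MODEL_ID", DEFAULT_MODEL)


def generate(prompt: str, project: str, location: str = "us-central1") -> str:
    # Imported lazily so this module still loads where google-genai isn't installed.
    from google import genai

    # vertexai=True routes requests through Vertex AI instead of the Gemini API.
    client = genai.Client(vertexai=True, project=project, location=location)
    response = client.models.generate_content(
        model=resolve_model_id(),
        contents=prompt,
    )
    return response.text
```

Switching models then means changing one environment variable rather than editing every call site.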

Willison also released llm-gemini 0.32 the same day, adding support for the new model to his LLM command-line tool.

Gemini Omni handles video natively

Gemini Omni is Google's natively multimodal model. It processes video, audio, and text in a single pass, without separate transcription or frame-extraction steps. Google ties it to the "NanoBanana" internal research line it has been running since late 2025.

This matters for applications that need real-time video understanding: security camera feeds, warehouse robotics, telehealth consults. Most current multimodal setups still treat video as a sequence of frames or run a separate audio-extraction step. Omni collapses that pipeline into a single model.

There's no public pricing yet; Google said API access via Vertex is "coming soon."

Spark is a 24/7 cloud agent

Gemini Spark is Google's answer to the question: what if your assistant kept working when you closed the app?

It runs continuously in Google Cloud. You assign it goals ("monitor my GitHub issues and summarize them weekly", "track my Calendar and reschedule conflicts"), and it executes in the background. The demo at I/O showed Spark booking travel, tracking package deliveries, and managing a shared family calendar without manual prompts.

This is the same pattern Anthropic is exploring with Claude Managed Agents, but Google integrated it directly into the consumer Gemini app. Spark uses tool-calling APIs to connect to Gmail, Calendar, Drive, and third-party services via OAuth.
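Google hasn't described Spark's internals, so what follows is a generic tool-calling sketch, not Spark's implementation. The model emits a call as a tool name plus JSON arguments; the host looks the name up in a registry, runs it, and returns the result (or an error) for the model's next turn. The tool names `count_open_issues` and `summarize_issues` and the `{"name": ..., "args": ...}` call shape are hypothetical; a real version would wrap OAuth-authorized Gmail, Calendar, or GitHub clients.

```python
import json
from typing import Any, Callable, Dict

# Registry of tools the agent may call (hypothetical stand-ins for real clients).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def count_open_issues(repo: str) -> int:
    # Stub data; a real version would call the GitHub API with the user's OAuth token.
    return {"acme/api": 12}.get(repo, 0)


@tool
def summarize_issues(titles: list) -> str:
    # Stub; a real agent would hand the titles back to the model to summarize.
    return f"{len(titles)} open issues: " + "; ".join(titles)


def dispatch(call_json: str) -> str:
    """Run one model-emitted call shaped like {"name": ..., "args": {...}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        # Unknown tools are reported back instead of crashing the loop.
        return json.dumps({"error": f"unknown tool {call.get('name')!r}"})
    try:
        result = fn(**call.get("args", {}))
    except TypeError as exc:  # e.g. wrong or missing arguments from the model
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


if __name__ == "__main__":
    print(dispatch('{"name": "count_open_issues", "args": {"repo": "acme/api"}}'))
```

Returning errors as data rather than raising keeps a long-running agent alive when the model asks for a tool that doesn't exist or passes bad arguments.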

For VioX clients, the interesting part is the infrastructure: Google absorbs the hosting and orchestration cost within the subscription tiers. That's a different model from the self-hosted agent loops we've been deploying on Render or Fly.

Pricing moved to three compute tiers

Google replaced daily prompt limits with a consumption-based compute model. Three tiers:

  • Gemini Basic ($7.99/month): includes 3.5 Flash access, limited Spark compute
  • Gemini Advanced ($19.99/month): adds Omni access, higher Spark compute allocation
  • Gemini Ultra ($99.99/month): priority access to all models, dedicated Spark compute pool

The shift from "X prompts per day" to "Y compute units per month" mirrors what OpenAI did with ChatGPT Pro and what Anthropic is testing in enterprise tiers. It decouples billing from prompt count and ties it to actual inference cost.

This makes more sense for agentic workloads. A Spark agent that runs 20 queries per hour all month consumes vastly more compute than a user who sends 50 long prompts in one day. The old model charged both the same.
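The arithmetic behind that comparison: 20 queries an hour around the clock comes to 20 × 24 × 30 = 14,400 queries in a 30-day month, against 50 prompts. The sketch below shows why metering compute instead of prompts changes the picture. The per-query token counts are hypothetical, and Google hasn't said how a "compute unit" maps to tokens, so tokens stand in as the metered quantity here.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def agent_queries_per_month(queries_per_hour: int) -> int:
    """Queries issued by an agent that runs around the clock."""
    return queries_per_hour * HOURS_PER_DAY * DAYS_PER_MONTH


def monthly_tokens(queries: int, tokens_per_query: int) -> int:
    """Consumption billing meters tokens processed, not prompts sent."""
    return queries * tokens_per_query


if __name__ == "__main__":
    agent_queries = agent_queries_per_month(20)
    # Hypothetical sizes: short agent queries vs. long human prompts.
    agent_tokens = monthly_tokens(agent_queries, tokens_per_query=2_000)
    human_tokens = monthly_tokens(50, tokens_per_query=50_000)
    print(f"agent: {agent_queries:,} queries, {agent_tokens:,} tokens")
    print(f"human: 50 prompts, {human_tokens:,} tokens")
```

With those assumed sizes the agent consumes 28.8M tokens against the human's 2.5M, even though each human prompt is 25 times larger.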

What this means for production systems

If you're already routing through Vertex AI, Gemini 3.5 Flash is available now. The price increase (2× on input tokens for prompts under 128K) won't matter much unless you're running high-volume, low-context queries, and in that case you were probably already on 3.5 Flash Preview or a cheaper alternative anyway.

Omni is the one to watch for video-heavy applications. If your current stack transcribes audio separately or extracts frames for vision models, a single-pass multimodal API could cut latency and cost. Wait for public pricing before committing.

Spark is interesting infrastructure, but it's consumer-only for now. Google hasn't announced an enterprise Spark API. For SMB clients, we're still deploying self-hosted loops built on job queues and state machines. Spark shows the market exists; an API version will take time.
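A minimal sketch of the self-hosted pattern described above: a job queue feeds a worker, a small state machine rejects illegal lifecycle transitions, and failed steps are retried up to a limit. The `Job`, `State`, and `run_worker` names and the three-attempt default are illustrative, not taken from any specific library or from our production code.

```python
import queue
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class State(Enum):
    PENDING = auto()
    RUNNING = auto()
    SUCCEEDED = auto()
    FAILED = auto()


# Legal transitions; RUNNING -> PENDING means "retry later".
TRANSITIONS = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SUCCEEDED, State.PENDING, State.FAILED},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
}


@dataclass
class Job:
    goal: str
    state: State = State.PENDING
    attempts: int = 0
    history: list = field(default_factory=list)

    def move(self, new: State) -> None:
        """Change state, refusing any transition not listed in TRANSITIONS."""
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.history.append((self.state, new))
        self.state = new


def run_worker(jobs: "queue.Queue[Job]", step: Callable[[Job], bool],
               max_attempts: int = 3) -> None:
    """Drain the queue, re-enqueueing failed jobs until max_attempts is reached."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        job.move(State.RUNNING)
        job.attempts += 1
        try:
            ok = step(job)
        except Exception:
            ok = False  # treat a crashing step like a failed one
        if ok:
            job.move(State.SUCCEEDED)
        elif job.attempts < max_attempts:
            job.move(State.PENDING)
            jobs.put(job)  # back of the queue for another attempt
        else:
            job.move(State.FAILED)
        jobs.task_done()


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_step(job: Job) -> bool:
        calls["n"] += 1
        return calls["n"] >= 2  # fail the first attempt, succeed the second

    q: "queue.Queue[Job]" = queue.Queue()
    job = Job("summarize GitHub issues weekly")
    q.put(job)
    run_worker(q, flaky_step)
    print(job.state.name, job.attempts)  # SUCCEEDED 2
```

`queue.Queue` lives in memory, so job state disappears on restart; a deployed loop would back the same state machine with a durable queue or database table.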

Google also announced Antigravity 2.0 (a code generation agent for Android) and updates to AI Studio, but those don't change much for backend agentic systems. The three big pieces — Flash, Omni, Spark — are the ones worth tracking.
