Web Agents Infrastructure for the Next Trillion Internet Users
+ Google unifies model and agent with Interactions API
Today’s top AI Highlights:
& so much more!
Read time: 3 mins
AI Tutorial
Imagine uploading a photo of your outdated kitchen and instantly getting a photorealistic rendering of what it could look like after renovation, complete with budget breakdowns, timelines, and contractor recommendations. That's exactly what we're building today.
In this tutorial, you'll create a sophisticated multi-agent home renovation planner using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash Image (aka Nano Banana).
It analyzes photos of your current space, understands your style preferences from inspiration images, and generates stunning visualizations of your renovated room while keeping your budget in mind.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
The next trillion internet users are AI agents that act on our behalf to book appointments, check inventory, compare pricing, place orders, extract data from sites, and more.
Local restaurants, businesses, services, government portals, doctors’ clinics - everyone has a website, but no APIs. All are a mess of HTML spaghetti with dynamic UIs, popups, heavy JS elements - made for human users, not AI agents.
Computer use agents like Claude CUA and Atlas work, but they cost dollars per task in vision inference and hallucinate when things get rough. They just aren’t reliable or scalable enough to build something real on.
Meet Mino, a web agent API to get your work done. Anywhere, at any scale. You just need to give it a goal in simple language. Mino treats web navigation as a learning problem: it uses AI to parse a website’s structure and identify elements on the first run, then that knowledge gets codified into deterministic execution paths. Subsequent runs happen in seconds at pennies per execution.
Google uses it for competitive intelligence, DoorDash for supplier monitoring, ClassPass for pricing aggregation across 50,000+ fitness studios. And now this infrastructure is available to any developer who needs reliable access to the deep web of sites without APIs.
Key Highlights:
Natural Language Instructions - Describe what you want in simple language: "Find earliest appointment for general illness, use ZIP 94086 if required, return clinic name and booking URL." Mino figures out the navigation sequence, form fields, and extraction logic automatically on any website.
Structured Output Guaranteed - Returns JSON every time, not probabilistic text that needs parsing. Navigate booking flows, fill forms, extract nested data, and get consistent results whether you're processing one URL or hundreds.
Economics That Actually Work - After initial AI-powered learning, repeated executions cost pennies instead of dollars. This makes it viable to run many jobs continuously and at scale.
Free to try - Direct API for programmatic integration, Platform UI for visual testing and batch processing, and MCP server to use with any MCP-compatible client. 50 completed runs are on the house; scale from there based on actual needs.
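To make the "learn once, replay cheaply" idea concrete, here is a minimal sketch of the general pattern, not Mino's actual implementation or API. Every name in it (PlanCache, fake_learn, the step strings) is hypothetical: the expensive AI-driven discovery runs only the first time a site and goal pair is seen, and later runs reuse the stored steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a "plan" is an ordered list of step descriptions.
Plan = List[str]


class PlanCache:
    """Stores one learned plan per (site, goal) pair so repeat runs skip discovery."""

    def __init__(self, learn: Callable[[str, str], Plan]) -> None:
        self._learn = learn  # expensive, model-backed discovery (stubbed below)
        self._plans: Dict[Tuple[str, str], Plan] = {}
        self.learn_calls = 0  # counts how often the expensive path ran

    def plan_for(self, site: str, goal: str) -> Plan:
        key = (site, goal)
        if key not in self._plans:
            # First run for this pair: pay for discovery once and remember it.
            self.learn_calls += 1
            self._plans[key] = self._learn(site, goal)
        return self._plans[key]


def fake_learn(site: str, goal: str) -> Plan:
    # Stand-in for the AI-driven first run; a real system would inspect the page.
    return [f"open {site}", "fill search form", f"extract: {goal}"]


cache = PlanCache(fake_learn)
first = cache.plan_for("clinic.example", "earliest appointment")
second = cache.plan_for("clinic.example", "earliest appointment")
print(cache.learn_calls, first == second)
```

A production system would also need to detect when a site changes and relearn the plan; that is left out here.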
The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.
This guide distills 10 AI strategies from industry leaders that are transforming marketing.
Learn how HubSpot's engineering team achieved 15-20% productivity gains with AI
Learn how AI-driven emails achieved 94% higher conversion rates
Discover 7 ways to enhance your marketing strategy with AI.
Google just dropped a game-changer for agent developers: a single API that handles both models and agents, complete with their first built-in agent that can autonomously execute multi-day research tasks.
The Interactions API fundamentally rethinks how you build agentic applications by providing server-side state management, background execution, and native support for complex conversation patterns with interleaved thoughts and tool calls - all through one RESTful endpoint.
You now get unified access to Gemini 3 Pro and the new Gemini Deep Research agent, which can synthesize comprehensive reports from web data and your documents. The API also brings remote MCP tool support, letting models directly call Model Context Protocol servers as tools. Plus, it seamlessly integrates with the ADK framework and Agent2Agent (A2A) protocol, so you can either use it as your inference engine or treat Deep Research as a remote agent in your existing multi-agent systems.
Key Highlights:
Unified Model + Agent Interface - Switch between raw model inference (model="gemini-3-pro-preview") and built-in agents (agent="deep-research-pro-preview-12-2025") using the same /interactions endpoint.
Gemini Deep Research SOTA Performance - The new Deep Research agent achieves 46.4% on HLE, 66.1% on DeepSearchQA, and 59.2% on BrowseComp by using Gemini 3 Pro as its reasoning core with multi-step reinforcement learning for autonomous web navigation and synthesis.
Background Task Execution - You can offload long-running research loops to the server and disconnect your client without maintaining active connections, avoiding timeout issues in multi-step agent workflows.
Native ADK & A2A Integration - Use the Interactions API as your inference engine within ADK agents, or treat Deep Research as a remote A2A agent for transparent integration into existing multi-agent systems without refactoring code.
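Here is a rough sketch of what "one endpoint, two targets" could look like from the client side. Only the model and agent identifiers and the /interactions endpoint name come from the announcement; the helper build_interaction_request, the "input" and "background" field names, and the overall body shape are assumptions for illustration, so check the official docs before relying on them. The sketch only builds request bodies and does not call the network.

```python
import json
from typing import Any, Dict, Optional

MODEL_ID = "gemini-3-pro-preview"                # from the announcement
AGENT_ID = "deep-research-pro-preview-12-2025"   # from the announcement


def build_interaction_request(
    prompt: str,
    *,
    model: Optional[str] = None,
    agent: Optional[str] = None,
    background: bool = False,
) -> Dict[str, Any]:
    """Build a request body that targets either a model or an agent, never both."""
    if (model is None) == (agent is None):
        raise ValueError("pass exactly one of model= or agent=")
    body: Dict[str, Any] = {"input": prompt}  # "input" field name is an assumption
    if model is not None:
        body["model"] = model
    else:
        body["agent"] = agent
    if background:
        body["background"] = True  # assumed flag for long-running, detached work
    return body


model_request = build_interaction_request("Summarize this page.", model=MODEL_ID)
agent_request = build_interaction_request(
    "Research the history of TPUs.", agent=AGENT_ID, background=True
)
print(json.dumps(agent_request, indent=2))
```

The point of the design is that switching from plain inference to the Deep Research agent changes one field rather than the whole integration.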
Quick Bites
OpenAI GPT-5.2 for professional work and long-running agents
OpenAI released GPT-5.2, their new flagship model series (Instant, Thinking, and Pro) optimized for professional workflows and long-running agents. The model shows substantial gains across document creation, coding, and multi-step tasks, beating industry professionals on 71% of knowledge work tasks in the GDPval benchmark. It also showed 30% fewer hallucinations than GPT-5.1. Available now across ChatGPT tiers, API, and Codex, with GPT-5.1 continuing without deprecation plans. Pricing starts at $1.75/1M input tokens with a 90% discount on cached inputs.
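To see what the cached-input discount means in practice, here is a small back-of-the-envelope calculator. It uses only the two figures quoted above ($1.75 per 1M input tokens, 90% off cached inputs); the function name is hypothetical, and output-token pricing is deliberately left out because it is not quoted here.

```python
INPUT_PRICE_PER_MILLION = 1.75  # USD per 1M input tokens, as quoted above
CACHED_DISCOUNT = 0.90          # 90% discount on cached input tokens


def input_cost_usd(total_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate input-token cost, billing cached tokens at the discounted rate."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    fresh_tokens = total_tokens - cached_tokens
    cached_rate = INPUT_PRICE_PER_MILLION * (1 - CACHED_DISCOUNT)
    cost = fresh_tokens * INPUT_PRICE_PER_MILLION + cached_tokens * cached_rate
    return cost / 1_000_000


uncached = input_cost_usd(1_000_000)
fully_cached = input_cost_usd(1_000_000, cached_tokens=1_000_000)
print(f"uncached: ${uncached:.3f}, fully cached: ${fully_cached:.3f}")
```

At these rates, one million uncached input tokens cost $1.75, while the same million served entirely from cache costs about $0.175, which is why long-running agents that resend the same context benefit most.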
Google’s fully-managed remote MCP servers for Google and Cloud services
Google just launched fully-managed, remote MCP servers that connect AI agents directly to Google and Google Cloud services. No more wrestling with local installations or fragile deployments. The initial rollout includes BigQuery for querying enterprise data in-place, Google Maps for grounding agents in real-world location data, and GCE/GKE for autonomous infrastructure management. Plus, Apigee integration lets you expose your own APIs as discoverable tools for agents. More services (Cloud Run, AlloyDB, Spanner, SecOps) are rolling out soon.
Gemini Live API with the new Native Audio model, now in GA
Google's Gemini Live API is now available on Vertex AI, letting you build voice agents that actually handle real-time conversations - interruptions, tone detection, and visual context all at once. Companies like Shopify and UWM are already running production agents on it (UWM's generated 14,000+ loans through theirs 🤯). It's built on the Gemini 2.5 Flash Native Audio model and comes with the enterprise infrastructure you'd expect from Vertex.
Open-source AI model and framework for Android-use
China’s Z.ai open-sourced AutoGLM-Phone-9B, a vision-language model that controls Android phones through natural language commands. Tell it "open Taobao and search for running shoes" and it parses intent, reads the screen, and taps through the workflow. Beyond the model weights, they open-sourced the full Phone Agent framework with ADB integration, action planning, and support for local or cloud deployment.
Tools of the Trade
llamafile - Packages LLMs into single-file executables that run locally across operating systems without installation, combining llama.cpp with Cosmopolitan Libc. It includes pre-built models you can download and run immediately, plus tools for creating your own llamafiles from any compatible model.
OpenSkills - A CLI tool that brings Claude Code's Skills to other coding agents like Cursor, Windsurf, and Aider. It installs Skills from GitHub repos, syncs them to AGENTS.md, and enables progressive disclosure instead of Claude Code's native tool invocation.
toMCP - Converts any website into an MCP server by prepending tomcp.org/ to the URL, stripping out navigation and ads to deliver clean markdown that uses fewer tokens than raw HTML scraping.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
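For the toMCP entry above, the "prepend tomcp.org/ to the URL" step can be sketched as a tiny helper. The function name is hypothetical, and dropping the original http(s) scheme before prefixing is an assumption about the expected format, not something confirmed by the tool's documentation.

```python
from urllib.parse import urlsplit

TOMCP_PREFIX = "tomcp.org/"  # prefix named in the toMCP description


def to_mcp_url(url: str) -> str:
    """Prefix a site address with tomcp.org/, dropping any scheme first (assumed)."""
    # Bare hosts like "example.com" need a leading "//" to parse as a network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    if not parts.netloc:
        raise ValueError(f"no host found in {url!r}")
    target = parts.netloc + parts.path
    if parts.query:
        target += "?" + parts.query
    return "https://" + TOMCP_PREFIX + target


docs_url = to_mcp_url("https://example.com/docs")
print(docs_url)
```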
Hot Takes
Is Sergey Brin’s return the real reason Google went from “way behind” to “easily #1” across AI domains in a year?
Google’s best moves imo were: built TPUs in 2013; acquired DeepMind for $400M; bought Noam back.
Most importantly, the competition from OpenAI awakened the monster.
It feels pretty obvious at this point that someone’s going to make billions building a social app that’s just for friends, no AI slop, no brainrot, calm design, chronological feed and no concept of followers.
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉




