Gemini CLI Extensions Combine MCP with Context Engineering
Today’s top AI Highlights:
- xAI’s free video generation model
- OpenAI’s Agent Builder is a workflow builder
& so much more!
Read time: 3 mins
AI Tutorial
Imagine uploading a photo of your outdated kitchen and instantly getting a photorealistic rendering of what it could look like after renovation, complete with budget breakdowns, timelines, and contractor recommendations. That's exactly what we're building today.
In this tutorial, you'll create a sophisticated multi-agent home renovation planner using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash Image (aka Nano Banana).
It analyzes photos of your current space, understands your style preferences from inspiration images, and generates stunning visualizations of your renovated room while keeping your budget in mind.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.
Latest Developments
Building agents that work across extended time horizons exposes a hard truth: context windows are finite, and attention degrades as they fill up. This creates an "attention budget" problem that Anthropic's engineering team has spent serious time solving.
Their framework shifts thinking from "how do I write this prompt" to "what's the minimal set of high-signal tokens that maximizes my desired outcome."
This new blog on Effective Context Engineering might change how you build agents that are ready for production. It's about curating what information enters the model at each inference step, especially as your agent loops through multiple turns and accumulates tool outputs, message history, and external data. The challenge scales with complexity: long-horizon tasks like codebase migrations or multi-hour research sessions demand sophisticated strategies to maintain coherence without drowning your agent in irrelevant information.
Here are some key takeaways and techniques you can apply right away:
System prompts need the "right altitude" - Avoid two extremes: hardcoding brittle if-else logic, or providing vague guidance that assumes shared context. The sweet spot is specific enough to guide behavior but flexible enough to let the model use its intelligence.
Just-in-time context beats pre-loading everything - Instead of embedding-based retrieval that front-loads all data, maintain lightweight identifiers (file paths, queries, links) and let agents dynamically load what they need. Claude Code uses this to analyze large databases without ever loading full datasets into context.
Long-horizon tasks require specialized techniques -
Compaction: Summarize conversation contents nearing the window limit and reinitialize with compressed essentials like architectural decisions and critical details, discarding redundant outputs.
Note-taking: Agents write persistent notes outside the context window (like maintaining a NOTES.md file or a to-do list) that get pulled back when needed.
Sub-agent architectures: Split complex work across focused specialists that explore extensively using tens of thousands of tokens but return only condensed summaries (1,000-2,000 tokens) to the main agent.
Tool design directly impacts context efficiency - Bloated tool sets with overlapping functionality waste tokens and create ambiguous decision points. Each tool should be self-contained with descriptive, unambiguous parameters. If you can't definitively say which tool fits a situation, your agent won't either.
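The compaction and note-taking techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `summarize` and `count_tokens` are stand-ins for a real model call and a real tokenizer, and the budget numbers are arbitrary.

```python
# Minimal sketch of compaction + note-taking for a long-running agent.
# `summarize` stands in for an LLM call; token counting is approximate.

NOTES_FILE = "NOTES.md"          # persistent memory outside the context window
CONTEXT_LIMIT = 8_000            # illustrative token budget
COMPACT_THRESHOLD = 0.8          # compact when the window is 80% full

def count_tokens(messages):
    # Rough stand-in: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Stand-in for a model call that keeps architectural decisions
    # and critical details while discarding redundant tool output.
    kept = [m["content"][:80] for m in messages if m["role"] != "tool"]
    return "Summary of earlier turns:\n" + "\n".join(kept)

def write_note(text):
    # Note-taking: persist essentials outside the context window.
    with open(NOTES_FILE, "a") as f:
        f.write(text + "\n")

def compact(history):
    """Replace old turns with a compressed summary, keep recent turns verbatim."""
    if count_tokens(history) < CONTEXT_LIMIT * COMPACT_THRESHOLD:
        return history
    old, recent = history[:-4], history[-4:]
    summary = {"role": "system", "content": summarize(old)}
    write_note(summary["content"])       # notes get pulled back when needed
    return [summary] + recent
```

In a real agent you would swap in your provider's tokenizer and summarization call, but the control flow (threshold check, summarize old turns, persist notes, keep recent turns verbatim) is the whole pattern.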
Read the full breakdown from Anthropic's Applied AI team to see how these principles play out in production systems like Claude Code.
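The sub-agent architecture is just as easy to sketch. In this hypothetical example, `run_model` is a stand-in for a real model API and the token budgets are illustrative; the point is that raw exploration never re-enters the main agent's context, only condensed summaries do.

```python
# Sketch of a sub-agent architecture: specialists explore with a large
# token budget but hand back only a condensed summary to the main agent.

def run_model(prompt, max_tokens):
    # Stand-in for a real LLM call.
    return f"[model output for: {prompt[:40]} (budget {max_tokens})]"

def sub_agent(task, exploration_budget=50_000, summary_budget=1_500):
    """Explore one focused task extensively, return only the essentials."""
    raw_findings = run_model(f"Explore in depth: {task}", exploration_budget)
    # Condense before anything re-enters the main agent's context.
    return run_model(f"Summarize the key findings only:\n{raw_findings}",
                     summary_budget)

def main_agent(goal, subtasks):
    # The main context only ever sees short summaries, not raw exploration.
    summaries = [sub_agent(t) for t in subtasks]
    context = "\n".join(summaries)
    return run_model(f"Goal: {goal}\nFindings:\n{context}", 4_000)
```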
Google just dropped extensions for Gemini CLI, and they're nothing like the integrations you're used to.
These aren't just API wrappers or simple MCP server connections. Extensions bundle MCP servers, context files, and custom commands into a single package that teaches Gemini how to use any tool, giving the AI an actual playbook instead of just raw access.
The difference is intelligence: an MCP server provides the connection, while an extension adds the understanding of how to use that connection effectively from the first command.
This is also completely different from AI agent architectures like Claude Code's subagents. Extensions are self-contained instruction sets that make tools immediately usable, not autonomous decision-making entities. The ecosystem already includes extensions from Figma, Stripe, Elastic, Postman, and Snyk, plus Google's own suite covering everything from Cloud Run deployments to Firebase management.
Key Highlights:
Layered Intelligence Over Tools - Extensions wrap MCP servers with context files and custom commands, creating a coherent instruction set that makes tools work intelligently out of the box, rather than requiring you to figure out the right prompts and workflows yourself.
Pre-Built Ecosystem Access - Launch partners include Figma for design-to-code workflows, Stripe for payment API interactions, Elastic for search and analytics, Postman for API management, Snyk for security scanning, plus Google extensions for Cloud Run, GKE, Firebase, Flutter, and Chrome DevTools.
Simple Package Architecture - Extensions can combine any mix of MCP servers, context files like GEMINI.md, excluded tools for disabling defaults, and custom slash commands, all installable via a single GitHub URL or local path.
Subagents Coming Soon - The extension framework is designed to support subagents, which will soon handle more complex multi-step reasoning.
Marketplace with 90+ Extensions - The extension gallery already hosts over 90 integrations spanning cloud infrastructure, security, design tools, and data platforms, all installable with a single command:
gemini extensions install <GitHub-URL-or-local-path>
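To get a feel for what such a package contains, here is a hypothetical sketch of a minimal extension directory. The field names in the manifest (`mcpServers`, `contextFileName`) reflect our understanding of the Gemini CLI extension schema; treat the exact structure as an assumption and check Google's official docs before shipping.

```shell
# Hypothetical layout of a minimal Gemini CLI extension package.
# Manifest field names are assumptions; verify against the official docs.
mkdir -p my-extension

# The manifest: name, version, an MCP server to launch, and a context file.
cat > my-extension/gemini-extension.json <<'EOF'
{
  "name": "my-extension",
  "version": "1.0.0",
  "mcpServers": {
    "my-server": {
      "command": "node",
      "args": ["dist/server.js"]
    }
  },
  "contextFileName": "GEMINI.md"
}
EOF

# The context file: the "playbook" that teaches Gemini how to use the tools.
cat > my-extension/GEMINI.md <<'EOF'
# My Extension
Always query the my-server tools for project data before answering.
EOF
```

Once pushed to a GitHub repo, a package like this would be installable with the single install command shown above.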
Quick Bites
OpenAI’s Agent Builder is a Workflow builder (which we didn’t need)
We watched OpenAI launch AgentKit on DevDay with a familiar feeling: here we go again with another visual workflow builder. LangChain's Harrison Chase published a sharp take on this. He thinks the entire category is headed for obsolescence, and his arguments make complete sense:
Visual builders are not as "low barrier" as promised. They are still too complex for average non-technical users
For simple workflows, simple AI agents (LLM + instructions + tools) are enough, which non-technical users can easily build with no-code tools
Once workflows hit a certain complexity threshold, the visual interface becomes unmanageable, and you need code anyway
As AI models get smarter and code generation costs approach zero, the middle ground for these visual tools will narrow, not expand. The real long-term value won't be in another n8n clone; it's in making no-code agent creation genuinely simple and making models write better agentic code.
Plan Mode in Cursor to run AI agents autonomously for longer
Cursor has rolled out Plan Mode, letting its AI agent write detailed, editable plans before tackling complex coding tasks. Press Shift+Tab to have it research your codebase, ask clarifying questions, and generate a markdown plan you can tweak inline before execution. The feature activates automatically for complex prompts and can save plans directly to your repo.
xAI releases Sora 2-like audio-video model for free
xAI just dropped Imagine v0.9, their upgraded video generation model that now handles native audio-video creation, for free on all their platforms! The model can create videos with immersive sounds, dialogues, and even rhythms that completely match the video, along with emotive characters and dynamic camera effects in a single video for more effective storytelling. Our take: it's fast, much faster than Sora 2, and delivers quality that approaches (though doesn't quite match) Veo 3 and Sora 2. But the fact that xAI is offering this completely free (at least for now) is pretty wild.
Google and Amazon release their Enterprise AI platforms literally hours apart
Google Cloud just launched Gemini Enterprise, bringing together its fragmented AI offerings into a single platform. The offering bundles Gemini models with a no-code agent builder (previously Agentspace), pre-built specialized agents, and enterprise data connectors across both Google Workspace and Microsoft 365, along with business data platforms like SAP, all managed through a single governance layer. They're also pushing open standards for how agents talk to each other and handle payments, trying to shape the "agent economy."
Two hours after Google's Gemini Enterprise announcement, AWS dropped Amazon Quick Suite. Clearly, nobody wanted to let the other have the news cycle. Quick Suite is their enterprise AI agent that connects to company data across wikis, S3, Salesforce, Slack, and 1,000+ apps via MCP integrations. The suite includes Quick Research (an agent for deep-dive analysis with cited sources from 200+ news outlets), Quick Flows (automated workflows from simple prompts), and Quick Automate for multi-system orchestrations. With both Google and Amazon now claiming to be your "single front door" for workplace AI, at least we get multiple front doors to choose from!
Tools of the Trade
LlamaFarm - Open-source framework that lets you define an AI system (models, RAG, agents, data) in YAML and run it anywhere (laptop, cloud, or edge), while it takes care of orchestration across environments. It leans into many small, domain-tuned models + RAG so you don’t need a giant all-purpose model, and your system stays up to date as data changes. Watch this short demo.
ElevenLabs UI - An open-source component library by ElevenLabs with 22 React components specifically designed for multimodal AI agents and audio applications, including voice interfaces, transcription, and audio playback. Built on shadcn/ui and distributed under MIT license.
Sora MCP - Wraps OpenAI's Sora 2 API, letting you generate and remix videos through Claude Desktop or other MCP-compatible clients. It handles video creation, status checking, downloading, and remixing via stdio or HTTP transport.
GroundCite - Open-source Python library that wraps Google Gemini's API to validate and filter citations, fixing issues with broken URLs, irrelevant sources, and missing citations in structured outputs. It uses multiple agents to enforce domain filtering, validate citation relevance, and ensure URLs actually support the claims made.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
Hot Takes
The difference between DeepMind and OpenAI is that DeepMind is interested in science and OpenAI is interested in hacking.
SSI strategy of not releasing a product is probably a good one. The minute one releases a product, one will be dragged into so fierce competition with OAI, gemini, ... that the original goal will be forgotten.
Maybe it would have been wiser for Anthropic to never release Claude and focus on safety full time
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉