AI Coding Agent with 2 Million Context
PLUS: Run Gemma 3 27B locally, Gemini Browser Agent, HeyGen MCP Server
Today’s top AI Highlights:
Autonomous AI coding agent with 2 million token memory
Python toolkit for building and evaluating web AI agents
Run Gemma 3 27B locally on consumer-grade GPUs like NVIDIA RTX 3090
Generate avatar videos directly from Claude with HeyGen MCP server
Semantically-aware chunking and clustering for smarter RAG
& so much more!
Read time: 3 mins
AI Tutorial
Financial management is a deeply personal and context-sensitive domain where one-size-fits-all AI solutions fall short. Building truly helpful AI financial advisors requires understanding the interplay between budgeting, saving, and debt management as interconnected rather than isolated concerns.
A multi-agent system provides the perfect architecture for this approach, allowing us to craft specialized agents that collaborate rather than operate in silos, mirroring how human financial advisors actually work.
In this tutorial, we'll build a Multi-Agent Personal Financial Coach application using Google’s newly released Agent Development Kit (ADK) and the Gemini model. Our application will feature specialized agents for budget analysis, savings strategies, and debt reduction, working together to provide comprehensive financial advice. The system will offer actionable recommendations with interactive visualizations.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Plandex is a terminal-based AI coding agent that tackles large-scale development tasks spanning numerous files and steps. Unlike typical AI coding assistants that struggle with complex projects, Plandex manages up to 2 million tokens of context directly and indexes projects with 20+ million tokens using tree-sitter project maps.
It keeps AI-generated changes isolated in a cumulative diff review sandbox until they're ready to be applied, so you stay in control and can roll back problematic changes. The tool supports models from Anthropic, OpenAI, Google, and open-source providers, making it adaptable to different development needs.
Key Highlights:
Handles Complex Projects - Plandex uses smart context management with a 2M token effective window, loading only what's needed for each step. Its tree-sitter integration enables fast project mapping with syntax validation across 30+ programming languages, making it reliable even when working with dozens of large files simultaneously.
Adjustable Autonomy Levels - You control how much freedom Plandex has, ranging from fully autonomous operation (planning, implementing, debugging) to step-by-step oversight where you can review each modification. This flexibility makes it suitable for both routine tasks and critical system changes.
Production-Ready Workflow - Beyond code generation, Plandex includes automated debugging of terminal commands, browser applications (with Chrome installed), project-aware chat for ideation, and reliable file edits with syntax and logical validation. It maintains version control for every plan update, including branches for exploring multiple paths.
Developer-Friendly Experience - With a REPL mode featuring fuzzy auto-complete, a CLI for scripting, one-line zero-dependency installation, and Docker support, Plandex fits into existing development environments. It works directly in your projects, with optional Git integration for commit message generation.
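To make the "review sandbox" idea above concrete, here is a minimal Python sketch of holding proposed changes apart from the project until they are approved. It is not Plandex's implementation: the `ChangeSandbox` class, its method names, and the in-memory file dictionary are all hypothetical, chosen only to illustrate the pattern of cumulative diffs, explicit apply, and rollback.

```python
import difflib


class ChangeSandbox:
    """Hypothetical sketch: keep proposed edits separate until applied."""

    def __init__(self, files):
        self.files = dict(files)  # current project contents, path -> text
        self.pending = {}         # proposed contents, path -> text
        self.history = []         # snapshots taken before each apply

    def propose(self, path, new_text):
        # A later proposal for the same path replaces the earlier one,
        # so the pending state accumulates across steps.
        self.pending[path] = new_text

    def diff(self):
        # One unified diff covering every file with a pending change.
        out = []
        for path in sorted(self.pending):
            old = self.files.get(path, "").splitlines(keepends=True)
            new = self.pending[path].splitlines(keepends=True)
            out.extend(difflib.unified_diff(old, new, fromfile=path, tofile=path))
        return "".join(out)

    def apply(self):
        # Snapshot first so the change can be rolled back later.
        self.history.append(dict(self.files))
        self.files.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Restore the snapshot taken before the most recent apply.
        if self.history:
            self.files = self.history.pop()


sandbox = ChangeSandbox({"app.py": "print('hi')\n"})
sandbox.propose("app.py", "print('hello')\n")
review = sandbox.diff()          # inspect before anything is written
sandbox.apply()
after_apply = sandbox.files["app.py"]
sandbox.rollback()
after_rollback = sandbox.files["app.py"]
print(review)
```

The point of the design is that nothing touches the project until `apply()` is called, and every apply leaves a snapshot to return to.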
Web AI agents are quickly becoming the new frontier for automating complex online tasks that previously required human intervention. Until now, there's been no consistent way to evaluate how well these agents actually perform in realistic scenarios - most existing benchmarks are too simplified to capture the true complexity of the web.
AGI SDK is a comprehensive toolkit for building and evaluating AI agents that interact with the web. Released by AGI Inc., it includes the REAL benchmark (Realistic Evaluations for Agents Leaderboard) - a fully-functional "mini-Internet" with high-fidelity clones of popular websites. This benchmark provides a standardized environment for testing autonomous web agents against real-world scenarios, with deterministic websites ensuring consistent evaluation across different models and frameworks.
Key Highlights:
Realistic Testing Environment - REAL features 11 sandbox replicas of popular websites including Amazon, LinkedIn, and Gmail clones, preserving both visual design and functionality. These static website clones create controlled conditions for fair performance comparisons while closely mimicking actual web interactions.
Performance Insights - Current benchmark results reveal significant gaps between closed and open-source models, with Claude-3.7-Sonnet-Thinking topping the leaderboard at 41.1% task completion. The benchmark exposes common failure modes like navigation dead ends and poor state verification across all tested agents.
Simple Implementation - You can install the SDK with a single pip command, evaluate your agents with just 3-5 lines of code, and submit results to the public leaderboard. The system supports both pre-configured and custom agent implementations.
Standardized Metrics - REAL evaluates agents on 112 practical tasks using binary outcome rewards for both information retrieval and action completion. Performance is measured as success rate across all tasks, with detailed breakdowns by website and task type.
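The scoring described above (binary pass/fail per task, reported as a success rate with per-website and per-task-type breakdowns) is simple to reproduce. The sketch below is not the AGI SDK API: the `success_rates` function, the tuple layout, and the sample outcomes are all made up for illustration.

```python
from collections import defaultdict


def success_rates(results):
    """results: iterable of (website, task_type, passed) tuples, passed is a bool."""
    results = list(results)
    if not results:
        raise ValueError("no results to score")

    # Binary rewards: True counts as 1, False as 0.
    overall = sum(passed for _, _, passed in results) / len(results)

    by_site = defaultdict(list)
    by_type = defaultdict(list)
    for site, task_type, passed in results:
        by_site[site].append(passed)
        by_type[task_type].append(passed)

    def rate(groups):
        return {key: sum(vals) / len(vals) for key, vals in groups.items()}

    return overall, rate(by_site), rate(by_type)


# Made-up outcomes for illustration only; not real benchmark data.
sample = [
    ("shop", "action", True),
    ("shop", "retrieval", True),
    ("mail", "action", False),
    ("mail", "retrieval", True),
]
overall, per_site, per_type = success_rates(sample)
print(overall, per_site, per_type)
```

Because each task is scored 0 or 1, the overall number is just the mean, and the breakdowns are the same mean taken within each group.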
Quick Bites
Anthropic has shared a guide to getting the most out of Claude Code. It shows how to feed commands, style rules, and project quirks into every prompt, and how trimming the tool allowlist saves tokens without losing safety checks.
The post also details field-tested workflows, including test-first loops, multi-Claude reviews, and a permission-free “safe YOLO” flag for quick lint or boilerplate fixes. If you code from the command line and want more precise agent help, this piece is worth a skim.
Google has released new Gemma 3 models optimized with Quantization-Aware Training (QAT), dramatically reducing memory requirements while maintaining high performance. The new quantized versions enable their largest 27B parameter model to run on consumer-grade GPUs like the NVIDIA RTX 3090, shrinking VRAM needs from 54GB to just 14.1GB while preserving quality. These models are now available through popular tools including Ollama, LM Studio, and MLX.
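A quick back-of-the-envelope check shows where numbers like these come from. The sketch below counts memory for weights only, and it assumes 16-bit weights for the unquantized model and 4-bit weights for the quantized one; those bit widths are assumptions, not something stated above, and the `weight_memory_gb` helper is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_27B = 27e9  # 27 billion parameters

full_gb = weight_memory_gb(PARAMS_27B, 16)   # assumed 16-bit weights
quant_gb = weight_memory_gb(PARAMS_27B, 4)   # assumed 4-bit weights

print(f"16-bit: {full_gb:.1f} GB, 4-bit: {quant_gb:.1f} GB")
```

Under these assumptions the 16-bit estimate matches the 54GB figure, while the naive 4-bit estimate comes out a little below the reported 14.1GB. The sketch does not try to account for that difference, and it ignores activations and the KV cache, which add to real-world VRAM use.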
A Google team has released Gemini Browser Agent, a simple Python script that lets Gemini 2.5 Flash interact with and control web browsers. The tool runs in either single query mode for specific tasks or interactive mode for ongoing browsing sessions, with customizable options for model selection and browser configuration. It’s a straightforward project for having Gemini navigate websites, extract information, and answer complex queries about web content.
Tools of the Trade
HeyGen MCP Server: Open-source HeyGen MCP server for any MCP Client like Claude Desktop, Cursor, or AI agents to use the HeyGen API to generate avatars and videos. It lets you create persistent, personalized AI conversations and easily switch between multi-user or multi-avatar scenarios, right from Claude.
n8nChat: Generate and edit n8n workflows from scratch based on natural language commands. It integrates directly into the n8n editor. Like Cursor but for n8n, it can create, edit, debug, and optimize your workflows.
Semantic Chunker: Lightweight Python package for semantically-aware chunking and clustering of text. It analyzes the meaning of each chunk, using sentence embeddings and clustering, to merge semantically similar chunks into more coherent units.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
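For a feel of the idea behind Semantic Chunker (embed each chunk, then merge the ones that mean similar things), here is a deliberately simplified Python sketch. It does not use the package or its API. Word-count vectors stand in for real sentence embeddings, and a greedy merge of adjacent chunks by cosine similarity stands in for clustering; the function names and the 0.3 threshold are arbitrary choices for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_similar(chunks, threshold=0.3):
    """Greedily merge each chunk into the previous one when similar enough."""
    merged = []
    for chunk in chunks:
        if merged and cosine(embed(merged[-1]), embed(chunk)) >= threshold:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return merged


chunks = [
    "The GPU has 24 GB of memory.",
    "GPU memory limits the model size.",
    "Bananas are rich in potassium.",
]
result = merge_similar(chunks)
for unit in result:
    print(unit)
```

With a real embedding model in place of `embed`, the same loop would group chunks by meaning rather than by shared words, which is what makes the merged units more useful for RAG retrieval.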

Hot Takes
o3 spends way too much time browsing the web ~
The vibe coder's paradox: what you can vibe code already exists; what doesn't exist, you can't vibe code. ~
Andriy Burkov
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉