Claude Code's Hidden Multi-Agent Orchestration now Open-source
+ LLMs will manage their own context in 2026
Today’s top AI Highlights:
- Claude Code's hidden multi-agent orchestration, now open-source
- Recursive Language Models: a better fix for context rot
& so much more!
Read time: 3 mins
AI Tutorial
Google recently launched the Interactions API alongside Gemini Deep Research, an autonomous research agent that can conduct comprehensive multi-step investigations.
This is a significant shift from traditional APIs - instead of stateless request-response cycles, you get server-side state management, background execution for long-running tasks, and seamless handoffs between different models and agents.
In this tutorial, we'll build an AI Research Planner & Executor Agent that demonstrates these capabilities in action. The system uses a three-phase workflow: Gemini 3 Flash creates research plans, Deep Research Agent executes comprehensive web investigations, and Gemini 3 Pro synthesizes findings into executive reports with auto-generated infographics.
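The three-phase workflow can be sketched in plain Python. Everything below is illustrative: the helper functions (`plan_research`, `run_deep_research`, `synthesize_report`) are hypothetical stand-ins for the model calls, not the actual Interactions API surface.

```python
# Illustrative three-phase research pipeline. The three helpers are
# hypothetical stand-ins for model calls, not real Interactions API methods.

def plan_research(topic: str) -> list[str]:
    # Phase 1: a fast model (e.g. Gemini 3 Flash) would turn the topic
    # into a list of research questions. Stubbed here.
    return [f"What is the current state of {topic}?",
            f"What are the open problems in {topic}?"]

def run_deep_research(question: str) -> str:
    # Phase 2: a deep-research agent would investigate each question
    # server-side, in the background. Stubbed here.
    return f"Findings for: {question}"

def synthesize_report(findings: list[str]) -> str:
    # Phase 3: a stronger model (e.g. Gemini 3 Pro) would merge the
    # findings into an executive report. Stubbed here.
    return "\n".join(["# Executive report"] + [f"- {f}" for f in findings])

def research_pipeline(topic: str) -> str:
    plan = plan_research(topic)
    findings = [run_deep_research(q) for q in plan]
    return synthesize_report(findings)

print(research_pipeline("recursive language models"))
```

The point of the structure: each phase hands a small, typed artifact (a plan, a list of findings) to the next, which is what lets the real system swap models between phases.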
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
2025 gave us plenty of evidence that “context rot” in LLMs and AI agents is real in long-running chats. A lot of work, including Context Folding and Agentic Context Engineering, went into solving it.
But this MIT paper proposed an even better solution, called Recursive Language Models (RLMs), and it has since become a major research focus for companies like Prime Intellect.
Instead of fighting context rot with architecture changes, RLMs sidestep it entirely by never exposing models to huge contexts in the first place.
RLMs store large contexts in a Python environment as variables rather than directly in the model's context window, letting the model peek at, partition, and recursively query subsets of data without ever loading everything at once.
Think of it as giving your LLM a scratchpad where it can break down massive inputs, delegate work to fresh instances of itself, and keep its main context clean.
The results? A smaller GPT-5-mini using RLMs outperformed the full GPT-5 by over 114% on the hardest long-context benchmarks.
Key Highlights:
Self-Managing Context - All the context lives in a Python environment as variables that the RLM can query. It can write code to search, filter, and transform data, then spawn fresh sub-LLM instances to process specific chunks.
Adaptive Problem-Solving - The model autonomously chooses how to decompose tasks through learned strategies like grepping for keywords, or partitioning context into chunks for parallel sub-LLMs. All decisions are made at inference time rather than hardcoding.
Performance - On long multi-step benchmarks, RLM(GPT-5-mini) beat standard GPT-5 by 49% while costing less per query, and showed no performance degradation even with 10M+ tokens.
RL Training Potential - Prime Intellect's implementation and experiments show RLMs improve with environment-specific tips, pointing to massive gains once models are explicitly trained via reinforcement learning to use this scaffolding.
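The core loop can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the paper's implementation: `sub_llm` is a hypothetical stand-in for spawning a fresh model instance, and the grep-then-partition strategy is one of several the model could choose.

```python
# Minimal sketch of the RLM loop: the huge context lives as a Python
# variable, and the model interacts with it through code instead of
# loading it into its own context window.

def sub_llm(prompt: str, chunk: str) -> str:
    # In a real RLM this would be a recursive call to a fresh LLM
    # instance whose context contains only `chunk`. Stubbed here.
    return f"summary of {len(chunk)} chars relevant to '{prompt}'"

def rlm_query(prompt: str, context: str, chunk_size: int = 1000) -> str:
    # Step 1: peek/filter — grep for lines that look relevant,
    # the way an RLM might narrow the context before recursing.
    keyword = prompt.split()[0].lower()
    relevant = "\n".join(
        line for line in context.splitlines() if keyword in line.lower()
    )
    # Step 2: partition the filtered context into chunks small enough
    # for a sub-LLM, and process each chunk with a fresh instance.
    chunks = [relevant[i:i + chunk_size]
              for i in range(0, len(relevant), chunk_size)]
    partials = [sub_llm(prompt, c) for c in chunks]
    # Step 3: the root model only ever sees the short partial answers,
    # so its own context stays clean no matter how big the input is.
    return sub_llm(prompt, "\n".join(partials))

huge_context = "\n".join(
    f"log line {i}: pricing update" if i % 100 == 0
    else f"log line {i}: noise" for i in range(10_000)
)
print(rlm_query("pricing details", huge_context))
```

Note that the full `huge_context` is never passed to any single model call; only filtered chunks and their summaries are.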
“teaching models to manage their own context end-to-end through reinforcement learning will be the next major breakthrough, enabling agents to solve long-horizon tasks spanning weeks to months.”
Stop Drowning In AI Information Overload
Your inbox is flooded with newsletters. Your feed is chaos. Somewhere in that noise are the insights that could transform your work—but who has time to find them?
The Deep View solves this. We read everything, analyze what matters, and deliver only the intelligence you need. No duplicate stories, no filler content, no wasted time. Just the essential AI developments that impact your industry, explained clearly and concisely.
Replace hours of scattered reading with five focused minutes. While others scramble to keep up, you'll stay ahead of developments that matter. 600,000+ professionals at top companies have already made this switch.
The Claude Code team has been hiding something powerful.
Buried inside the official codebase is a complete multi-agent orchestration system - fully built, extensively tested, just waiting behind a single disabled function.
But this open-source project, CC Mirror, unlocks what might be the cleanest agent coordination framework you'll find: no extra dependencies, no new abstractions, just pure task decomposition with blocking relationships and background execution.
The system turns Claude into "The Conductor" that decomposes complex work into dependency graphs and spawns background agents to execute in parallel, while the lead continues working, processing completion notifications as they arrive.
The project has been designed to work with how Claude natively thinks and works, rather than against it.
Key Highlights:
Zero external dependencies - The entire orchestration runs on task JSON files with Claude Code's native background execution handling all agent spawning and lifecycle.
Task graphs with dependencies - Tasks can block each other through `blockedBy` and `blocks` relationships, creating dependency chains where completing one task automatically unblocks downstream work.
Ownership and protection - Each task tracks its owner (agent ID), and only the owner or the team-lead type can update that task, preventing race conditions in multi-agent scenarios.
Background-first execution - All agents run in the background by default, letting the orchestrator continue planning and working while agents execute tasks in parallel.
Built-in orchestrator skill - When enabled, CC Mirror installs a comprehensive skill that teaches Claude proven patterns (Fan-Out, Pipeline, Map-Reduce) and a warm "Conductor" identity focused on absorbing complexity and radiating simplicity.
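The blocking and ownership mechanics can be sketched in a few lines. The task schema below mirrors the described `blockedBy` fields and ownership rules, but it is illustrative, not CC Mirror's actual JSON format.

```python
# Illustrative task graph with blockedBy relationships and ownership checks.
# The schema mirrors the described fields, not CC Mirror's actual format.

tasks = {
    "write-api":   {"status": "pending", "blockedBy": [],            "owner": "agent-1"},
    "write-tests": {"status": "pending", "blockedBy": ["write-api"], "owner": "agent-2"},
    "deploy":      {"status": "pending",
                    "blockedBy": ["write-api", "write-tests"],
                    "owner": "team-lead"},
}

def ready_tasks() -> list[str]:
    # A task is ready when it is pending and everything blocking it is done.
    return [name for name, t in tasks.items()
            if t["status"] == "pending"
            and all(tasks[b]["status"] == "done" for b in t["blockedBy"])]

def complete(name: str, caller: str) -> None:
    # Ownership check: only the owner or the team lead may update a task.
    if caller not in (tasks[name]["owner"], "team-lead"):
        raise PermissionError(f"{caller} does not own {name}")
    tasks[name]["status"] = "done"

print(ready_tasks())            # only 'write-api' is unblocked at first
complete("write-api", "agent-1")
print(ready_tasks())            # completing it unblocks 'write-tests'
```

Completing a task never mutates the downstream tasks directly; they simply become visible to the next `ready_tasks()` poll, which is what makes the scheme safe with multiple background agents.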
Quick Bites
Claude Code’s creator runs the simplest CC setup
You'd expect the creator of the industry's most capable coding agent to run some elaborate custom rig, but Boris Cherny's setup is almost aggressively vanilla. Cherny shared an entire thread on his Claude Code setup, and there are some great takeaways from it.
His team treats their shared CLAUDE.md as a living knowledge base, constantly refined through code reviews and real-world failures. The most revealing yet obvious choice? He uses Opus 4.5 with thinking for everything, even though it’s slower.
Runs 5 parallel Claude sessions in terminal tabs and relies on system notifications to know when Claude needs input.
Operates 5-10 additional Claude Code sessions on claude.ai/code simultaneously, handing local sessions off to the web and back again.
His team maintains a single shared CLAUDE.md checked into git, updated multiple times a week. Other teams maintain their own CLAUDE.md files, and it's each team's job to keep theirs up to date.
He doesn't use --dangerously-skip-permissions. Instead, /permissions helps pre-allow common, generally safe bash commands, avoiding unnecessary permission prompts.
During code review, tagging @.claude on a coworker's PR adds a note to CLAUDE.md as part of that PR. Knowledge gets captured every time, making subsequent features easier to build.
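For reference, Claude Code stores pre-approved permissions in a settings file; an allow-list like the one the setup above relies on might look like this (the specific command rules here are examples, not his actual configuration):

```json
{
  "permissions": {
    "allow": [
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(npm run test:*)"
    ]
  }
}
```

Placed in a project's `.claude/settings.json`, rules like these suppress permission prompts for the listed commands while everything else still asks first.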
There’s a lot more you can learn from his setup. Do check it out!
Vibe Coding a static site on a $25 Walmart Phone
A developer turned a $25 Walmart phone into a full web server running nginx, Cloudflare tunnels, and Prometheus monitoring, all configured with Claude Code. The setup runs a complete Ubuntu environment via Termux and proot, bypassing Android's process restrictions with some clever workarounds. It's remarkably capable for pocket change!
New Gemma 3 model for function calling on-device
Google released FunctionGemma, a 270M parameter model fine-tuned specifically for function calling that runs entirely on-device. The interesting angle here is the customization path: baseline accuracy starts at 58% but jumps to 85% after fine-tuning on specific domains, proving that smaller specialized models can outperform generic prompting for local agents. It's available now with full ecosystem support across Hugging Face, Keras, vLLM, and mobile deployment via LiteRT-LM.
Tools of the Trade
Sub-Agents Directory - Open-source curated collection of 100+ sub-agent prompts and MCP servers for Claude Code. It has ready-to-use prompts for dozens of frameworks and languages, like Next.js, React, Python, and more. Each prompt is designed to give Claude Code the context it needs to write better code, follow best practices, and understand your project's conventions.
Beads - Git-backed memory system for AI coding agents. It replaces messy markdown plans with a dependency-aware graph, allowing agents to handle long-horizon tasks without losing context. Agents query a local SQLite cache that auto-syncs via JSONL commits.
Bridle - A config manager for agentic harnesses. It lets you create, switch, and sync configuration profiles across AI coding agents like Claude Code, OpenCode, Goose, and Amp. It also functions as a cross-harness package manager, automatically translating and installing skills, agents, commands, and MCPs to work with each tool's different structures and config schemas.
Claude Code for Product Managers - An interactive course for product managers to use Claude Code by having them actually use Claude Code to complete real PM tasks like writing PRDs, analyzing data, and processing user research.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
Hot Takes
I'm not joking and this isn't funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned... I gave Claude Code a description of the problem, it generated what we built last year in an hour.
Random thought. We are going to be so much faster at creating and building. Companies which need a lot of coordination or meetings will fall behind.
You need to become good at making decisions and taking ownership/responsibility.
Goal for 2026: reduce meetings, communicate asynchronously, and ship 100x more.
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉





