Agentic Context Engineering

+ Karpathy’s Nanochat - The best ChatGPT $100 can buy, Vibe build n8n workflows

In partnership with Lindy

Today’s top AI Highlights:

  1. Stanford’s Agentic Context Engineering (ACE) for self-improving agents
  2. Karpathy’s Nanochat - the best ChatGPT $100 can buy
  3. Vibe build n8n workflows with their AI Workflow Builder
  4. Microsoft’s free course on production-ready on-device AI

& so much more!

Read time: 3 mins

AI Tutorial

Imagine uploading a photo of your outdated kitchen and instantly getting a photorealistic rendering of what it could look like after renovation, complete with budget breakdowns, timelines, and contractor recommendations. That's exactly what we're building today.

In this tutorial, you'll create a sophisticated multi-agent home renovation planner using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash Image (aka Nano Banana).

It analyzes photos of your current space, understands your style preferences from inspiration images, and generates stunning visualizations of your renovated room while keeping your budget in mind.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

There has been a lot of conversation around prompt engineering vs. context engineering, and the emerging consensus is that proper context is what makes the difference between an agent that fumbles through APIs and one that executes flawlessly.

But here's where things break down: current methods like GEPA for building and updating this context fall into two traps. Either they compress everything into short, generic instructions that lack the specific details agents need, or they suffer from "context collapse," where the system rewrites its entire knowledge base each time and accidentally erases hard-won insights in the process.

Stanford researchers just dropped Agentic Context Engineering (ACE) to fix both problems. Instead of compressing or constantly rewriting context, ACE grows it like a living playbook: the system generates, reflects on, and updates its own context, no retraining required.

And the results are surprising - the kind of jump people usually expect from fine-tuning on task-specific data!

Here’s how this works:

  1. 3 specialized agents - The Generator executes tasks and flags which context pieces helped or hindered performance; the Reflector analyzes those execution traces to extract specific, actionable lessons; and the Curator appends these lessons to the playbook as new bullet points without touching existing context.

  2. Delta updates - Curator performs lightweight ops like appending new bullets and running semantic deduplication to avoid information loss that happens when LLMs "summarize" or fully rewrite contexts.

  3. Reflector’s iterative refinement - Reflector can go through multiple rounds of analysis on the same problem, progressively sharpening its diagnosis before the Curator ever touches the playbook.

  4. Is fine-tuning dead? ACE matched a GPT-4.1-based production agent using a smaller open-source model, just by engineering better context, and at a fraction of the cost of fine-tuning. But fine-tuning still wins for baking in specialized knowledge that would overflow context windows, or when consistent behavior needs to be burned into the weights.
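The Curator's delta-update step can be sketched in a few lines. This is a toy version, assuming the playbook is a flat list of bullet strings, and using difflib string similarity as a cheap stand-in for the semantic deduplication the paper describes:

```python
# Toy sketch of ACE-style delta updates. The Playbook class and add_lesson
# method are illustrative names, not from the paper; real ACE would use
# embedding-based semantic dedup rather than string similarity.
from difflib import SequenceMatcher

class Playbook:
    def __init__(self):
        self.bullets: list[str] = []

    def add_lesson(self, lesson: str, dedup_threshold: float = 0.85) -> bool:
        """Append a new bullet unless a near-duplicate already exists.
        Existing bullets are never rewritten, only appended to - that's
        the delta update that prevents context collapse."""
        for existing in self.bullets:
            sim = SequenceMatcher(None, existing.lower(), lesson.lower()).ratio()
            if sim >= dedup_threshold:
                return False  # near-duplicate: skip it to avoid bloat
        self.bullets.append(lesson)
        return True

pb = Playbook()
pb.add_lesson("Always paginate the /orders endpoint; it caps at 100 rows.")
pb.add_lesson("Always paginate the /orders endpoint, it caps at 100 rows!")  # dropped
pb.add_lesson("Retry 429 responses with exponential backoff.")
print(len(pb.bullets))  # 2
```

Because updates only ever append or dedupe, no single bad rewrite can erase the rest of the playbook - which is exactly the failure mode of full-context rewriting.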

We've all been underestimating how much you can accomplish with smart context engineering before reaching for the expensive fine-tuning hammer.

The Simplest Way to Create and Launch AI Agents and Apps

You know that AI can help you automate your work, but you just don't know how to get started.

With Lindy, you can build AI agents and apps in minutes simply by describing what you want in plain English.

→ "Create a booking platform for my business."
→ "Automate my sales outreach."
→ "Create a weekly summary about each employee's performance and send it as an email."

From inbound lead qualification to AI-powered customer support and full-blown apps, Lindy has hundreds of agents that are ready to work for you 24/7/365.

Stop doing repetitive tasks manually. Let Lindy automate workflows, save time, and grow your business.

Andrej Karpathy just dropped Nanochat, a complete implementation of a ChatGPT-like system that goes from raw data to a working web interface in a single codebase. For just $100 and 4 hours on an 8XH100 node, you can train your own conversational AI that writes stories, answers questions, and interacts through a clean web UI.

The entire pipeline lives in ~8,000 lines of code, covering everything from tokenizer training with a custom Rust implementation to reinforcement learning with GRPO. It handles pretraining on FineWeb, midtraining on conversational data, supervised fine-tuning, and includes an efficient inference engine with KV caching and tool use.

Scale up to ~$1000, and you get a model that solves math problems, writes code, and performs reasonably on multiple choice tests, hitting the 40s on MMLU and 70s on ARC-Easy.

Key Highlights:

  1. Single-script execution - Run the entire pipeline from tokenization through deployment with speedrun.sh, which automatically handles data downloads, training stages, evaluation, and checkpoint management across the full stack.

  2. Custom inference engine - Implements efficient generation with KV cache prefill/decode patterns, supports parallel sampling, and includes tool use via sandboxed Python execution for calculator functionality during conversations.

  3. Report card - Each run generates a markdown report with a summary table showing your model's progression (BASE → MID → SFT → RL) across all evaluation metrics, complete with wall clock time and cost estimates, making training feel like watching your model level up.

  4. Cost-performance scaling - At $100, you get a toy model, $300 surpasses GPT-2 CORE scores, and $1000 trains a GPT-3 Small equivalent (1/1000th the compute of full GPT-3) that handles math, code, and knowledge tasks.

Quick Bites

Microsoft's free course on production-ready on-device AI
Microsoft just dropped a comprehensive course that teaches developers how to deploy small AI models like Phi-4 and Mistral-7B directly on phones, industrial hardware, and embedded systems with no cloud dependencies. The 36-45 hour curriculum covers everything from quantization techniques (achieving 75% size reduction while retaining 85% performance) to local RAG pipelines and multi-agent systems. Particularly focused on real-world production constraints, including privacy compliance, millisecond latency requirements, and systems that actually work offline.

Vibe build n8n workflows with their AI Workflow Builder
n8n just launched AI Workflow Builder in beta, which converts simple prompts into functional automations complete with nodes, logic, and connections. You can iterate through conversations to build and adjust workflows step by step, which is particularly useful for exploring unfamiliar nodes or validating approaches quickly. Rolling out this week to Cloud users on Trial, Starter, and Pro plans, though usage will be metered to cover the actual model costs.

Crawl websites using natural language with Firecrawl v2
Firecrawl v2 lets you crawl websites by simply telling it what you want. Specify "get the blog pages" or whatever subset you need, and it semantically determines which links to follow, translating your intent into the appropriate crawl parameters automatically. No more manually configuring crawl options or writing complex selectors.

Make your Gemini CLI an expert in Google’s open-source AI framework
Google’s open-source AI development framework GenKit now has an Extension for the Gemini CLI. The extension bundles Genkit's MCP server with context files to give Gemini CLI deep knowledge of the framework's architecture, letting you create flows, debug traces, and access documentation without leaving your command line. Install it with one command and you're working with an AI agent that understands Genkit's patterns and APIs, not just generic code suggestions.

GitHub Copilot’s new embedding model to quickly find the right code
GitHub just shipped a new embedding model for Copilot that makes code search in VS Code 37.6% more accurate at retrieving the right snippets, with 2x faster throughput and an 8x smaller memory footprint. The secret here is training on "hard negatives" - code examples that look correct but aren't - which teaches the model to distinguish between "almost right" and "actually right." The update is already live in VS Code for all Copilot users.
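The hard-negative idea is easy to see with a generic triplet loss. The vectors and loss below are hand-made stand-ins, not GitHub's actual embeddings or training objective:

```python
# Why hard negatives matter for embedding models: easy negatives are already
# far from the query, so they contribute no learning signal, while hard
# negatives force the model to separate "almost right" from "actually right".
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def triplet_loss(query, positive, negative, margin=0.2):
    """Zero only when the right snippet beats the wrong one by `margin`."""
    return max(0.0, cos(query, negative) - cos(query, positive) + margin)

query    = [1.0, 0.0, 0.5]  # e.g. "parse a date string"
right    = [0.9, 0.1, 0.6]  # the correct snippet
easy_neg = [0.0, 1.0, 0.0]  # obviously unrelated code
hard_neg = [0.8, 0.2, 0.6]  # looks right, subtly wrong

print(triplet_loss(query, right, easy_neg))  # 0.0 - nothing to learn
print(triplet_loss(query, right, hard_neg))  # > 0 - this pair drives training
```

Mining pairs like hard_neg is the expensive part of this recipe: they have to be close enough to the query to be confusable, yet verifiably wrong.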

Some QoL upgrades from Google:

  • Video Overviews in NotebookLM is getting a Nano Banana upgrade. You can now turn your documents into fun narrative videos with beautiful illustrations, all powered by Google’s latest image gen model. Video Overviews now gives you six visual themes to choose from to make your storytelling or learning experience more effective. This feature is also being rolled out to Gemini Pro users.

  • Google AI Studio now has a proper rate limit dashboard that shows real-time usage across RPM, TPM, and RPD, all without bouncing over to Cloud Console. You can filter by model, check your actual limits at a glance, and finally stop guessing whether you're about to hit a wall mid-project.

Tools of the Trade

  1. Open Rube - open-source implementation of Composio's Rube platform that uses their Tool Router to connect AI agents to 500+ applications like GitHub, Slack, and Gmail within the chat interface. It handles authentication via Supabase, manages conversation history in PostgreSQL, and streams live responses.

  2. QueryDeck - Generates instant REST APIs, AI agent tools, and MCPs for Postgres. Use the visual no-code builder to create complex SQL queries with deep joins, nested inserts, dynamic parameters, and turn them into fully functional REST APIs. Deploy instantly or push them as a Node.js app to your repository.

  3. PageIndex MCP - PageIndex is a vectorless, reasoning-based RAG system that represents docs as hierarchical tree structures instead of vectors. This server exposes this AI-native tree index directly to LLMs, allowing agents to reason over document structure and retrieve the right information.

  4. Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
    (Now accepting GitHub sponsorships)
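The vectorless idea behind PageIndex (item 3) can be sketched as retrieval-by-tree-walk: descend a hierarchy of sections, choosing branches by their summaries. The keyword scorer below is a cheap stand-in for the LLM reasoning PageIndex actually uses to pick branches:

```python
# Toy tree-index retrieval: no embeddings, no vector store. A document is a
# hierarchy of nodes with summaries; retrieval walks down the best-matching
# branch. Node names and the scoring rule are illustrative only.
class Node:
    def __init__(self, title, summary, children=None, text=""):
        self.title, self.summary = title, summary
        self.children = children or []
        self.text = text

def score(summary, query):
    """Word-overlap stand-in for 'ask the LLM which branch fits the query'."""
    q = set(query.lower().split())
    return len(q & set(summary.lower().replace(",", " ").split()))

def retrieve(node, query):
    while node.children:
        node = max(node.children, key=lambda c: score(c.summary, query))
    return node.text

doc = Node("Handbook", "company handbook", [
    Node("HR", "hiring, leave and payroll policies", [
        Node("Leave", "vacation and sick leave rules", text="25 days of PTO per year."),
        Node("Payroll", "salary payment schedule", text="Paid on the last working day."),
    ]),
    Node("Engineering", "coding standards and deploys", text="Deploys ship on Tuesdays."),
])

print(retrieve(doc, "how much vacation leave do I get"))  # 25 days of PTO per year.
```

Because each step only compares a handful of sibling summaries, an LLM can make every branching decision explicitly - which is what lets the agent reason about document structure instead of matching against opaque vectors.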

Hot Takes

  1. Search OPENAI_API_KEY on GitHub and thank Vibe Coders.
    ~ Ashutosh Shrivastava

  2. all the former effective altruists are now working on AI safety because it’s the easiest way to gain power (control others) without doing any real work

    ~ Ali Shobeiri

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
