OS for Production AI Agents
Today’s top AI Highlights: OS for Production AI Agents, Google's async coding agent Jules in CLI, build GraphRAG agents in minutes, and so much more!
Read time: 3 mins
AI Tutorial
Imagine uploading a photo of your outdated kitchen and instantly getting a photorealistic rendering of what it could look like after renovation, complete with budget breakdowns, timelines, and contractor recommendations. That's exactly what we're building today.
In this tutorial, you'll create a sophisticated multi-agent home renovation planner using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash Image (aka Nano Banana).
It analyzes photos of your current space, understands your style preferences from inspiration images, and generates stunning visualizations of your renovated room while keeping your budget in mind.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
This year’s MLOps World | GenAI Summit is for experienced AI builders who already know the basics. It’s built around challenges you’re facing now: agent lockdown, drift, memory, cost, and tool integrations. Across 75+ sessions and 16 tracks, you’ll hear from engineers and researchers pushing systems live under constraints. The speaker list reads like a who’s who of AI builders (this list alone will make you excited).
Tracks include Agents in Production, LLM Infrastructure & Operations, ML Lifecycle Security, Governance & Audits, and Lean MLOps for Small Teams.
Featured speakers: Calvin Smith (OpenHands), Niels Bantilan (Union.ai), Yegor Denisov-Blanch (Stanford), Federico Bianchi (TogetherAI), and so many more.
Workshops like “Building AI Agents from Scratch,” “Managing RAG,” and “Conversational Agents with Thread Metrics”.
Real case studies, not just slides: see what went wrong, what they fixed, and what still haunts them.
Lock in your pass for MLOps World 2025 - spots are limited! Join virtually Oct 6-7, then come live Oct 8-9 in Austin (or stream everything).
Your AI agent keeps calling the LLM to do the exact same task it did yesterday. And the day before.
You're burning tokens on repetitive automations that should just work like a script after the first run.
Butter, an OpenAI-compatible proxy, solves exactly this with what it calls "muscle memory" for LLMs: it records how agents solve problems and replays those solutions deterministically on similar tasks. It sits between your application and providers like OpenAI, Anthropic, or any compatible API, caching responses and intelligently reusing them.
What makes this different from basic caching is that Butter understands prompt structure, not just exact matches. It can identify when two requests are functionally identical, even if the specific details differ, and serve the cached response adapted to the new context. The result is agents that cost less and run faster on repetitive tasks while maintaining the flexibility to handle exceptions.
Key Highlights:
Smart cache matching - Butter analyzes the structure of your prompts to identify variables and patterns, allowing it to serve cached responses even when the specific details change (like swapping "John" for "Jane" in a form-filling task).
Zero code changes - Drop-in replacement for OpenAI's API - just point your existing code at Butter's endpoint and it handles the rest, forwarding requests to your chosen provider while building up its cache.
Deterministic execution - Once a task pattern is cached, Butter returns responses instantly without calling the LLM, giving you consistent results and eliminating the token costs for repetitive operations.
Open development - Currently free and open-access while the team gathers feedback on template induction accuracy and discovers edge cases in real-world usage.
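To make the replay idea concrete, here is a minimal sketch of template-based caching, assuming a simple variable-masking heuristic. All names and the heuristic are invented for illustration; this is not Butter's actual implementation, which also adapts the replayed response to the new variables rather than returning it verbatim.

```python
import re

# Treat quoted strings and numbers as the "variables" of a prompt; what
# remains after masking them is the prompt's structural template.
VAR_PATTERN = re.compile(r'"[^"]*"|\b\d+\b')

def template_of(prompt: str) -> str:
    """Mask variable-looking spans so structurally identical prompts collide."""
    return VAR_PATTERN.sub("{var}", prompt)

class MuscleMemoryCache:
    """Record an LLM response per template; replay it on structural repeats."""
    def __init__(self, llm_call):
        self.llm_call = llm_call   # fallback: the real LLM call
        self.cache = {}            # template -> recorded response
        self.calls = 0             # how often the LLM was actually hit

    def complete(self, prompt: str) -> str:
        key = template_of(prompt)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.llm_call(prompt)
        return self.cache[key]

# Two structurally identical requests: only the first reaches the "LLM".
llm = MuscleMemoryCache(lambda p: f"response to: {p}")
a = llm.complete('Fill the form for "John", age 42')
b = llm.complete('Fill the form for "Jane", age 35')
print(llm.calls)  # 1
```

The point of the sketch is the cache key: keying on the masked template rather than the raw prompt is what lets "John" and "Jane" hit the same entry.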
Production AI agents need more than a framework. They need infrastructure, governance, and the ability to scale across departments.
CrewAI AMP delivers exactly that: a complete platform for building, deploying, and managing agent systems with the same rigor you'd apply to any critical production service.
The numbers tell the story. A Fortune 500 company deployed 41 internal builders who created 30+ use cases, spinning up nearly 500,000 agents that completed over 100,000 executions in roughly two weeks. Another public company hit 21 production use cases with 50,000+ executions and counting.
The platform combines visual no-code workflows in CrewAI Studio with full programmatic control through APIs, letting you build agent crews in under 60 seconds or export everything as Python code. It handles everything from agent training and LLM testing to comprehensive tracing, monitoring dashboards, and reusable repositories. You maintain full control with RBAC, audit logs, and the ability to run anywhere while avoiding vendor lock-in through code export.
Key Highlights:
Studio to production in minutes - Vibe code or drag-and-drop workflows and agent crews, complete with execution debugging, local testing, and the option to export React components or publish as MCP servers.
Native integrations and custom tools - Connect to Gmail, Google Drive, Slack, HubSpot, and dozens more through OAuth, or build custom tools with the API and store them in private repositories for organization-wide reuse.
Memory and learning systems - Short-term, long-term, entity, and contextual memory types enable agents to adapt over time, remember preferences, and make informed decisions based on historical context and learned patterns.
End-to-end visibility and control - Execution timelines, detailed task views, agent thought processes, and final outputs all captured in traces, with webhook streaming for real-time event monitoring and comprehensive admin dashboards for operational insights.
Enterprise-ready from day one - Role-based access control, deployment history, hallucination guardrails, webhook streaming, and automated agent training with human-in-the-loop feedback ensure your agents produce reliable, repeatable outcomes at scale.
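The four memory types above can be pictured as plain data structures. This is an illustrative model only, not CrewAI's actual classes: short-term memory as a bounded buffer of recent turns, long-term and entity memory as durable keyed stores, and contextual memory as the merge of all three into one prompt-ready block.

```python
from collections import deque

class AgentMemory:
    """Toy model of short-term, long-term, entity, and contextual memory."""
    def __init__(self, short_term_size: int = 5):
        self.short_term = deque(maxlen=short_term_size)  # recent turns only
        self.long_term = {}   # durable facts, e.g. user preferences
        self.entities = {}    # entity name -> accumulated observations

    def remember_turn(self, text: str):
        self.short_term.append(text)

    def remember_fact(self, key: str, value: str):
        self.long_term[key] = value

    def observe_entity(self, name: str, note: str):
        self.entities.setdefault(name, []).append(note)

    def contextual(self) -> str:
        """Assemble everything relevant into one context string."""
        parts = ["Recent: " + " | ".join(self.short_term)]
        parts += [f"Fact: {k}={v}" for k, v in self.long_term.items()]
        parts += [f"{e}: {'; '.join(n)}" for e, n in self.entities.items()]
        return "\n".join(parts)

mem = AgentMemory()
mem.remember_turn("User asked for a budget report")
mem.remember_fact("preferred_format", "PDF")
mem.observe_entity("Acme Corp", "renewal due in Q4")
print(mem.contextual())
```

The design choice worth noticing is the bounded deque: short-term memory forgets by construction, while the long-term and entity stores persist across it.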
Quick Bites
Mira Murati’s Thinking Machines debuts with fine-tuning as a service
Mira Murati’s multi-billion-dollar startup, Thinking Machines Lab, debuted its first product, Tinker, and it’s probably deliberately unsexy. Tinker is a training API that gives you low-level control over the fine-tuning process while the platform handles all the infrastructure work. It supports everything from small models up to large Mixture-of-Experts models like Qwen-235B, with model swaps done by changing just one string. It is in private beta now (free to start), and you can join the waitlist.
No/low-code tool to build GraphRAG agents in minutes
Neo4j just launched Aura Agent in early access, a no/low-code platform to build GraphRAG agents in minutes, abstracting away much of the backend work. It combines Cypher templates for precise queries, vector similarity search, and text-to-Cypher generation - all deployable via API endpoints, with built-in LLM orchestration and authentication handled for you. It’s not production-ready yet, but it’s still a great way to test the waters for your use case.
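The Cypher-template pattern is easy to picture: a fixed, parameterized query plus runtime bindings. A minimal sketch in Python, where the schema, query, and `bind` helper are hypothetical; real execution would go through the neo4j driver or Aura Agent's API endpoints:

```python
import re

# Hypothetical Cypher template in the templated-tool style: the query
# structure is fixed, only the $parameters vary per request.
SIMILAR_PRODUCTS = """
MATCH (p:Product {name: $name})-[:SIMILAR_TO]->(other:Product)
RETURN other.name AS name
ORDER BY other.rating DESC
LIMIT $k
"""

def bind(template: str, **params) -> tuple[str, dict]:
    """Pair a fixed Cypher template with runtime parameters,
    refusing to dispatch if any $parameter is left unbound."""
    missing = set(re.findall(r"\$\w+", template)) - {f"${k}" for k in params}
    if missing:
        raise ValueError(f"unbound parameters: {sorted(missing)}")
    return template, params

query, params = bind(SIMILAR_PRODUCTS, name="espresso machine", k=5)
print(params)  # {'name': 'espresso machine', 'k': 5}
```

Keeping the graph traversal fixed and binding only parameters is what makes this route "precise" compared with free-form text-to-Cypher generation.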
Hume ships voice AI that is 2x faster and ½ the price
Hume just released Octave 2, a next-gen multilingual voice AI that now speaks 11 languages. It’s 40% faster (audio in under 200 ms) and costs half as much as Octave 1. The interesting bits are voice conversion (swap voices while preserving timing and phonetics) and direct phoneme editing, which lets you craft pronunciations that don't exist in the training data. Access is live now via Hume’s platform and API.
Show what you can build with Claude 4.5 and win $3,400 worth of prizes
Anthropic is running a week-long "Built with Claude Sonnet 4.5" challenge with four $1k API credit prizes plus year-long Claude Max subscriptions. They're looking for technical prowess, research depth, educational tools, and artistic applications. You need to quote-post on X or share in their Discord with what you built, how you built it in a week or less, screenshots, and specifics on using 4.5's features. Winners announced by October 10. Open to 16 countries, US residents included.
Tools of the Trade
Jules Tools - A CLI wrapper around Google’s Jules coding agent that runs tasks like writing tests, fixing bugs, and building features asynchronously in remote VMs. This lets you control Jules via Terminal rather than the browser.
InsForge - Open-source backend platform built for AI agents that provides backend primitives (DB, auth, storage) via MCP. Its APIs are consistent, defaults are safe, and you don’t need deep backend knowledge to get things running.
Simplex - A developer platform for browser automation: it offers remote browsers, web agents, and infrastructure so you don’t build from scratch. It handles login/2FA, anti-bot protections, caching, workflow orchestration, and exposes SDKs and UI for building and managing agent flows.
Context Engineering Template - A comprehensive context-engineering template for building full context artifacts (rules, examples, docs) so AI coding agents perform reliably. It uses Claude Code as the main target, but you can apply the pattern to any AI coding agent.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
Hot Takes
Work: ChatGPT
Relax: Sora
OpenAI is gonna eat everything.
~ Nick Dobos

Is it time we stop using the word AI for everything and instead use words like "chatbots", "video generation", "recommendation engines", "cell prediction",...?
Feels like as a society, we could have healthier debates like that.
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉