
Build and Serve AI Agents as an API

PLUS: Open protocol to connect AI agents to front-end apps, Claude's system prompt leak

Today’s top AI Highlights:

  1. Minimal open-source repo for serving agents using FastAPI and Postgres

  2. AG-UI open protocol for AI agents to connect to front-end apps

  3. AI that watches real users using your product

  4. Claude's system prompt, over 24k tokens including tools, leaked

  5. First 32B reasoning model trained via globally distributed compute

& so much more!

Read time: 3 mins

AI Tutorial

While working with web data, we keep facing the challenge of extracting structured information from dynamic, modern websites. Traditional scraping methods often break when they encounter JavaScript-heavy interfaces, login requirements, and interactive elements, leading to brittle solutions that require constant maintenance.

In this tutorial, we're building an AI Startup Insight Agent application that uses Firecrawl's FIRE-1 agent for robust web extraction. FIRE-1 is an AI agent that can autonomously perform browser actions - clicking buttons, filling forms, navigating pagination, and interacting with dynamic content - while understanding the semantic context of what it's extracting.

We'll combine this with OpenAI's GPT-4o to create a complete pipeline from data extraction to analysis in a clean Streamlit interface, using the Agno framework to build our AI startup insight agent. Here's a rough sketch of the core pipeline to give you a feel for how the pieces fit together.
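The Firecrawl and Agno calls below are assumptions based on their public docs (exact parameter names may differ by SDK version), not verbatim tutorial code:

```python
# Hedged sketch: FIRE-1 extraction feeding a GPT-4o analysis agent.
# Imports and parameters are assumptions from public docs; check your
# installed firecrawl-py / agno versions for the exact signatures.
from firecrawl import FirecrawlApp
from agno.agent import Agent
from agno.models.openai import OpenAIChat

firecrawl = FirecrawlApp(api_key="fc-...")  # your Firecrawl API key

# FIRE-1 drives the browser (clicks, forms, pagination) during extraction.
data = firecrawl.extract(
    ["https://example-startup.com"],  # hypothetical target site
    prompt="Extract the company name, product description, and funding info.",
    agent={"model": "FIRE-1"},
)

# GPT-4o turns the structured extraction into an analyst-style summary.
analyst = Agent(model=OpenAIChat(id="gpt-4o"))
print(analyst.run(f"Summarize these startup insights: {data}").content)
```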

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Agno has released the Agent API, a production-ready backend for serving AI agents using FastAPI and Postgres. It’s a minimal open-source setup designed for fast local development and clean cloud deployments. The API ships with prebuilt agents, memory support, and full compatibility with the Agno Playground and Agent UI.

You can plug in any model provider, customize agent logic, and deploy quickly using Docker. A minimal sketch of what serving an agent this way looks like follows the list below.

Key highlights:

  1. Plug-and-play backend for agents - Ships with a FastAPI server and Postgres DB for handling requests, storing sessions, and managing long-term memory. Everything is containerized for easy startup and teardown with Docker.

  2. Production-ready agent templates - Comes with prebuilt agents like a Web Search agent, a Finance agent using yFinance, and Agno Assist (which can answer questions about Agno itself). You can extend or swap these with your own tools and logic.

  3. Model-agnostic - GPT-4.1 is the default, but you can point to any LLM provider (Claude, Gemini, local models, etc.) by editing config files in the /agents folder. No vendor lock-in, and easy to switch models.

  4. Cloud deployment - The setup supports container platforms like Cloud Run, App Runner, GKE, or ECS. It includes scripts for building and pushing Docker images, environment variable management, and scaling-ready configurations.
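To make the pattern concrete, here's a minimal sketch of the FastAPI-plus-agent shape. This is not the repo's actual code; the Agno import paths and the route are assumptions:

```python
# Minimal sketch of serving one agent over FastAPI; the real Agent API
# template adds Postgres-backed sessions, memory, and streaming.
from fastapi import FastAPI
from pydantic import BaseModel

from agno.agent import Agent                # assumed import path
from agno.models.openai import OpenAIChat   # assumed import path

app = FastAPI()

# One agent; the template ships several (Web Search, Finance, Agno Assist).
web_agent = Agent(model=OpenAIChat(id="gpt-4.1"), markdown=True)

class RunRequest(BaseModel):
    message: str

@app.post("/agents/web-search/runs")        # hypothetical route
def run_agent(req: RunRequest):
    response = web_agent.run(req.message)   # synchronous run for brevity
    return {"content": response.content}
```

Run it with `uvicorn main:app` and POST a message to the route; the repo wraps the same idea in Docker for local development and cloud deploys.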

Open Agent-User Interaction Protocol 🤖🖥️👩‍💻

CopilotKit has released AG-UI, an open, lightweight, event-based protocol that standardizes how AI agents connect to front-end applications. Think of it as a universal translator for AI-driven systems: no matter what language an agent speaks, AG-UI ensures fluent communication. It's built to support real-time agent-user collaboration, live state streaming, and frontend tool use, without forcing you to change your agent backend.

AG-UI already supports top agent frameworks like LangGraph, CrewAI, Mastra, and AG2. You might wonder if it competes at all with MCP and Google's A2A protocol; it doesn't. Here's how the three sit in an AI agent workflow: a given AI agent may use MCP to call tools (and get context), A2A to communicate with other agents, and AG-UI to collaborate with a user through a frontend application. A sketch of what the event stream looks like follows the highlights below.

Key Highlights:

  1. Standardized event types - AG-UI defines over 16 event types, including lifecycle events, text messages, tool calls, and state sync updates, making it easy to manage structured and streaming interactions.

  2. Real-time collaboration and state sharing - Supports both human-in-the-loop and human-on-the-loop patterns, with bidirectional sync between agent and app state during execution, ideal for building interactive dashboards, copilots, and live workflows.

  3. Out-of-the-box support for major frameworks - LangGraph, CrewAI, Mastra, and AG2 are fully integrated. Support for Agno, OpenAI Agent SDK, Vercel AI SDK, and others is in progress or open to contribution.

  4. Flexible transport - You can use SSE, WebSockets, or webhooks depending on your infra, and AG-UI includes a standard HTTP agent with both text and binary streaming, designed for both developer convenience and production performance.
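For intuition, here's an illustrative sketch of an AG-UI-style event stream served over SSE. The event type names follow the protocol's published categories (lifecycle, text message, state), but the exact field names here are assumptions, not the normative spec:

```python
# Illustrative sketch: streaming AG-UI-style events over SSE with FastAPI.
# Field names are assumptions; consult the AG-UI spec for the real schema.
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def agent_events(prompt: str):
    # Lifecycle start, streamed text chunks, then lifecycle end.
    yield {"type": "RUN_STARTED", "runId": "run-1"}
    for chunk in ("Working on: ", prompt):
        yield {"type": "TEXT_MESSAGE_CONTENT", "delta": chunk}
    yield {"type": "RUN_FINISHED", "runId": "run-1"}

@app.get("/agent/stream")
def stream(prompt: str):
    def sse():
        for event in agent_events(prompt):
            # Standard SSE framing: "data: <json>" followed by a blank line.
            yield f"data: {json.dumps(event)}\n\n"
    return StreamingResponse(sse(), media_type="text/event-stream")
```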

Quick Bites

Qwen has released quantized versions of its Qwen3 models, now ready to run locally using tools like Ollama, LM Studio, SGLang, and vLLM. Formats include GGUF, AWQ, and GPTQ, making local deployment flexible across setups. Models are available on Hugging Face and ModelScope for direct access.
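If you want to kick the tires locally, here's a rough sketch with vLLM. The model id is an assumption; check the Hugging Face collection for the exact quantized variants published:

```python
# Hedged sketch: loading an AWQ-quantized Qwen3 checkpoint with vLLM.
# "Qwen/Qwen3-8B-AWQ" is an assumed model id; substitute a real variant.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B-AWQ", quantization="awq")
outputs = llm.generate(
    ["Explain AWQ quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```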

Understanding how users actually interact with your product is still a pain: analytics lack context, and watching session replays is a time sink. Human Behavior fixes this with an AI agent that watches real user sessions and explains why users convert, drop off, or explore certain features.

Instead of tagging events or scanning hours of footage, the agent identifies patterns, labels behaviors, and lets product teams search insights with plain language. It works with existing tools like PostHog and Hotjar, and is already used by fast-moving teams like Delve and Conduit.

Claude’s full system prompt, leaked on GitHub, spans over 24,000 tokens, far longer than expected. It defines model behavior, tool usage, and citation style, with repeated instructions that push the model to reason clearly and avoid hallucinations. The prompt leans on redundancy and tightly controlled guidance to keep the model on track.

Prime Intellect has released INTELLECT-2, a 32B reasoning language model built on top of QwQ-32B and trained using fully asynchronous reinforcement learning. What makes it stand out is how it’s trained: across a heterogeneous, permissionless swarm of global compute contributors, without relying on centralized GPU clusters. It shows performance gains on math and coding tasks, backed by a custom RL setup designed for decentralization. The model, along with code and data, is now open source.

Tools of the Trade

  1. Qomplement: OS-native AI agent that actually uses your computer like a human. It can open apps, complete tasks, and learn your workflows across desktop software, not just the browser. It runs locally, interacts with your real apps, and automates multi-step workflows you’d normally do by hand.

  2. mtaai-core: AI agent for social media sellers that automates ad posting, inventory tracking, customer feedback, and calendar-based scheduling. It integrates directly with Instagram and Google Calendar and optimizes posts, manages orders, and handles sales insights.

  3. Cloi: A fully local debugging agent that runs right inside your terminal; no cloud, no API keys, and no data ever leaves your machine. It spins up Ollama locally, analyzes your code errors using models like Phi-4, and suggests safe patches with your approval.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. if academia was so important, how come llm’s came from big corp? ~
    signüll

  2. “It reasoned people, it reasoned!”

    Meanwhile, OpenAI has more than 300 open positions they are actively hiring for. Most of these are for technical roles.

    But of course, they have an internal model that “reasoned”. ~
    Santiago

  3. IMO the explosion of Cursor, Lovable, Windsurf and Bolt isn’t because code is the best LLM application. It’s because SWEs intimately understand the problems of SWEs. It’s downstream of Anthropic dogfooding Claude to 90% of PRs. Any other industry (law, finance, medical care) could have been domed as decisively given comprehensive builder domain expertise. ~
    Jeremy Nixon

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
