Unwind AI
Anthropic on Building Agent Tools with AI Agents + Alibaba Qwen's new MoE model beats Gemini 2.5 Flash Thinking, Anti-vibe coding agent
Today’s top AI Highlights:
Anthropic’s blueprint for building effective tools for AI agents
ROMA, a recursive meta-agent framework for 100+ step workflows
Alibaba’s Qwen3-Next-80B-A3B ultra-sparse MoE model
& so much more!
Read time: 3 mins
AI Tutorial
Learn OpenAI Agents SDK from zero to production-ready!
We have created a comprehensive crash course that takes you through 11 hands-on tutorials covering everything from basic agent creation to advanced multi-agent workflows using OpenAI Agents SDK.
What you'll learn and build:
Starter agents with structured outputs using Pydantic
Tool-integrated agents with custom functions and built-in capabilities
Multi-agent systems with handoffs and delegation
Production-ready agents with tracing, guardrails, and sessions
Voice agents with real-time conversation capabilities
Each tutorial includes working code, interactive web interfaces, and real-world examples.
The course covers the complete agent development lifecycle: orchestration, tool integration, memory management, and deployment strategies.
Everything is 100% open-source.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments

Building tools for AI agents isn't just about wrapping APIs. It's about fundamentally rethinking how software should behave when users are non-deterministic. Anthropic engineers just shared their blueprint for building effective tools specifically for AI agents.
Their engineering team built prototypes, ran comprehensive tests with real tasks, and found the following patterns that consistently improve agent performance.
The most important insight? AI agents like Claude can write better tool implementations than humans!
Agent-Centric Architecture - Design tools around how agents naturally subdivide tasks, prioritizing contextual relevance over technical flexibility to reduce cognitive load and error rates.
Tool Consolidation - Instead of wrapping individual API endpoints, create composite tools that handle multi-step workflows (like a single schedule_event instead of separate list_users, list_events, and create_event tools).
Response Management - Implement response format controls (concise vs. detailed) and token-efficiency optimizations like pagination and truncation to manage agent context consumption effectively.
Clear Namespacing - Group related tools under clear prefixes to help agents navigate tool selection when dealing with hundreds of available functions across multiple services.
Tool Descriptions Matter - Treat tool descriptions like system prompts, making implicit knowledge explicit and describing tools as you would to a new team member.
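The consolidation principle is easy to see in code. The sketch below is purely illustrative - the data, function names, and workflow are hypothetical, not Anthropic's actual tools - but it shows how one composite tool replaces three thin API wrappers the agent would otherwise have to chain itself:

```python
# Hypothetical sketch of tool consolidation: one composite schedule_event
# tool does the work of three separate wrappers (list_users, list_events,
# create_event), so the agent makes one call instead of orchestrating three.
USERS = {"alice": "u1", "bob": "u2"}   # toy directory
EVENTS = []                             # each event: {"attendees": [...], "slot": int}

def schedule_event(attendee_names, preferred_slots):
    """Composite tool: resolve names, find a free slot, create the event."""
    ids = [USERS[name] for name in attendee_names]        # was: list_users
    taken = {e["slot"] for e in EVENTS
             if set(ids) & set(e["attendees"])}           # was: list_events
    for slot in preferred_slots:
        if slot not in taken:
            EVENTS.append({"attendees": ids, "slot": slot})  # was: create_event
            return {"status": "scheduled", "slot": slot}
    return {"status": "no_free_slot"}

print(schedule_event(["alice", "bob"], [9, 10]))  # → {'status': 'scheduled', 'slot': 9}
```

Beyond saving round trips, the composite tool keeps intermediate results (user IDs, busy slots) out of the agent's context entirely, which is the token-efficiency point above.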
Here’s a thread with visualizations and code examples to help you better understand these principles.
The original article has way more depth on each of these. Do read it if you're building anything with agents.
Workflows with 50, 80, even 100+ steps usually fall apart when agents lose track of context. This new framework shows that a recursive team of agents can not only survive that scale, but actually make it reliable.
ROMA is an open-source meta-agent framework where parent nodes break a complex goal into subtasks, pass them down to their child nodes as context, and later aggregate their solutions in the context as results flow back up.
With structured Pydantic inputs and outputs, the flow of context is transparent and fully traceable. Builders can see exactly how reasoning unfolds, enabling easy debugging, prompt refinement, and agent swapping. Further, the framework is extremely modular - you can plug in any agent, tool, or model at the node level.
Key Highlights:
Recursive Task Architecture - A complex task is broken into multiple subtasks, each represented as a node, which can either execute directly, break itself down into subtasks, or aggregate the results of its children. This tree-like structure makes the flow of context explicit, traceable, and easy to refine.
Traceability - Structured Pydantic inputs and outputs provide complete traceability of reasoning steps. Stage tracing shows exactly how information flows between nodes for easy debugging and prompt refinement.
Plug-and-Play - Any agent, tool, or model works at the node level without framework modifications. Human verification can be inserted at any point for critical decision checkpoints.
Performance - To demonstrate its efficacy, the team built an internet search agent based on ROMA. On the SEALQA benchmark, which tests complex, multi-source reasoning, this search agent beats Kimi Researcher and Gemini 2.5 Pro by huge margins.
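The recursive pattern above can be reduced to a toy sketch. This assumes nothing about ROMA's real API - ROMA uses Pydantic models for node inputs and outputs, while plain dataclasses keep this sketch dependency-free:

```python
from dataclasses import dataclass, field

# Toy sketch of ROMA-style recursion (not the actual ROMA API): a leaf node
# executes its task directly; a parent node passes its goal down to children
# and aggregates their results as they flow back up the tree.
@dataclass
class TaskNode:
    goal: str
    children: list["TaskNode"] = field(default_factory=list)

    def solve(self):
        if not self.children:                 # leaf: execute directly
            return f"done({self.goal})"
        # parent: recurse into children, then aggregate their results
        return [child.solve() for child in self.children]

plan = TaskNode("write report", [
    TaskNode("research"),
    TaskNode("draft", [TaskNode("outline"), TaskNode("write sections")]),
])
print(plan.solve())  # → ['done(research)', ['done(outline)', 'done(write sections)']]
```

Because every node's input and output is a structured value rather than free text, the full reasoning trace is just the tree of these values - which is what makes the debugging and agent-swapping described above tractable.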
What if you could get flagship model performance while using < 4% of the compute?
That's exactly what Alibaba's Qwen team has pulled off with their latest architecture. Meet Qwen3-Next-80B-A3B, an 80-billion parameter model that activates only 3 billion parameters per token, delivering performance that rivals its massive 235B flagship.
Training costs drop by 90% compared to comparable dense models while inference speeds jump 10x, especially in long-context scenarios where traditional models struggle.
The model comes in two variants - Thinking and Instruct - and the Thinking variant even outperforms Google's Gemini 2.5 Flash Thinking on multiple benchmarks. Weights for both are available to download on Hugging Face and ModelScope under the Apache 2.0 license.
Key Highlights:
Extreme Efficiency - Uses only 3.7% of parameters during inference (3B out of 80B), achieving 10x faster training and inference compared to Qwen3-32B while matching or beating its performance.
Hybrid Architecture - Combines Gated DeltaNet (75% of layers) with standard attention (25% of layers) to get the best of both worlds: speed from linear attention and recall strength from standard attention.
Ultra-Sparse MoE - Features 512 total experts with only 10 routed plus 1 shared expert per token, maximizing resource utilization without performance degradation.
Long-Context Domination - Natively supports up to 256K tokens with 10x+ throughput advantage over dense models at 32K+ context lengths, extensible to 1M tokens with RoPE scaling.
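The sparsity arithmetic behind these figures is quick to sanity-check (all numbers come from the announcement above, not an independent measurement):

```python
# Ultra-sparse MoE activation: 10 routed + 1 shared expert fire per token,
# out of 512 experts total.
total_experts = 512
active_experts = 10 + 1                      # routed + shared per token
print(f"experts active per token: {active_experts / total_experts:.1%}")   # ~2.1%

# 80B total parameters, ~3B activated per token (the "A3B" in the name).
total_params, active_params = 80e9, 3e9
print(f"parameters active per token: {active_params / total_params:.1%}")  # 3.7%
```

The parameter fraction (3.7%) is higher than the expert fraction (~2.1%) because non-expert components - attention and Gated DeltaNet layers, embeddings - run for every token regardless of routing.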
Marketing ideas for marketers who hate boring
The best marketing ideas come from marketers who live it.
That’s what this newsletter delivers.
The Marketing Millennials is a look inside what’s working right now for other marketers. No theory. No fluff. Just real insights and ideas you can actually use—from marketers who’ve been there, done that, and are sharing the playbook.
Every newsletter is written by Daniel Murray, a marketer obsessed with what goes into great marketing. Expect fresh takes, hot topics, and the kind of stuff you’ll want to steal for your next campaign.
Because marketing shouldn’t feel like guesswork. And you shouldn’t have to dig for the good stuff.
Quick Bites
Top model scores may be skewed by Git history leaks in SWE-bench
A significant data leakage issue has been uncovered in SWE-bench Verified, where AI coding agents have been cheating by accessing future Git commits that contain the actual solutions to problems they're being tested on. Models, including Claude 4 Sonnet and Qwen variants like Qwen3 Coder, were caught using Git commands to find commit messages with titles like "Fix incorrect result of getmodpath method" - essentially getting the answers before taking the test. The benchmark maintainers are now scrambling to remove all traces of future repository state, including branches, reflogs, and remote origins that could leak solution hints.
MCP server to search the source code of package dependencies
Chroma launched Package Search, an MCP server that gives AI agents direct access to the source code of package dependencies through semantic and regex search tools. Instead of relying on web searches that can be slow and inaccurate, agents can now query indexed open-source repositories for ground truth context, which reduces hallucinations and improves performance. Use it with Cursor, Claude Code, or any MCP client with a single line of configuration.
Claude now has memory
Rolling out for Team and Enterprise users, Claude now remembers your work context across conversations. Each project maintains its own separate memory while giving you full control to view, edit, and guide what gets stored. Incognito chats let all users start fresh conversations that bypass both memory and chat history entirely.
OpenAI significantly cranked up GPT-5 rate limits
OpenAI has significantly boosted API rate limits for GPT-5 and GPT-5 Mini across multiple tiers, with the most dramatic increases hitting Tier 1 users who now get 500K tokens per minute for GPT-5 (up from 30K) and matching limits for GPT-5 Mini (up from 200K). Higher tiers see doubled capacity, with Tier 4 GPT-5 users now enjoying 4M TPM.
App deployment and security analysis with new Gemini CLI extensions
Gemini CLI now handles two critical developer workflows with simple slash commands: /deploy for instant Cloud Run deployments and /security:analyze for comprehensive vulnerability scanning. Both extensions work locally and integrate with your existing git workflow, automatically analyzing diffs and providing detailed reports with fix suggestions.
Tools of the Trade
Runner - IDEs were never designed for agent-driven coding, especially the “plan and review” workflow that developers now need. Runner is a task-based dev environment where you design specs, AI agents write the code, and you review results through an integrated diff tool. Every change requires explicit approval before commit, keeping you accountable for what ships.
Latitude - Build autonomous AI agents by describing what you want in natural language, and it wires up models and integrations automatically. It integrates with 10,000+ tools via MCP, so agents can work with services like Slack, Notion, databases, or your own custom servers. You can self-host using Docker, Kubernetes, or PaaS, or use their managed cloud.
Perplexica - Open-source alternative to Perplexity that understands your questions, searches deep into the internet, and gives you answers with proper citations. It offers different focus modes (academic, writing, YouTube, Reddit, Wolfram Alpha, etc.), a Copilot Mode for query expansion, and support for local models.
Awesome LLM Apps: A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
Hot Takes
Hilarious! Oracle went from a dying company to the hot new AI powerhouse in ONE DAY based on a piece of paper!
OpenAI signed a piece of paper stating that it will purchase compute from Oracle....
Compute which Oracle doesn't have but plans to build by taking on a significant amount of debt. ~ Bindu Reddy

i don't want a thinner phone. i want a phone that comes with a full 700W H100 built in ~ vik
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉