Windsurf SWE-1 for Agentic Software Development
PLUS: AI agent debugger to fix other AI agents, MCP support in Proxy agent by Convergence AI
Today’s top AI Highlights:
This AI agent can catch and fix 20+ types of agent failures automatically
Windsurf releases Software Engineering models that go beyond coding
Major feature updates in Proxy, Manus, and Genspark AI agents
Write code on your data with this AI code editor built for data workflows
& so much more!
Read time: 3 mins
AI Tutorial
Building good research tools is hard. When you're trying to create something that can actually find useful information and deliver it in a meaningful way, you're usually stuck cobbling together different search APIs and prompt engineering for hours. It's a headache, and the results are often inconsistent.
In this tutorial, we'll build an AI Domain Deep Research Agent that does all the heavy lifting for you. The app uses three specialized agents built with the Agno framework, powered by Qwen's new flagship model Qwen 3 235B via Together AI and equipped with tools via Composio, to generate targeted questions, search across multiple platforms, and compile professional reports, all behind a clean Streamlit interface.
What makes this deep research app different from other tools out there is its unique approach: it automatically breaks down topics into specific yes/no research questions, combines results from both Tavily and Perplexity AI for better coverage, and formats everything into a McKinsey-style report that's automatically saved to Google Docs.
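To make the pipeline concrete, here's a minimal sketch (not the tutorial's exact code) of how three Agno agents backed by Qwen 3 235B on Together AI could hand results to each other. The Together model id string and the commented-out Composio tool wiring are assumptions; the full tutorial attaches Tavily, Perplexity, and Google Docs tools via Composio.

```python
# Minimal sketch: three Agno agents sharing one Together AI model.
# Assumes agno is installed and TOGETHER_API_KEY is set in the environment.
from agno.agent import Agent
from agno.models.together import Together

MODEL_ID = "Qwen/Qwen3-235B-A22B-fp8-tput"  # assumed Together AI model id for Qwen 3 235B

question_agent = Agent(
    model=Together(id=MODEL_ID),
    instructions="Break the research topic into 5-10 specific yes/no research questions.",
)

search_agent = Agent(
    model=Together(id=MODEL_ID),
    # tools=[...],  # Tavily / Perplexity search tools would be attached here via Composio
    instructions="Answer each question using web search results and cite sources.",
)

report_agent = Agent(
    model=Together(id=MODEL_ID),
    # tools=[...],  # a Google Docs tool via Composio would let the agent save the report
    instructions="Compile the findings into a McKinsey-style report.",
)

topic = "Impact of AI agents on software engineering workflows"
questions = question_agent.run(f"Topic: {topic}").content
findings = search_agent.run(questions).content
report = report_agent.run(findings).content
print(report)
```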
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Patronus AI has released Percival, the first AI agent built to evaluate other AI agents, detecting and fixing 20+ failure types in agentic workflows. It's built specifically to analyze full agent traces, including tool calls, planning decisions, and context use, and to suggest prompt or config improvements. Unlike static LLM-as-a-Judge evaluations, Percival handles long and complex traces by observing the whole execution and learning from feedback over time.
Percival uses built-in memory (episodic and semantic) to cluster errors, suggest fixes, and score traces based on reliability and security. It integrates smoothly with frameworks like SmolAgents, Pydantic AI, CrewAI, LangGraph, and OpenAI Agents SDK.
Key Highlights:
Comprehensive error detection - Percival analyzes complete agent workflows to identify 20+ specific failure modes, including hallucinations, information processing issues, resource exhaustion, and orchestration problems. It can process millions of tokens in agent traces to find issues that traditional evaluation methods often miss.
Self-improving capabilities - Through innovative episodic memory techniques developed with Weaviate, Percival learns from past experiences and user feedback. Each confirmed issue or annotation helps the system build domain-specific knowledge and improve future evaluations.
Framework Integration - You can integrate Percival with popular agentic frameworks, including OpenAI Agent SDK, LangGraph, CrewAI, Pydantic AI, and smolagents, through simple API connections or decorators (see the sketch after this list). It also supports custom OpenAI and Anthropic clients through compatible instrumentors.
Actionable Fix Suggestions - Beyond simply identifying problems, Percival clusters similar errors and recommends specific prompt improvements that can be appended to existing prompts, saving engineering teams hundreds of hours in trace analysis and prompt engineering.
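For a rough sense of what "simple API connections or decorators" might look like in code, below is a hedged sketch of decorator-based tracing that gives Percival a full trace to analyze. The patronus.init() and @traced() names are assumptions based on the Patronus Python SDK; check the official docs for the exact integration for your framework.

```python
# Hedged sketch: instrumenting an agent so its trace can be analyzed by Percival.
# patronus.init() and @traced() are assumed SDK entry points; verify against the docs.
import patronus
from patronus import traced

patronus.init()  # assumed to read PATRONUS_API_KEY from the environment

@traced()
def plan_step(task: str) -> str:
    # Planning logic; the decorator records inputs and outputs into the trace.
    return f"Plan for: {task}"

@traced()
def run_agent(task: str) -> str:
    plan = plan_step(task)
    # ... tool calls and LLM calls would also be traced here ...
    return plan

if __name__ == "__main__":
    print(run_agent("Triage the open Sentry issues"))
```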
Free AI Resource
Unlock powerful workflows from two CEOs who are reinventing the future of work with AI.
In this free resource, you'll discover how forward-thinking leaders are using AI to streamline meetings, cut busywork, and automate decisions.
Get actionable insights that you can implement in your own remote teams today.
Windsurf has introduced the SWE-1 family of models built specifically for software engineering tasks, not just coding. The lineup includes SWE-1, SWE-1-lite, and SWE-1-mini, each designed for different interaction speeds and surfaces across the dev workflow. SWE-1 performs close to Claude 3.5 Sonnet but costs less to serve and is currently available to all paid users at 0 credits per prompt, while SWE-1-lite is available for unlimited use and SWE-1-mini powers fast, passive experiences.
What sets SWE-1 apart is that it's trained to reason through incomplete states and long-running tasks that span multiple surfaces, like terminal commands, editor states, and even preview feedback. The models use a "flow-aware" training methodology, essential for real-world engineering where projects evolve through many iterations rather than a single clean pass.
Key Highlights:
Full-spectrum engineering focus - Unlike typical coding models, SWE-1 is designed to handle the entire software development process — writing code, working in terminals, accessing external knowledge, testing products, and understanding user feedback. This makes it practical for actual development workflows, not just isolated coding tasks.
Flow-aware architecture - SWE-1 builds on Windsurf's "shared timeline" concept, allowing seamless switching between human and AI contributions. This means the model can observe your edits, terminal outputs, and browser interactions, then continue working based on those changes, creating a natural collaboration that adapts to your workflow.
Strong performance - In both controlled evaluations and real-world usage, SWE-1 models outperform non-frontier alternatives like Qwen 2.5 72B and DeepSeek V3, and achieve results close to leading foundation models like Claude 3.7 Sonnet. Production testing shows high adoption rates among developers.
Deployment options - The three-tier approach offers flexibility for different needs: the premium SWE-1 (currently free for paid users), the unlimited SWE-1-lite (available to all users), and the ultrafast SWE-1-mini (powering passive experiences in Windsurf Tab).
Quick Bites
Convergence AI has added native MCP support to its headless browser agent, Proxy. Starting with Linear, Asana, Sentry, and Intercom, this lets the AI agent directly interact with tools without needing extra wrappers or APIs. You can now create issues in Linear, file bugs to Sentry, assign tasks in Asana, or message users on Intercom — all from within Proxy using simple prompts. More integrations are on the way.
Genspark Super AI Agent can now search the web, download, and organize any file with just one prompt using its new Full Agentic Download Agent and AI Drive feature. Whether it’s PDFs, videos, images, music, or Office documents, it fetches everything from the web, creates folders, and stores them autonomously. Once saved, you can run AI tools on your files to summarize papers, generate reports, or even convert images into videos—all from one place.
Manus AI has rolled out a new image generation feature that goes far beyond just generating pictures. When you give it a task, Manus AI understands your goal, picks the right visual tools, combines them with its own reasoning, web search, layout design, and planning to complete the full task. It effectively uses image generation as one part of a larger task pipeline, whether it’s designing a product label, styling a furniture setup, or building a full website from raw photos.
Google Cloud has released the Agent Starter Pack, a collection of ready-to-deploy GenAI agent templates to run on Cloud Run or Agent Engine. It includes template agents using Google ADK, CrewAI, Google’s Live API, Agentic RAG, and more, complete with frontend, backend, CI/CD, observability, and evaluation tooling using Vertex AI. You can spin up a new agent project with just one command and customize it as needed, with built-in support for LangGraph, Gemini, and real-time multimodal APIs.
Tools of the Trade
Townie: A 100% open-source AI coding agent from Val Town, inspired by tools like Lovable, Bolt, and v0, but built into Val Town itself. It lets you code, prompt, branch, and manage pull requests for full-stack projects directly in the browser, with no local setup needed.
Jupyt: AI agent for Jupyter notebooks that can create, edit, and run code cells on its own. It integrates with major model APIs, supports dataset search from platforms like Hugging Face and Kaggle, and brings modern IDE-like features without changing the Jupyter workflow.
Nelly: An end-to-end desktop app to build, use, and monetize AI agents. You can build no-code AI agents with tools and databases, test and refine them, deploy and use them, and soon monetize them through the upcoming agent marketplace.
nao: An AI code editor for writing code on data. It's a local editor connected to your data warehouse, with an AI copilot that has context on both your data schema and your codebase. Built specifically for data workflows, it helps teams write better code faster while keeping data quality in check.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
It seems to me that AGI would mean the end of prompt engineering. Moderately intelligent humans can figure out what you want without elaborate prompts. So by definition so would AGI. ~
Google just clearly has the best AI offering rn. Gemini is fantastic, and I actually trust Google’s infra. OpenAI is still a startup & Anthropic is nowhere to be seen. Why youd go with any other closed source model is beyond me. Can’t believe we were all clowning on Google b4 ~
Avi Schiffmann
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉