
Software Development Agents with Memory, RAG, and MCP

PLUS: Perplexity competes with Manus AI and Genspark, DeepSeek R1 on a single GPU

Today’s top AI Highlights:

  1. AI agents purpose-built for software development and deep org context

  2. Perplexity now directly competes with Manus AI and Genspark

  3. The latest DeepSeek R1 matches OpenAI o3 and Google Gemini 2.5 Pro

  4. Anthropic’s open-source tool shows what happens inside Llama and Gemma models

  5. MCP server to vibe test websites with multiple web AI agents in parallel

& so much more!

Read time: 3 mins

AI Tutorial

Picture this: you're deep in a coding session when you need to update your project documentation in Notion. Instead of context-switching to a browser, navigating through pages, and manually editing content, you simply type "Add deployment notes to the API docs" in your terminal. The magic happens instantly—your Notion page updates without you ever leaving your development environment.

In this tutorial, we'll build a Terminal-based Notion Agent using MCP and the Agno framework. This agent lets you interact with your Notion pages through natural-language commands directly from your terminal, supporting content updates, searches, block creation, and adding comments.
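To give you a taste before the full tutorial, here's roughly what the core of that agent looks like, assuming Agno's MCPTools wrapper and the official Notion MCP server (the model choice, instructions, and auth setup are our placeholders; the tutorial covers the details):

```python
import asyncio

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.mcp import MCPTools


async def main() -> None:
    # Spawn the official Notion MCP server as a subprocess; it expects your
    # Notion integration token in its environment (see the server's README).
    async with MCPTools("npx -y @notionhq/notion-mcp-server") as notion_tools:
        agent = Agent(
            model=OpenAIChat(id="gpt-4o"),  # any Agno-supported model works
            tools=[notion_tools],
            instructions="You manage the user's Notion workspace from the terminal.",
            markdown=True,
        )
        # One-shot command; wrap this in a loop for an interactive terminal session.
        await agent.aprint_response("Add deployment notes to the API docs")


if __name__ == "__main__":
    asyncio.run(main())
```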

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Just two weeks ago, OpenAI dropped its software engineering agent, Codex, which can write code, open pull requests, and learn from your repo context. It’s already changing how developers use AI agents for development. But this new set of agents goes a step further.

Factory AI has launched Droids, a team of software development agents that don’t just code but handle the entire engineering cycle. Each Droid is built for a specific task like debugging, spec writing, incident response, deep codebase search, or managing tickets. They connect to your tools (GitHub, Linear, Slack, Notion, PagerDuty), pull real-time context, and operate both locally and in the cloud. You can run them locally for pair programming or spin up as many cloud Droids as you have browser tabs.

Key Highlights:

  1. Six Specialized Agents - Factory has built domain-specific agents: Code Droid for development, Reliability Droid for incident management and root-cause analysis (RCA), Knowledge Droid for documentation and codebase research, Product Droid for PRDs and user stories, plus dedicated agents for Linear management and PR reviews. Each comes pre-configured with optimized prompts, tools, and models for its specialty.

  2. Stack Integration - Native connections to GitHub, Slack, Linear, Notion, Sentry, and other tools with real-time indexing give Droids the same context your team has.

  3. Remote + Local - Droids are fully parallelizable - run as many as you have browser tabs. They support both local synchronous work for pair programming and remote asynchronous execution for background tasks.

  4. Persistent Memory Across Sessions - Both org-level and user-level memory systems capture your team's decisions, documentation patterns, and engineering practices so every Droid remembers context across sessions.

  5. Custom MCP Integrations - Full MCP support lets you connect proprietary tools and internal systems without building custom connectors from scratch. You can extend Droids' capabilities with your own monitoring systems or specialized databases; the sketch below shows what such a server can look like.
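To make "custom MCP integrations" concrete, here's a minimal sketch of an MCP server wrapping an internal system, written with the official MCP Python SDK (the monitoring endpoint and tool name are invented for illustration, and Factory's side of the configuration isn't shown):

```python
# Minimal custom MCP server exposing a (hypothetical) internal monitoring
# API as a tool. Requires the official SDK: pip install "mcp[cli]" httpx
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-monitoring")


@mcp.tool()
def get_open_alerts(service: str) -> str:
    """Return open alerts for a service from the internal monitoring API."""
    resp = httpx.get(
        "https://monitoring.internal.example.com/alerts",  # hypothetical URL
        params={"service": service, "status": "open"},
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    mcp.run()  # serves over stdio, the transport MCP clients expect by default
```

Any MCP-capable client, a Droid included if Factory's full-MCP-support claim holds, could then call get_open_alerts without a bespoke connector.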

Perplexity has launched Labs, its new multi-agent workspace for getting actual work done, not just searching the web and giving you answers. It handles complex, multi-step instructions, uses tools like a code interpreter and headless browser, and generates finished outputs like documents, charts, and visuals. It’s clearly moving into the same zone as Manus AI and Genspark: agents that don’t just assist, but take over execution.

What makes Perplexity Labs interesting is how it quietly checks all the boxes of a general-purpose AI agent: autonomous task planning, tool usage, reasoning across steps, and output that’s ready to use. You can ask it to research, analyze, and even spin up a small app, all inside the same interface. It handles tasks across domains like finance, education, and creative work, without needing constant hand-holding. Available today for Perplexity Pro users.

Key Highlights:

  1. Autonomous Execution with Tooling - Labs autonomously deconstructs complex user prompts, plans multi-step execution paths, and leverages an internal suite of tools, including code execution, headless web browsing for deep data extraction, and design capabilities, to deliver complete project outputs.

  2. Build and Deploy Apps - Labs not only generates code for interactive elements but can also develop and deploy simple apps (like dashboards or slideshows) directly within an "App" tab in the project results, offering immediate utility without needing external development tools.

  3. Extended Self-Supervised Work - Designed for depth, Labs can run self-supervised workflows for 10 minutes or more to complete substantial projects. All generated files, including code, documents, spreadsheets, charts, and images, are systematically organized in an "Assets" tab for straightforward access and download.

  4. Ready-to-Use Assets - Labs produces a wide range of tangible assets including executable code files (Python, HTML/CSS/JS), data visualizations (charts, heatmaps), formatted documents (reports, screenplays), and image-based creative content (storyboards).

Quick Bites

Anthropic has open-sourced its circuit tracing tool that lets you generate and explore attribution graphs showing how language models reach specific outputs. The library works with popular open-weight models from the Gemma and Llama families, and comes with an interactive frontend on Neuronpedia to visualize and test model internals. You can trace circuits, edit feature values, and observe output changes.
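If "attribution graph" sounds abstract, here's the flavor of the idea in a toy, purely linear network (our illustration of the concept, not the library's API): each edge's contribution is just the upstream activation times the connecting weight, so the output decomposes exactly across the graph.

```python
# Toy attribution graph in a linear network: edge weight = upstream
# activation x connecting weight, so outputs decompose exactly by path.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # "feature" activations at the input
W1 = rng.normal(size=(3, 4))  # input -> hidden weights
W2 = rng.normal(size=(4, 2))  # hidden -> output weights

h = x @ W1                    # hidden activations
y = h @ W2                    # outputs

# Edge attributions: contribution of each hidden unit to each output.
edges_h_to_y = h[:, None] * W2                      # shape (4, 2)
# Path attributions: contribution of each input through all paths.
paths_x_to_y = np.einsum("i,ij,jk->ik", x, W1, W2)  # shape (3, 2)

# In the linear case, the graph accounts for the output exactly.
assert np.allclose(edges_h_to_y.sum(axis=0), y)
assert np.allclose(paths_x_to_y.sum(axis=0), y)
```

Real transformers are nonlinear, which is exactly why purpose-built tooling like this is needed to recover comparable decompositions.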

The DeepSeek R1 update pushed to Hugging Face yesterday is now official. DeepSeek-R1-0528 brings significantly improved depth of reasoning and inference, reduced hallucinations, and support for function calling and JSON output. It delivers excellent performance across maths, programming, and general logic, matching leading models like Gemini 2.5 Pro and OpenAI o3 overall (this is huge!).

DeepSeek has also shipped a smaller 8B distilled version, DeepSeek-R1-0528-Qwen3-8B, which matches the performance of Qwen3-235B-thinking while running on a single GPU.
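Since DeepSeek's API is OpenAI-compatible, the new function calling support works with the standard OpenAI client. A minimal sketch, where the endpoint and model name follow DeepSeek's public docs and the weather tool is invented for illustration:

```python
# Sketch of function calling against DeepSeek's OpenAI-compatible API.
# The get_weather tool is a made-up example.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # R1-0528 backs this model ID on the official API
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```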

Hume has introduced EVI 3, its third-generation speech-language model for more expressive and emotionally aware voice AI. EVI 3 can handle transcription, language, and speech in a single model that can speak in any custom voice and style, with near-instant responses and high quality. In blind tests, it outperformed GPT-4o across empathy, emotion handling, and voice realism. API access coming soon.

Tools of the Trade

  1. Vibetest-use MCP: MCP server that launches 10+ Browser-Use agents to test a vibe-coded website and flag UI bugs, broken links, accessibility issues, and other technical problems.

  2. MCPglue: Lets agents create and run their own tools by wrapping multiple API endpoints into single, reusable functions that stay stable even when the underlying APIs change. It handles auth, pagination, and retries, and exposes all tools through an MCP server so agents can build, execute, and integrate cross-API workflows reliably.

  3. MCP Memory: Open-source MCP server that adds persistent, cross-session memory to MCP clients like Claude Desktop, Cursor, and Windsurf, enabling them to recall past conversations, recognize coding patterns, and proactively suggest context. Uses vector embeddings for semantic similarity search; a toy sketch of the idea follows this list.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
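On MCP Memory's "vector embeddings for semantic similarity search" (item 3 above): the core mechanic is nearest-neighbor lookup over embedded text. A toy sketch with made-up vectors; a real system would get these from an embedding model:

```python
# Toy sketch of embedding-based memory recall: store embedded snippets,
# then retrieve the closest one by cosine similarity. Vectors are made up.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


memory = {
    "team prefers pytest over unittest": np.array([0.9, 0.1, 0.3]),
    "deploys go through GitHub Actions": np.array([0.2, 0.8, 0.5]),
}

query = np.array([0.85, 0.15, 0.25])  # pretend embedding of "how do we run tests?"
best = max(memory, key=lambda text: cosine(memory[text], query))
print(best)  # -> "team prefers pytest over unittest"
```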

Hot Takes

  1. With agents likely taking up the brunt of the build work, I predict we’re going to see a strong shift back to specifying software behavior (think UML, for example). Your moat in software won’t be how well you build, but how well you can enunciate what you want built. ~
    Emile Silvis

  2. Surprise Claude 4 doesn't have a memory yet. Would be a major self-own to cede that to the other model companies. There is something extremely powerful about an agent that knows you and your motivations, and what you are working towards always.

    o3+memory was a huge unlock! ~
    Garry Tan

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
