Open-Source Multi-Agent Deep Research from China

PLUS: Real-time self-learning for AI agents, Vibe code front-end with Qwen 3 for free

Today’s top AI Highlights:

  1. Real-time self-learning for AI agents in just 4 lines of code

  2. ByteDance releases open-source multi-agent system for Deep Research

  3. Excel and Sheets go fully agentic with Genspark AI Sheets

  4. Run your own MCP servers for 1000s of APIs

  5. Vibe code apps with Qwen models using the new Web Dev feature

& so much more!

Read time: 3 mins

AI Tutorial

While working with web data, we keep facing the challenge of extracting structured information from dynamic, modern websites. Traditional scraping methods often break on JavaScript-heavy interfaces, login requirements, and interactive elements, leading to brittle solutions that require constant maintenance.

In this tutorial, we're building an AI Startup Insight Agent application that uses Firecrawl's FIRE-1 agent for robust web extraction. FIRE-1 is an AI agent that can autonomously perform browser actions - clicking buttons, filling forms, navigating pagination, and interacting with dynamic content - while understanding the semantic context of what it's extracting.

We'll combine this with OpenAI's GPT-4o to create a complete pipeline from data extraction to analysis in a clean Streamlit interface, using the Agno framework to build our AI startup insight agent.
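To make the pipeline concrete, here's a minimal sketch of how the pieces fit together. It assumes Firecrawl's Python SDK with its documented FIRE-1 agent option on extract, plus Agno's Agent API; the exact extract signature varies across SDK versions, so treat this as a starting point rather than the tutorial's final code.

```python
import streamlit as st
from firecrawl import FirecrawlApp
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Replace with your own key (Agno reads OPENAI_API_KEY from the environment).
firecrawl = FirecrawlApp(api_key="fc-...")

def extract_startup_info(url: str) -> dict:
    # FIRE-1 navigates the site (clicks, forms, pagination) and returns
    # structured fields. The agent option follows Firecrawl's FIRE-1 docs,
    # but double-check the current SDK for the exact signature.
    return firecrawl.extract(
        [url],
        prompt="Extract the company name, product description, and pricing.",
        agent={"model": "FIRE-1"},
    )

# A GPT-4o analyst agent (via Agno) turns the raw fields into insights.
analyst = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You analyze startup data and summarize market positioning.",
)

st.title("AI Startup Insight Agent")
url = st.text_input("Startup website URL")
if st.button("Analyze") and url:
    data = extract_startup_info(url)
    response = analyst.run(f"Analyze this startup data: {data}")
    st.write(response.content)
```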

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Theta is building an intelligent memory layer that helps AI agents learn from previous runs. With just four lines of code, it plugs into your existing stack and starts analyzing every run for mistakes, key decisions, and areas to optimize. The insights are stored and used in future runs to improve accuracy, reduce steps, and cut down human intervention. It’s already shown strong results by improving OpenAI Operator’s accuracy by 43% and reducing steps by 7x.

Key Highlights:

  1. Real-time learning - Every agent run is evaluated for errors, skipped steps, and inefficient patterns. These insights are embedded and passed forward to make future runs smarter. The more your agent runs, the better it gets without needing human tuning.

  2. Seamless integration - The SDK works with your existing stack and data as-is. Just four lines of code and the memory layer kicks in - no model fine-tuning, no data labeling, no prompt engineering. (A toy sketch of the pattern follows this list.)

  3. Learning across sessions and workflows - Unlike typical agents that start from scratch each time, Theta lets agents build context over time. This makes them more adaptive, avoids repetitive mistakes, and handles real-world workflows with less intervention.

  4. Evaluation feeding back into learning - Theta treats evaluations as training input. Their specialized eval environments offer telemetry that goes beyond logging, giving agents actionable feedback from every run to refine behavior on the fly.

  5. Proven performance gains - Testing with OpenAI Operator shows 43% accuracy improvements and 7x fewer steps to complete the same tasks, directly translating to better speed and lower operational costs.
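Theta's actual SDK calls aren't shown here, so to make the pattern concrete, below is a toy, from-scratch sketch of the record-and-inject loop the highlights describe. The RunMemory class and its methods are illustrative placeholders, not Theta's API.

```python
import json
from pathlib import Path

class RunMemory:
    """Toy memory layer: persists lessons from past runs and injects them
    into the next prompt. Illustrates the pattern, not Theta's SDK."""

    def __init__(self, path: str = "agent_memory.json"):
        self.path = Path(path)
        self.lessons = json.loads(self.path.read_text()) if self.path.exists() else []

    def record(self, task: str, outcome: str, mistake: str | None = None) -> None:
        # Store what went wrong so future runs can avoid repeating it.
        if mistake:
            self.lessons.append({"task": task, "outcome": outcome, "mistake": mistake})
            self.path.write_text(json.dumps(self.lessons, indent=2))

    def augment(self, prompt: str) -> str:
        # Prepend the most recent lessons to the agent's next prompt.
        if not self.lessons:
            return prompt
        notes = "\n".join(f"- Avoid: {l['mistake']}" for l in self.lessons[-5:])
        return f"Lessons from earlier runs:\n{notes}\n\nTask: {prompt}"

memory = RunMemory()
memory.record("checkout flow", "failed", "clicked 'Buy' before selecting a size")
print(memory.augment("complete the checkout flow"))
```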

ByteDance has released DeerFlow, an open-source multi-agent framework built for automating deep research workflows. It connects AI agents with domain tools like web search, web crawling, Python code execution, and even podcast generation. The system is built on LangGraph, allowing task decomposition and message-passing between agents. You can launch a console-based or web-based interface and get detailed research outputs with full planning, review, and editing loops.

It supports MCP to plug agents into local or remote data sources without writing custom integrations. Reports can be converted into audio using TTS, turned into slides using Marp, and edited using Notion-style blocks. It's open-source and easy to run locally.

Key Highlights:

  1. Full agent stack using LangGraph - DeerFlow uses a modular multi-agent pattern with four core roles: Coordinator (entry point), Planner (task breakdown), Researcher/Coder (execution), and Reporter (final synthesis). The flow can be debugged live using LangGraph Studio, with full visibility into task progression and data transfer. (See the sketch after this list.)

  2. Seamless tool integration via MCP - DeerFlow supports Model Context Protocol to hook agents into tools like GitHub, Google Drive, Slack, and other APIs. Developers can reuse existing MCP servers or build their own, making it easier to plug in structured data, knowledge graphs, or even internal company tools.

  3. Report-to-audio and slide generation - Every research report can be converted into a podcast using VolcEngine TTS. You can also export a presentation using Marp CLI with customizable slide templates, making it ideal for research distribution, content teams, or internal briefings.

  4. Dev-friendly setup - The framework can be run locally using uv for Python environments, nvm for Node.js, and pnpm for frontend dependencies. Environment variables and YAML config files handle model APIs, search engine selection (Tavily, Brave, DuckDuckGo, arXiv), and TTS options.
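DeerFlow's own graph carries more routing and review logic, but the wiring is plain LangGraph. Here's a minimal, runnable sketch of the four-role pipeline; the node bodies are stubbed placeholders (the real nodes make LLM and tool calls), so only the graph structure mirrors what the README describes.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Shared state passed between agents (simplified; DeerFlow's real state is richer).
class ResearchState(TypedDict):
    topic: str
    plan: str
    findings: str
    report: str

def coordinator(state: ResearchState) -> dict:
    # Entry point: normalize the request before handing it to the planner.
    return {"topic": state["topic"].strip()}

def planner(state: ResearchState) -> dict:
    # Break the topic into research steps (an LLM call in the real system).
    return {"plan": f"1) search the web for '{state['topic']}' 2) summarize findings"}

def researcher(state: ResearchState) -> dict:
    # Execute the plan with tools: web search, crawling, Python execution.
    return {"findings": f"Notes gathered for: {state['plan']}"}

def reporter(state: ResearchState) -> dict:
    # Synthesize everything into the final report.
    return {"report": f"Report on {state['topic']}:\n{state['findings']}"}

graph = StateGraph(ResearchState)
graph.add_node("coordinator", coordinator)
graph.add_node("planner", planner)
graph.add_node("researcher", researcher)
graph.add_node("reporter", reporter)
graph.add_edge(START, "coordinator")
graph.add_edge("coordinator", "planner")
graph.add_edge("planner", "researcher")
graph.add_edge("researcher", "reporter")
graph.add_edge("reporter", END)

print(graph.compile().invoke({"topic": "open-source deep research agents"})["report"])
```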

Quick Bites

Genspark has rolled out AI Sheets, a full agentic spreadsheet tool that handles everything from uploading messy Excel files to generating insights, charts, and executive reports. Just ask questions in plain English, like campaign performance or user trends, and it runs code, applies formulas, and visualizes results instantly. It's already being used for tasks like market research, recruiting, content analysis, and even creating custom study plans or ad creatives based on your input files.

Apple has released FastVLM, its new vision-language model designed for high-speed performance, fast image encoding, and on-device inference. The release includes PyTorch code, an MLX implementation, and an iOS demo app that runs fully on-device on Apple Silicon. FastVLM introduces FastViTHD, a vision encoder that outputs fewer tokens and drastically cuts encoding time. The smallest variant beats LLaVA-OneVision-0.5B with an 85x faster time-to-first-token. While the code is publicly available, it comes under a custom Apple license that is more restrictive than typical open-source licenses.

llama.cpp now supports vision input, letting you run multimodal models locally using llama-mtmd-cli or llama-server. Models like Gemma 3, Qwen2.5 VL, and SmolVLM are already supported, and you can enable vision with a simple -hf flag or load your own projector file if needed. What's nice is that vision is now built directly into the server - no extra hacks or plugins. This makes it easier to manage, faster to update, and cleaner to use across different tools.

Hugging Face has released nanoVLM, a compact PyTorch-based framework that lets you train a vision-language model from scratch in just 750 lines of code. It’s designed to be readable, modular, and easy to extend, making it ideal for learning, prototyping, or research. The model uses a SigLIP-B/16 vision encoder and a SmolLM2 language decoder. All code is open on GitHub and the Hugging Face Hub.

Pipedream has launched mcp.pipedream.com with dedicated MCP servers for 2,500+ integrated apps, including Slack, GitHub, and Google Sheets. You can run the servers locally with npx @pipedream/mcp or host the servers yourself to use them within your app or company. These servers come with built-in managed auth to handle OAuth and credential storage.

Tools of the Trade

  1. Qwen Web Dev: Vibe code applications with Qwen. Qwen Chat's new Web Dev feature lets you build stunning front-end webpages and apps from simple prompts, with the code ready to use. No coding required.

  2. RAGBuilder: Open-source toolkit to build and optimize production-ready RAG pipelines by tuning parameters like chunking strategy, chunk size, retriever type, and LLM settings using your dataset. It includes pre-built templates, supports custom configs, and can be deployed as an API or saved as a reusable pipeline.

  3. Airweave: Lets AI agents semantically search any application or database. It's MCP-compatible and seamlessly connects any app, database, or API, transforming their contents into agent-ready knowledge.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. The best way to learn programming is to start with vibe coding

    Vibe coding gets you started and it’s easy to learn as you go

    There is nothing more satisfying than building and learning at the same time ~
    Bindu Reddy

  2. If you're in tech and not in SF or NYC, you lack either obsession or ambition. ~
    ₕₐₘₚₜₒₙ — e/acc

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
