
OpenAI Operator, Manus AI & Deep Research in Opensource II-Agent

PLUS: Cloud compute purpose-built for AI, OpenAI API supports remote MCP servers

Today’s top AI Highlights:

  1. Globally distributed GPU cloud purpose-built for AI workloads

  2. This agent combines Manus AI, Genspark, and Deep Research in one opensource framework

  3. Google’s new Gemma 3n multimodal model can run on 2GB of RAM

  4. Mistral AI’s opensource Devstral model is built specifically for SWE tasks

  5. Opensource framework to build and deploy MCP servers

& so much more!

Read time: 3 mins

AI Tutorial

Building tools that truly understand your documents is hard. Most RAG implementations just retrieve similar text chunks without actually reasoning about them, leading to shallow responses. The real solution lies in creating a system that can process documents, search the web when needed, and deliver thoughtful analysis. Moreover, running the pipeline locally would reduce latency and ensure privacy and control over sensitive data.

In this tutorial, we'll build a powerful Local RAG Reasoning Agent that runs entirely on your own machine, with web search fallback when document knowledge is insufficient. You'll be able to choose between multiple state-of-the-art opensource models like Qwen 3, Gemma 3, and DeepSeek R1 to power your system.

This hybrid setup combines document processing, vector search, and web search capabilities to deliver thoughtful, context-aware responses without cloud dependencies.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Most cloud platforms were built for generic workloads - storage, compute, basic web services. But AI is a different beast. When you’re training models, spinning up multi-agent systems, or running real-time inference, traditional cloud starts to show its cracks: cold starts, overhead, unpredictable latency.

That’s why more developers are turning to RunPod. It’s a GPU cloud built specifically for AI workloads. You get on-demand access to high-performance GPUs - H100s, A100s, MI300Xs - in 30+ global regions, with a pricing model designed for flexibility.

Choose between full GPU pods (billed by the minute) or serverless endpoints that scale to zero and spin up in milliseconds. The result: faster iteration, lower costs, and way less infrastructure overhead.

Here’s why it stands out:

  1. Milliseconds to launch - RunPod pods cold-start in under a second, making them ideal for fine-tuning, dev work, or spinning up multi-agent test environments on-the-fly.

  2. Inference without the infra - Serverless GPU endpoints keep your containerized models dormant until needed, then scale instantly. Great for LLM chatbots, API backends, or agent orchestration.

  3. Your stack, your way - Run any container: public, private, or custom-built. Choose from 50+ prebuilt templates (vLLM, PyTorch, etc.), with full support for advanced setups like RAG pipelines, agent chains, or memory-backed sessions.

  4. No surprises - GPU pricing starts at $0.16/hr with no hidden fees. Detailed logs and usage dashboards make it easy to monitor costs and optimize performance.

If you’re building something ambitious with AI and tired of wrestling with general-purpose cloud infrastructure, RunPod might be exactly what you need.
Try it today

Emad Mostaque's latest venture, Intelligent Internet, just dropped II-Agent, an open-source agent framework that merges the best ideas from Manus AI, Genspark, Deep Research, and OpenAI Operator into a single agent system. It uses Claude 3.7 Sonnet as its core reasoning engine, and can handle everything from writing code and automating workflows to web search and navigation, research, data visualization, and content generation.

What makes II-Agent stand out is how it combines structured reasoning, tool use, and smart context handling in a tightly looped system. Unlike proprietary alternatives like Manus or Genspark, II-Agent gives you complete visibility into its architecture and the freedom to customize, extend, or integrate it into your own systems. It achieves top-tier performance on the GAIA benchmark, outperforming its open-source and even closed-source peers.

Key Highlights:

  1. Planning and Execution - Inspired by Anthropic’s “think” tool, II-Agent breaks down tasks into smaller parts, thinks through alternative paths, and adapts mid-way based on new results. It logs each reasoning step so you can trace back what it did and why. This is great for debugging, auditing, or improving task chains.

  2. Built-in Tools - It’s not just an LLM wrapper. II-Agent can read/write files, execute shell commands, generate scripts, search the web, and automate browser actions like clicking buttons, typing in fields, or scrolling. It analyzes screenshots with Claude or GPT-4o vision to understand page layouts, so you can use it for visual web tasks too.

  3. Context management - The agent keeps track of everything in the conversation but trims older parts if token limits get tight. For large outputs like scraped web pages, it saves them to disk and adds references in the chat history. This keeps the context clean and avoids wasting LLM capacity (see the sketch after this list).

  4. Real-time Interaction - Every thought, tool invocation, and result is streamed back to the client. This gives you full visibility into what’s happening under the hood. The included frontend (built with React) shows task progress and makes debugging workflows or inspecting agent behavior much easier.

Quick Bites

Another banger from ByteDance: they have opensourced BAGEL, a unified multimodal model that brings understanding and generation across multiple modalities, plus a Thinking mode, into one system. It matches the capabilities of proprietary models like GPT-4o and Gemini 2.0, handling generation, editing, style transfer, navigation, and reasoning in a single architecture.

The model is trained on trillions of tokens of interleaved web, language, image, and video data, during which it showed strong emergent behavior in complex reasoning and editing tasks. ByteDance has opensourced the model weights, training code, data protocols, and benchmarks on Hugging Face and GitHub.

Google has announced Gemma 3n, a small multimodal model designed for on-device AI that runs on as little as 2GB of RAM. It shares the same architecture as Gemini Nano, supports text, image, and audio inputs, and is engineered for fast and lean performance. Despite its small footprint, it scored an impressive 1283 Elo on Chatbot Arena, just behind Claude 3.7 Sonnet. You can try it now via Google AI Studio or run it locally with Google AI Edge tools.

Mistral and All Hands AI have released Devstral, a new opensource model built specifically for software engineering tasks, not just code generation. It’s trained to solve real GitHub issues with the context of large codebases, using agentic coding scaffolds like OpenHands and SWE-Agent. Devstral scores 46.8% on SWE-Bench Verified, outperforming both open and closed models, including GPT-4.1-mini, Claude 3.5 Haiku, and DeepSeek-V3. It's available under Apache 2.0 and can run locally on a single RTX 4090 or a Mac with 32GB RAM.

Vercel has launched the AI Gateway in alpha, built on AI SDK 5, letting developers switch between ~100 AI models without managing separate API keys or provider accounts. It handles authentication and usage tracking, and will soon support billing, failover, and load balancing across providers. It’s free to use during the alpha, with limits based on your Vercel plan.

OpenAI’s Responses API now supports remote MCP servers, giving developers an easy way to plug models into external tools and services like Stripe, Twilio, and Shopify. Alongside MCP integration, OpenAI has also released several built-in tools in the API, including image generation, Code Interpreter, and upgraded file search. These tools work across the GPT-4o, GPT-4.1, and o-series models for building smarter agent workflows with better context handling and lower latency.

Tools of the Trade

  1. Golf: Easiest framework to build and ship a production-ready MCP server. You just write the tools, prompts, and resources you want agents to call. Golf handles everything else - routing, auth, telemetry, error reporting, and deployment. Then you can observe and manage it through their gateway.

  2. Rocketship: Opensource testing engine for running complex, API-driven test scenarios using declarative YAML specs. It uses Temporal for durable execution and supports a plugin system to extend testing across custom APIs and protocols.

  3. Super: A unified AI search and assistant platform that connects all your tools (Notion, Google Drive, Jira, GitHub, and more) and indexes the data for fast, accurate answers. Unlike MCP-based agents that call APIs sequentially, Super runs parallel retrieval and reasoning to deliver results instantly and power entire workflows.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. You think top AI agents are peak genius tech, next-level AGI galaxy brains, then you look into Claude Code’s prompt and it straight-up telling Claude it’s gonna lose 1000$ if it messes up ~
    Yam Peleg

  2. I still remember when people thought "prompt engineering" was going to become a real career. ~
    Santiago

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
