Turn Any LLM Into an AI Voice Agent

PLUS: OpenAI Codex, Claude Code and Gemini CLI in any app, Opensource AI browser extension

Today’s top AI Highlights:

  1. Turn any LLM into a conversational voice AI agent

  2. Opensource extension lets any AI model control your browser

  3. Run OpenAI Codex, Claude Code, Gemini CLI, or Opencode in your app

  4. Genspark AI wants to become a “vibe-working” platform

  5. The most curious general-purpose coding and research agent with its own computer

& so much more!

Read time: 3 mins

AI Tutorial

Building tools that truly understand your documents is hard. Most RAG implementations just retrieve similar text chunks without actually reasoning about them, leading to shallow responses. The real solution lies in creating a system that can process documents, search the web when needed, and deliver thoughtful analysis. Moreover, running the pipeline locally would reduce latency and ensure privacy and control over sensitive data.

In this tutorial, we'll build a powerful Local RAG Reasoning Agent that runs entirely on your own machine. You'll be able to choose between multiple state-of-the-art opensource models like Qwen 3, Gemma 3, and DeepSeek R1 to power your system.

This hybrid setup combines document processing, vector search, and web search capabilities to deliver thoughtful, context-aware responses without cloud dependencies.
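To make the hybrid idea concrete, here is a minimal sketch of the routing step: answer from your documents when a chunk is similar enough to the question, otherwise fall back to web search. It is not the tutorial's code. The `embed`, `cosine`, and `route` helpers, the bag-of-words "embedding", and the 0.3 threshold are all illustrative assumptions; a real pipeline would use an embedding model and a vector store.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query: str, chunks: List[str], threshold: float = 0.3) -> Tuple[str, str]:
    """Return ('documents', best_chunk) if a chunk is similar enough, else ('web', query)."""
    q = embed(query)
    best = max(chunks, key=lambda c: cosine(q, embed(c)), default="")
    if best and cosine(q, embed(best)) >= threshold:
        return ("documents", best)
    # Nothing in the local documents is close enough: hand the query to web search.
    return ("web", query)
```

Calling `route("when is the invoice due", chunks)` picks the local chunk that mentions the invoice, while a question with no overlap in your documents is routed to the web branch.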

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Turn any text LLM into a conversational voice AI agent with just a few commands.

Kyutai has opensourced Unmute, their modular voice AI system that wraps around any text language model to add streaming speech-to-text and text-to-speech capabilities.

Unmute preserves all your LLM's existing capabilities (reasoning, tool calling, function execution) while adding natural voice conversation. The system works with any LLM you choose, from Mistral Small to Llama models, and can run on a single GPU with sub-second response latency.

Key Highlights:

  1. Complete modularity - Since the LLM generating text is independent of the speech components, you can leverage all capabilities of your favorite language models like reasoning and external tool connections, with improvements to text LLMs automatically benefiting voice interactions.

  2. Intelligent conversation flow - Features semantic Voice Activity Detection that figures out when you've stopped speaking without interrupting mid-sentence, plus streaming architecture where text-to-speech starts generating audio before the full LLM response completes.

  3. Easy customization - Character voices and prompts are defined in a simple YAML configuration file, with support for dynamic system prompts and function calling examples like hanging up calls or pulling live news via APIs.

  4. Deployment - Multiple deployment options from single-GPU Docker Compose setups to multi-node Docker Swarm configurations, with detailed documentation for scaling from development to production environments.
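The streaming idea in highlight 2 can be sketched in a few lines: group the LLM's streamed tokens into sentences and hand each sentence to text-to-speech as soon as it is complete, instead of waiting for the whole reply. This is a simplified illustration, not Unmute's code. `fake_llm_stream`, `fake_tts`, `sentence_chunks`, and `run_pipeline` are hypothetical stand-ins, and a real system streams audio at a much finer granularity than whole sentences.

```python
from typing import Iterable, Iterator, List


def fake_llm_stream(reply: str) -> Iterator[str]:
    """Stand-in for a streaming text LLM: yields the reply one word at a time."""
    for word in reply.split():
        yield word + " "


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences, emitting each one as soon as it ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the stream ends without closing punctuation.
        yield buffer.strip()


def fake_tts(sentence: str) -> bytes:
    """Stand-in for a text-to-speech call: returns placeholder 'audio' bytes."""
    return sentence.encode("utf-8")


def run_pipeline(reply: str) -> List[str]:
    """Run LLM -> sentence chunker -> TTS and log the order in which things happen."""
    events: List[str] = []

    def logged_tokens() -> Iterator[str]:
        for token in fake_llm_stream(reply):
            events.append(f"llm:{token.strip()}")
            yield token

    for sentence in sentence_chunks(logged_tokens()):
        fake_tts(sentence)
        events.append(f"tts:{sentence}")
    return events
```

In the event log returned by `run_pipeline("Hi there. How are you?")`, the first `tts:` entry appears before the LLM has produced its last token, which is the whole point of the streaming design.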

AI browsers like Dia promise the future, but why commit to a single vendor's vision when you can have AI control over any browser?

OpenDia is an opensource browser extension that brings agentic AI to any Chromium browser, without forcing you to abandon your current digital setup. Built specifically for power users, it lets AI models autonomously navigate the web and perform tasks, from posting tweets to testing signup flows, all while bypassing detection systems that typically block automation tools.

The beauty lies in its universality: instead of being locked into one company's AI browser approach, you get the flexibility to use any AI model with any browser.

Key Highlights:

  1. Browser-Agnostic - Transform any Chromium browser into an AI-controlled powerhouse without switching away from your preferred browser or losing your carefully curated extensions and settings. Works identically across Chrome, Arc, Edge, Brave, Opera, and Vivaldi.

  2. Model Agnostic - Supports Claude, ChatGPT, and even local models through a simple configuration setup. The tool integrates with Claude Desktop via MCP servers and can be configured for other AI platforms.

  3. Anti-Detection Automation - Includes specialized bypasses for Twitter/X, LinkedIn, and Facebook that mimic human behavior to avoid triggering security measures. Perfect for content creators and marketers who need reliable automation for posting, commenting, and engagement.

  4. 17 Tools - From intelligent page analysis and form filling to multi-tab management and real-time content extraction, OpenDia provides comprehensive browser automation while maintaining privacy.

  5. Privacy-First Local Processing - All AI interactions with your browser happen locally on your machine with no cloud data collection or external tracking. You maintain complete control over your browsing data.

Turn any app into the next Cursor or Windsurf without the years of development overhead.

VibeKit is an SDK that lets you embed Claude Code, OpenAI Codex, Gemini CLI, and SST Opencode directly into your application with secure sandboxing, streaming output, and GitHub automation built right in.

Instead of building your own coding agent infrastructure from scratch, you get a plug-and-play solution that works with any sandbox runtime and supports everything from simple code generation to full conversational coding workflows. The race to add AI coding capabilities to every app just got a whole lot easier.

Key Highlights:

  1. One SDK, four agents - VibeKit supports Claude Code, OpenAI Codex, Gemini CLI, and SST Opencode through a single interface, letting you switch between agents or run multiple simultaneously without rewriting your integration code.

  2. Secure by design - Every code execution happens in isolated sandboxes with support for E2B, Daytona, Modal, and Fly.io, ensuring your application stays protected while agents generate and run real code in controlled environments.

  3. GitHub-native workflows - Built-in GitHub integration handles branch creation, commits, and pull requests automatically, making it perfect for conversational UIs where users iteratively request changes and see them reflected in real-time.

  4. Observability - OpenTelemetry support provides comprehensive tracing and metrics out of the box, plus streaming responses keep your UI responsive while agents work on complex coding tasks.
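To picture what "one SDK, four agents" means for your integration code, here is a minimal sketch of the pattern: every agent sits behind the same function signature, so switching agents is a one-string change. This is not VibeKit's actual API. `make_stub_agent`, `AGENTS`, `generate_code`, and the agent name strings are hypothetical, and the stub runners only echo the prompt where a real runner would execute the agent inside a sandbox.

```python
from typing import Callable, Dict


def make_stub_agent(name: str) -> Callable[[str], str]:
    """Build a placeholder runner; a real one would call the agent inside a sandbox."""
    def run(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return run


# Hypothetical registry keyed by agent name (the names are illustrative labels only).
AGENTS: Dict[str, Callable[[str], str]] = {
    name: make_stub_agent(name)
    for name in ("claude-code", "codex", "gemini-cli", "opencode")
}


def generate_code(agent: str, prompt: str) -> str:
    """Single entry point: the caller picks an agent by name and nothing else changes."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent}")
    return AGENTS[agent](prompt)
```

Swapping `generate_code("codex", ...)` for `generate_code("gemini-cli", ...)` leaves the rest of the calling code untouched, which is the benefit the first highlight describes.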

Quick Bites

AI super agent platform Genspark has released AI Docs, an agentic AI document creator that turns one simple prompt into an entire, professionally designed document. It natively supports both rich text and Markdown. You can start afresh or choose from hundreds of templates for resumes, reports, forms, legal docs, event planning, and more.

With AI Slides, AI Sheets, and now AI Docs, Genspark can automate your entire workspace suite, where you can just “give in to the vibes” and fully delegate your work.

Replit Agent can now think out loud, search the web, and tap into more powerful models on demand. The new "Dynamic Intelligence" update introduces extended thinking modes that show the Agent's step-by-step reasoning, web search capabilities for real-time information gathering, and access to higher-capability models for complex tasks. You can toggle these features per request to dial up the Agent's intelligence exactly when needed.

Well, thanks OpenAI for pioneering the $200/month AI subscription tier. Now everyone wants a piece of that premium pie. Perplexity has released Perplexity (Pro) Max, a new $200 per month subscription tier that offers unlimited Labs queries, early access to o3-pro and Opus-4, plus first dibs on their upcoming Comet browser.

Cursor just dropped their version 1.2, which feels less like using a tool and more like working alongside someone who gets your workflow. Instead of diving straight into code, agents now create structured to-do lists that map out task dependencies, and enhanced context systems mine insights from PRs, commits, and your entire codebase history.

  • Agent Planning with To-Do Lists - Agents now create structured task breakdowns with dependencies, visible in chat and streamed to Slack

  • Message Queueing - Queue follow-up instructions while the agent works, then reorder and execute without interruption

  • PR Indexing & Search - Semantic search through pull requests, issues, commits, and branches with associated GitHub comments

  • Enhanced Codebase Search - New embedding model delivers more accurate semantic search with cleaner, focused results

  • 100ms Faster Tab Completions - Restructured memory management reduces time-to-first-token by 30%

Tools of the Trade

  1. Scout: Autonomous AI agent with its own virtual computer. It helps you with deep research, coding, data analysis, content creation, and more. It browses the web, runs terminal commands, edits code, and creates files on its virtual computer. Set it running on tasks that take minutes or hours and come back later; Scout keeps going while you're away. (Do check out the demo video, we loved it!)

  2. Dash: A general agent that connects to your G-Suite, Slack, Notion, Linear, and more to take in your personal context and perform actions. Consider it an agentic Glean that allows knowledge workers to save 2-3 hours a day of busy work.

  3. Open Researcher: An opensource agentic researcher powered by Firecrawl's web scraping and Claude’s reasoning and intelligence. It searches and analyzes web content for real-time data, shows its thinking process, and responds with citations.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. it's key for the US to win on open weights/open source models. It's very clear there is a distinct need for this. ~
    Sriram Krishnan

  2. Yann Lecun working under Alexandr Wang - how long will that last? ~
    Jen Zhu

  3. OpenAI is super nice for giving everyone the week off for interviews at meta ~
    Daniel

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
