
Computer Use AI Agents for Apps

PLUS: Define schemas for AI agent memory, Voice AI agents with built-in RAG

Today’s top AI Highlights:

  1. Create computer-use AI agents for specific apps with parallel workflows

  2. Zep just made AI agent memory way more organized and structured

  3. Black Forest Labs is back with new image models that can generate and edit images

  4. Build multimodal, multilingual, multi-character voice AI agents with built-in RAG

  5. Video-based AI memory library for 10x compression and faster RAG

& so much more!

Read time: 3 mins

AI Tutorial

Picture this: you're deep in a coding session when you need to update your project documentation in Notion. Instead of context-switching to a browser, navigating through pages, and manually editing content, you simply type "Add deployment notes to the API docs" in your terminal. The magic happens instantly—your Notion page updates without you ever leaving your development environment.

In this tutorial, we'll build a Terminal-based Notion Agent using MCP and the Agno framework. The agent lets you interact with your Notion pages through natural language commands directly from your terminal, supporting operations like content updates, searches, block creation, and comment addition.
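
If you'd like a head start, here's a minimal sketch of the core loop, assuming Agno's MCPTools wrapper around the official Notion MCP server (@notionhq/notion-mcp-server). The header/env wiring and the model choice are assumptions here, not the tutorial's final code:

```python
import asyncio
import os

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.mcp import MCPTools


async def main() -> None:
    # Assumption: the official Notion MCP server reads its auth headers
    # from the OPENAPI_MCP_HEADERS environment variable.
    env = {
        "OPENAPI_MCP_HEADERS": (
            '{"Authorization": "Bearer ' + os.environ["NOTION_API_KEY"] + '", '
            '"Notion-Version": "2022-06-28"}'
        )
    }
    async with MCPTools("npx -y @notionhq/notion-mcp-server", env=env) as notion:
        agent = Agent(
            model=OpenAIChat(id="gpt-4o"),
            tools=[notion],
            instructions="You manage the user's Notion workspace.",
            markdown=True,
        )
        # One-shot command; wrap this in a loop for an interactive terminal agent.
        await agent.aprint_response("Add deployment notes to the API docs")


if __name__ == "__main__":
    asyncio.run(main())
```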

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Running AI agents on full desktops often leads to chaos - they get distracted by random windows, click the wrong elements, and generally lose focus when there's too much visual noise. CUA, the open-source computer-use AI agent framework, has released App-Use, which lets you create virtual desktops that show only specific apps, so your agents can focus on what matters.

Instead of giving an agent access to your entire screen and hoping for the best, you can now say "only work with Safari and Notes" or "just control iPhone Mirroring." This visual isolation dramatically improves task completion accuracy while staying lightweight since it's just filtering what agents see. You can run multiple specialized agents in parallel without them interfering with each other, perfect for complex workflows like research agents browsing while writing agents draft content.
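
To make the parallel-workflow idea concrete, here's a hypothetical sketch. Everything CUA-specific below (module paths, create_desktop_from_apps, the experiments flag, ComputerAgent) is an illustrative guess based on the description above, not CUA's documented API; only the asyncio pattern is standard Python:

```python
import asyncio

# Hypothetical imports - CUA's real module paths and class names may differ.
from computer import Computer
from agent import ComputerAgent


async def run(agent: ComputerAgent, task: str) -> None:
    # Each agent only ever sees the apps in its own virtual desktop.
    async for update in agent.run(task):
        print(update)


async def main() -> None:
    # "app-use" is a guess at the experimental feature-flag name.
    async with Computer(experiments=["app-use"]) as computer:
        # Hypothetical: app-scoped desktops compositing only the named apps.
        research_desktop = computer.create_desktop_from_apps(["Safari"])
        writing_desktop = computer.create_desktop_from_apps(["Pages", "Notes"])

        researcher = ComputerAgent(computer=research_desktop, model="gpt-4o")
        writer = ComputerAgent(computer=writing_desktop, model="gpt-4o")

        # Run both in parallel; visual isolation keeps them out of each other's way.
        await asyncio.gather(
            run(researcher, "Collect three sources on agent memory"),
            run(writer, "Draft an outline in Pages from my Notes"),
        )


asyncio.run(main())
```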

Key Highlights:

  1. Lightweight Visual Filtering - App-Use creates composited views using macOS's Quartz engine rather than spawning new processes, so your apps keep running normally while agents only see what you specify. The virtual desktops automatically scale resolution to fit app windows and menu bars, requiring no compute resources beyond standard window-management operations.

  2. Parallel Agent Workflows - Deploy multiple specialized agents simultaneously, each working in its own app-scoped environment without interference. You can have a research agent focused on Safari while a writing agent works in Pages and Notes, all running at the same time with simple async/await patterns in your code.

  3. iPhone Automation - Connect to your Mac's iPhone Mirroring app and control your iPhone through natural language commands. Send messages, create reminders, navigate any iOS app, or extract data from apps that don't have APIs - all through the same agent interface you're already using.

  4. Drop-in Integration - App-Use works seamlessly with your existing CUA setup and agent code - just enable the experimental feature flag and start creating app-scoped desktops. Compatible with all LLM providers, including OpenAI, Anthropic, and local models.

AI agent memory framework Zep has released Entity Types, a new feature that lets you define structured, domain-specific entities within your agent's memory. Instead of working with generic graph nodes that store unstructured information, you can now create precise schemas for data like user preferences, business procedures, or custom domain objects.

This means your agents can capture, classify, and retrieve specific information with much higher accuracy, whether you're building a travel agent that remembers seating preferences or a healthcare agent tracking patient conditions. The feature includes both ready-to-use default types and the flexibility to define completely custom entity schemas using familiar Pydantic models.

Key Highlights:

  1. Pre-built Default Entity Types - Zep automatically classifies common data into User, Preference, and Procedure entities without any configuration. When a user mentions "I prefer window seats over aisle," Zep automatically creates a Preference node with structured attributes, making it easy to filter and retrieve specific types of information using simple API calls.

  2. Custom Entity Types - Define your own entity schemas using familiar Pydantic models to capture domain-specific data structures (see the sketch after this list). Whether you're building a travel agent that needs AirTravelPreferences with cabin_class and max_layovers fields, or a healthcare app tracking medical conditions, you can create up to 10 custom entity models with 10 fields each.

  3. Filtered Search and Data Access - Search your knowledge graph by specific entity types to get precisely what you need, rather than sifting through generic nodes. You can query for all Preference entities related to "seating" or retrieve structured ApartmentComplex objects with actual price_of_rent values, dramatically improving search precision and data usability.

  4. Cross-Language SDK Support - Available across Python, TypeScript, and Go SDKs with consistent APIs that feel natural in each language. The Python implementation uses Pydantic BaseModel subclasses, while TypeScript and Go follow their respective conventions, so you can integrate structured memory without learning new paradigms.
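
Here's a minimal sketch of the AirTravelPreferences example as a custom entity type, following our reading of Zep's Python SDK conventions (the ontology module path and the EntityText/EntityInt field helpers are assumptions; check Zep's docs for the exact names):

```python
from pydantic import Field
from zep_cloud.client import Zep
# Module path and field helpers per our reading of Zep's docs; adjust if they differ.
from zep_cloud.external_clients.ontology import EntityModel, EntityText, EntityInt


class AirTravelPreferences(EntityModel):
    """The travel-agent example from above, as a structured schema."""
    cabin_class: EntityText = Field(
        description="Preferred cabin, e.g. economy or business", default=None
    )
    max_layovers: EntityInt = Field(
        description="Maximum acceptable number of layovers", default=None
    )


client = Zep(api_key="YOUR_ZEP_API_KEY")

# Register the schema; Zep then classifies matching facts into this entity type.
client.graph.set_entity_types(entities={"AirTravelPreferences": AirTravelPreferences})
```

Filtered retrieval then comes down to passing the entity-type label as a node filter on graph search, so a query like "seating preferences" returns only AirTravelPreferences nodes instead of every generic node that mentions travel.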

Quick Bites

AI voice platform Resemble AI has open-sourced Chatterbox, its production-grade text-to-speech model, which outperforms ElevenLabs' state-of-the-art model in blind evaluations. The MIT-licensed model offers zero-shot voice cloning from just 5 seconds of audio, unique emotion intensity controls, built-in watermarking, and sub-200ms latency for real-time applications. You can test the model on Hugging Face.
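
Here's a minimal sketch of zero-shot cloning with the emotion control, following the project README as we read it (the reference file and parameter values are placeholders):

```python
import torchaudio

# pip install chatterbox-tts
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

# Zero-shot cloning: a ~5-second reference clip steers the output voice;
# exaggeration is Chatterbox's emotion-intensity dial.
wav = model.generate(
    "Your newsletter, read aloud in any voice you like.",
    audio_prompt_path="reference_5s.wav",
    exaggeration=0.7,
)
torchaudio.save("out.wav", wav, model.sr)
```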

Black Forest Labs, the AI startup that created quite a stir last year with its FLUX image models and powered Grok's image generation for a few months, has released FLUX.1 Kontext, a new suite of models that can generate and edit images using text and visual prompts together. The models offer character consistency, local editing capabilities, and style referencing, while being up to 8x faster than current models. Two versions are now available on platforms like KreaAI and LeonardoAI: FLUX.1 Kontext [pro] for fast iterative editing and FLUX.1 Kontext [max] for enhanced prompt adherence.

BFL has also developed FLUX.1 Kontext [dev], an open-weight 12B diffusion variant, currently in private beta.

ElevenLabs has released Conversational AI 2.0, an upgrade to its platform for building and deploying voice AI agents. This release brings in features like automatic language detection and switching, multi-character voices, multimodal inputs, and built-in RAG for real-time retrieval. Their speech model also delivers state-of-the-art turn-taking, so the agent knows when to interject and when to wait.

Check out their documentation, and you can start building with Conversational AI 2.0 today.
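
For a taste of the SDK, here's a minimal session sketch following the Python quickstart; it assumes you've already created an agent in the dashboard (where language detection, voices, and RAG are configured) and exported its ID:

```python
import os

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# The agent's RAG, language, and voice settings live in its config, not here.
conversation = Conversation(
    client,
    agent_id=os.environ["ELEVENLABS_AGENT_ID"],
    requires_auth=True,
    audio_interface=DefaultAudioInterface(),  # mic in, speakers out
    callback_agent_response=lambda text: print(f"Agent: {text}"),
    callback_user_transcript=lambda text: print(f"You: {text}"),
)

conversation.start_session()  # streams audio both ways until ended
```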

Tools of the Trade

  1. Memvid: Encodes text chunks into MP4 video files, creating a compressed, searchable knowledge base that can store millions of chunks without a traditional vector database eating up RAM and storage. It provides semantic search with sub-second retrieval times, and works with OpenAI, Anthropic, or local models (see the sketch after this list).

  2. Magic AI Agent: Creates stunning, production-ready React components using the 21st.dev library. It performs RAG to find the top 3 matching components and draws inspiration from them to create new, unique ones. The IDE agent understands your application context and seamlessly integrates the new components in the right place.

  3. LLM: A CLI tool and Python library for interacting with OpenAI, Anthropic’s Claude, Google’s Gemini, Meta’s Llama and dozens of other LLMs, both via remote APIs and with models that can be installed and run on your own machine.

  4. MCP Defender: Open-source, free desktop application that protects your computer from malicious MCP traffic. All MCP tool-call requests and responses from MCP clients like Cursor, Claude, and Windsurf are automatically proxied; if harmful data is detected, it alerts you and asks whether to allow or block the tool call.

  5. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
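
As promised above, a minimal Memvid sketch; method names follow the project README as we read it, so treat them as assumptions:

```python
# pip install memvid
from memvid import MemvidEncoder, MemvidRetriever

# Encode text chunks into an MP4 "memory" plus a small JSON index.
encoder = MemvidEncoder()
encoder.add_text("Zep adds structured entity types to agent memory.")
encoder.add_text("App-Use scopes CUA agents to specific apps.")
encoder.build_video("memory.mp4", "memory_index.json")

# Semantic search straight off the video file - no vector DB involved.
retriever = MemvidRetriever("memory.mp4", "memory_index.json")
for chunk in retriever.search("agent memory", top_k=2):
    print(chunk)
```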

Hot Takes

  1. The truth is GPT 5 is virtually done. Open ai is either playing a heavy hand or there scrambling for more time.

    I have faith they will blow us away. There is no doubt they’ll have to do an event for the release of gpt 5. Not just a table a YouTube livestream. ~
    Chris

  2. A reason companies are excited about agents is that they think agents will let them skip the hard task of figuring how to integrate AI into the daily process of work- in theory, agents just let you treat the AI like an employee.

    More value will come from tackling the hard task. ~
    Ethan Mollick

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
