OpenAI's Remote Software AI Agent
PLUS: Record browser workflows once and automate them forever, turn any AI agent into an A2A server
Today’s top AI Highlights:
Record your browser actions once and automate them forever
OpenAI’s remote software agent that can run many tasks in parallel
Hugging Face released a free, comprehensive course on MCP
Turn AI agents into A2A servers with minimal changes to your code
& so much more!
Read time: 3 mins
AI Tutorial
Building tools that truly understand your documents is hard. Most RAG implementations just retrieve similar text chunks without actually reasoning about them, leading to shallow responses. The real solution lies in creating a system that can process documents, search the web when needed, and deliver thoughtful analysis. Moreover, running the pipeline locally would reduce latency and ensure privacy and control over sensitive data.
In this tutorial, we'll build a powerful Local RAG Reasoning Agent that runs entirely on your own machine, with web search as a fallback when document knowledge is insufficient. You'll be able to choose between multiple state-of-the-art open-source models like Qwen 3, Gemma 3, and DeepSeek R1 to power your system.
This hybrid setup combines document processing, vector search, and web search capabilities to deliver thoughtful, context-aware responses without cloud dependencies.
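The core of such a hybrid setup is a routing decision: answer from retrieved document chunks when they match the query well, and fall back to web search when they don't. Below is a minimal, self-contained sketch of that routing logic; the names, the toy word-overlap similarity, and the `web_search` placeholder are illustrative stand-ins, not the tutorial's actual code (a real system would use embedding similarity over a vector store).

```python
def jaccard(a: str, b: str) -> float:
    """Toy similarity: word overlap between query and chunk.
    A real RAG pipeline would use embedding cosine similarity instead."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def answer(query: str, chunks: list[str], threshold: float = 0.2):
    """Use local chunks when they match well; otherwise fall back to web search."""
    scored = sorted(chunks, key=lambda c: jaccard(query, c), reverse=True)
    best = scored[0] if scored else ""
    if jaccard(query, best) >= threshold:
        return ("local", best)                    # ground the LLM on retrieved text
    return ("web", f"web_search({query!r})")      # placeholder for a search tool

route, context = answer("vector search latency",
                        ["vector search keeps latency low", "unrelated notes"])
```

The threshold is the key tuning knob: too low and the agent answers from weakly relevant chunks, too high and it searches the web unnecessarily.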
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Record Once, Reuse Forever 📸 🌐 ♾️
A few months back, Browser Use was released as a tool to control the browser using natural language, handy for tasks like booking flights or finding products. However, enterprises needed something more reliable for high-volume repetitive workflows, as pure LLM agents proved too slow, expensive, and unpredictable for these scenarios.
Enter Workflow Use, a tool that transforms browser automation by letting you record steps manually instead of describing them with prompts. It converts these recordings into deterministic scripts with variables, runs 10x faster than Browser Use, costs about 90% less, and includes self-healing functionality that falls back to Browser Use if a step breaks, though this feature is still in early development.
Key Highlights:
Record-and-Replay - Skip the complex prompting and simply show the system what to do by recording your workflow once. The system automatically converts your recording into a deterministic script with variables, filtering out noise to create meaningful, reusable workflows.
Speed and Cost Benefits - Workflows run approximately 10x faster and 90% cheaper than pure LLM-based approaches, making them practical for high-volume enterprise applications where performance and cost-efficiency matter.
Self-Healing Capabilities - When a deterministic step fails (like when a website changes), Workflow Use automatically falls back to the Browser Use agent to complete the step, combining the reliability of scripts with the adaptability of AI.
Variable Handling - The system automatically extracts variables from forms in your workflow, allowing you to run the same process repeatedly with different inputs without recording new workflows each time.
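The mechanism behind these highlights can be sketched in a few lines: a recorded workflow becomes a list of deterministic steps with variable placeholders, replayed fast, with an agent fallback when a step breaks. This is an illustrative sketch, not Workflow Use's real file format or API.

```python
# Hypothetical recorded workflow: deterministic steps with a {username} variable.
WORKFLOW = [
    {"action": "goto",  "url": "https://example.com/login"},
    {"action": "type",  "selector": "#user", "value": "{username}"},
    {"action": "click", "selector": "#submit"},
]

def run(workflow, variables, execute, agent_fallback):
    """Replay each step deterministically; self-heal via an agent on failure."""
    for step in workflow:
        # Substitute variables so the same recording works with new inputs.
        concrete = {k: (v.format(**variables) if isinstance(v, str) else v)
                    for k, v in step.items()}
        try:
            execute(concrete)              # fast, cheap deterministic replay
        except Exception:
            agent_fallback(concrete)       # e.g. hand the step to an LLM agent

log = []
run(WORKFLOW, {"username": "alice"},
    execute=lambda s: log.append(s),
    agent_fallback=lambda s: log.append({"healed": s}))
```

The speed and cost advantage comes from the happy path never touching a model: the LLM is only consulted when the deterministic script fails.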
Greg Brockman kicked off the release by saying it straight: “Software engineering is changing, and by the end of 2025, it's going to look fundamentally different.” SWE isn’t just about writing code anymore. It’s shifting towards managing AI agents that work for you. With parallel agents handling different tasks at once, we can get more done without context-switching.
OpenAI has launched Codex, a cloud-based software agent that runs multiple engineering tasks in parallel inside isolated environments. Powered by codex-1, a version of OpenAI o3 specifically optimized for software engineering, Codex runs in isolated cloud environments preloaded with your repository, allowing it to write features, fix bugs, answer codebase questions, and propose pull requests while you focus on other tasks. It is available now for ChatGPT Pro, Enterprise, and Team users, with Plus and Edu access coming soon.
Key Highlights:
Parallel Task Processing - Codex works asynchronously in separate isolated environments, handling multiple coding tasks simultaneously. Each task runs independently in its own sandbox, with full access to read and edit files, run commands, and execute tests, completing tasks typically within 1-30 minutes depending on complexity.
Transparent Execution - Every action Codex takes is documented with verifiable evidence through citations, terminal logs, and test outputs. You can monitor progress in real-time and trace each step taken during task completion, maintaining full visibility into how changes were implemented.
Repository Guidance - Customize Codex's behavior using AGENTS.md files placed within your repository, similar to README files, to provide instructions on codebase navigation, testing commands, and project-specific standards. This helps Codex align with your team's practices and development workflow.
Secure Execution Model - Codex operates in a secure, isolated container with internet access disabled during task execution. It interacts solely with the code explicitly provided via GitHub repositories and pre-installed dependencies configured by you, ensuring your code remains protected.
Codex CLI Updates - OpenAI is releasing a smaller version of codex-1 called codex-mini-latest, optimized for faster low-latency code Q&A and editing in the terminal. They've also simplified authentication—you can now sign in with your ChatGPT account instead of manually generating API tokens, and Plus/Pro users can redeem $5/$50 in free API credits.
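To make the repository-guidance idea concrete, an AGENTS.md is just a markdown file checked into your repo. The structure and contents below are a hypothetical example of the kind of instructions it might carry, not an official template.

```markdown
# AGENTS.md (hypothetical example)

## Project layout
- `src/` — application code
- `tests/` — pytest suite

## Testing
- Run `pytest tests/` and make sure it passes before proposing a change.

## Conventions
- Follow PEP 8; keep functions small and documented.
- Reference the issue number in commit messages.
```

Because the file lives in the repo, guidance can differ per project or even per subdirectory, and it travels with the code rather than with any one user's settings.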
Quick Bites
Hugging Face just released a course on Model Context Protocol. It covers everything you need to know from the fundamentals of MCP to building MCP servers and using them in applications. You can also participate in challenges where you will evaluate your MCP implementations against other students’. It’s completely free!
Decentralized training networks are becoming a serious effort to open up AI development beyond big tech, with companies like Prime Intellect and Nous leading the way. Nous Research has launched Psyche, a decentralized training network that lets anyone contribute compute power to train large models. They’re starting things off with the testnet pretraining of a 40B parameter LLM called Consilience, built using the MLA architecture over 20T tokens.
This will be the largest internet-based pretraining run to date and could open up new paths for small labs, researchers, and individuals to train serious models without relying on centralized compute hubs.
Cognition AI has launched Devin 2.1 with automatic confidence ratings for tasks, shown using 🟢 🟡 🔴. Devin now asks clarifying questions when it's unsure and waits for user approval before moving ahead. It also comes with deeper codebase understanding, making it better at handling large and complex codebases. The same confidence scoring is now built into its Linear and Jira integrations, letting you prioritize high-confidence issues without starting a full session.
Tools of the Trade
AutoA2A: Turn AI agents built with CrewAI, LangGraph, LlamaIndex, OpenAI Agents SDK, or Pydantic AI into A2A servers without modifying any project code. Just add the AutoA2A library to your project, run the CLI, and it scaffolds the project as an A2A server.
Jazzberry: AI bug finder that automatically tests your code when a pull request is opened, finding and flagging real bugs before they are merged. It clones your repo into a secure sandbox, applies the diff, executes relevant commands, and reports concrete issues with evidence directly in the PR.
BrowserBee: Open-source Chrome extension that lets you run and automate tasks using your LLM of choice. It runs entirely within your browser (with the exception of the LLM), and it can safely interact with logged-in websites, like your social media accounts or email, without compromising security or requiring backend infrastructure. Uses Playwright under the hood.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
Once most projects are vibe-coded, we should make artisanal agencies, where staff engineers will write hand-crafted code for the most exquisite customers. We’ll call it organic coding. ~ Yaroslav
Still find it hilarious that agents just ended up being a while loop ~ Sully Omarr
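The while-loop quip isn't far off. Stripped of framework ceremony, a tool-using agent is a loop that calls a model, runs the tool it picks, and feeds the result back until the model answers. Here's a self-contained toy version where `llm` is a stub standing in for a real model call:

```python
def llm(messages):
    """Stub model: asks for a tool until it sees a result, then answers."""
    if any("result:" in m for m in messages):
        return {"type": "answer", "text": "done"}
    return {"type": "tool", "name": "search", "args": "agents"}

def agent(task, tools):
    messages = [task]
    while True:                              # the famous while loop
        reply = llm(messages)
        if reply["type"] == "answer":
            return reply["text"]
        result = tools[reply["name"]](reply["args"])
        messages.append(f"result: {result}")

out = agent("what is an agent?", {"search": lambda q: f"notes on {q}"})
```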
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉