
OpenAI's First Open Model in 5 Years

PLUS: Big AI releases by Anthropic, Google DeepMind, Qwen, and ElevenLabs

Today’s top AI Highlights:

  1. OpenAI releases two open reasoning models you can run locally

  2. Anthropic releases the most advanced agentic coding model

  3. Open search model achieves 72% on Google's hardest benchmark

  4. Explore, not just watch, AI-generated worlds

  5. Pinpoint and tell your AI agent what to fix in your UI

& so much more!

Read time: 3 mins

AI Tutorial

Finding the perfect property in today's competitive real estate market can be overwhelming. With thousands of listings across multiple platforms, varying market conditions, and complex investment considerations, homebuyers often struggle to make informed decisions efficiently. What if we could create specialized agents that work together like a professional real estate team?

In this tutorial, we've built a multi-agent AI real estate team that provides detailed property listings, market insights, and investment analysis in one interface, without you having to search multiple websites.

This system uses three specialized agents working in concert:

  1. Property Search Agent that finds listings across major platforms,

  2. Market Analysis Agent that provides neighborhood insights, and

  3. Property Valuation Agent that delivers investment analysis.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

OpenAI delivered on the hype.

They released two open-weight mixture-of-experts GPT models, optimized for reasoning, agentic tool use, and local deployment, available under the Apache 2.0 license.

The two models are:

  • gpt-oss-120b - for production, general-purpose, high-reasoning use cases; fits on a single H100 GPU (117B parameters, 5.1B active)

  • gpt-oss-20b - for lower-latency, local, or specialized use cases (21B parameters, 3.6B active)

The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.

Key Highlights:

  1. Configurable reasoning effort - Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.

  2. Full chain-of-thought - Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain of thought is not intended to be shown to end users.

  3. Agentic capabilities - Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.

  4. Native MXFP4 quantization - The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU, and the gpt-oss-20b model run within 16GB of memory.

  5. Availability - Available to download for free from Hugging Face. You can also use them via Ollama, llama.cpp, LM Studio, AWS, and other inference providers. You can also try these models on gpt-oss.com.
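Since the models ship with configurable reasoning effort and run locally via Ollama, here is a minimal sketch of what a local call might look like. The `gpt-oss:20b` model tag and the system-prompt convention for setting reasoning effort are assumptions based on the release notes, not a verified API; check the model card for the exact phrasing your runtime expects.

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_chat_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat payload for a local gpt-oss model served by Ollama.

    The reasoning-effort system message is an assumed convention; the
    model tag `gpt-oss:20b` is likewise an assumption.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "gpt-oss:20b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
    }

payload = build_chat_request("Why is the sky blue?", effort="low")
print(json.dumps(payload, indent=2))

# To actually run it (requires `ollama pull gpt-oss:20b` and a running server):
# import urllib.request
# req = urllib.request.Request(OLLAMA_CHAT_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["message"]["content"])
```

Dialing `effort` down to "low" trades reasoning depth for latency, which is the main knob the release emphasizes for on-device use.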

The #1 AI Newsletter for Business Leaders

Join 400,000+ executives and professionals who trust The AI Report for daily, practical AI updates.

Built for business—not engineers—this newsletter delivers expert prompts, real-world use cases, and decision-ready insights.

No hype. No jargon. Just results.

Today, coding just got more accurate.

Anthropic released Claude Opus 4.1, an improvement over Claude Opus 4, pushing coding performance to 74.5% on SWE-bench Verified.

This isn't a massive overhaul. It's a targeted refinement that tackles the real problems developers face daily. The model excels at multi-file code refactoring and brings improved in-depth research and data analysis skills, especially around detail tracking and agentic search.

Available now to paid users through Claude's platform, API, Amazon Bedrock, and Google Cloud's Vertex AI.

Key Highlights:

  1. Coding Performance - Achieves 74.5% on SWE-bench Verified, outperforming other state-of-the-art models, including OpenAI o3, Gemini 2.5 Pro, Kimi K2, and Qwen's flagship Qwen3-Coder.

  2. Enhanced Research Capabilities - Improved in-depth research and data analysis skills with better detail tracking and agentic search functionality for complex information synthesis.

  3. Seamless Integration - Drop-in replacement for Opus 4 available across Claude's platform, API, Amazon Bedrock, Google Cloud's Vertex AI, and now GitHub Copilot for Enterprise and Pro+ users. The API price remains the same as Opus 4.
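"Drop-in replacement" means the upgrade is essentially a one-string change in existing API code. A minimal sketch below; the model ID strings are assumptions, so verify them against Anthropic's current model list before relying on them.

```python
# Assumed model identifiers -- confirm against Anthropic's model list.
OPUS_4 = "claude-opus-4-20250514"
OPUS_4_1 = "claude-opus-4-1-20250805"

def make_request_kwargs(prompt: str, model: str = OPUS_4_1) -> dict:
    """Keyword arguments for anthropic.Anthropic().messages.create(**kwargs).

    Upgrading from Opus 4 is just passing the new model string; the rest
    of the request shape is unchanged.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

kwargs = make_request_kwargs("Refactor this function across these three files...")
print(kwargs["model"])

# With the SDK installed and ANTHROPIC_API_KEY set:
# import anthropic
# response = anthropic.Anthropic().messages.create(**kwargs)
```

Since pricing matches Opus 4, the swap carries no cost change, only the benchmark uplift.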

A compact 4B model just outperformed much larger systems at web search and reasoning tasks.

While everyone's racing to build bigger models, Emad Mostaque's Intelligent Internet just proved that smart training beats raw scale. Their II-Search-4B and II-Search-CIR-4B models deliver specialized search capabilities while running locally on your machine, outperforming much bigger models like Qwen3 30B.

Built specifically for multi-hop reasoning and research tasks, these models can handle complex information-seeking tasks that typically require cloud-dependent giants. The real magic happens with their Code-Integrated Reasoning approach, where the model writes and executes Python code to call search APIs programmatically, turning it into a proper research assistant rather than just a text generator.

Key Highlights:

  1. Local powerhouse - Despite being only 4B parameters, both models consistently outperform much larger baselines on information-seeking benchmarks, with II-Search-CIR-4B achieving 72.2% on the challenging Frames dataset.

  2. Training methodology - Uses a four-phase training approach that progressively builds tool usage, multi-step reasoning, and comprehensive report generation through custom datasets and reinforcement learning with real search environments.

  3. Code-powered reasoning - II-Search-CIR-4B embeds executable Python within its reasoning process, allowing it to programmatically call web search and browsing APIs for more flexible and powerful information gathering.

  4. Deployment - Both models are fully open source and available on Hugging Face, with inference code and datasets included.
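The Code-Integrated Reasoning loop described above can be sketched in a few lines: scan the model's output for Python blocks, then execute them with a search tool in scope. This is an illustrative sketch of the pattern, not the II-Search-CIR-4B implementation; the `web_search` stub and the fenced-block convention are assumptions for the example.

```python
import re

def web_search(query: str) -> list:
    """Stand-in search tool. A real CIR loop would call an actual search
    API here; canned results keep the sketch runnable offline."""
    return [f"result for: {query}"]

CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_code_integrated_step(model_output: str) -> list:
    """Execute any python blocks the model emitted, exposing `web_search`.

    A production system would sandbox this exec; never run untrusted
    model output directly.
    """
    observations = []
    for code in CODE_BLOCK.findall(model_output):
        scope = {"web_search": web_search, "observations": observations}
        exec(code, scope)
    return observations

model_output = (
    "I need fresh data, so I'll search.\n"
    "```python\n"
    "observations.extend(web_search('Frames benchmark SOTA'))\n"
    "```\n"
)
print(run_code_integrated_step(model_output))
# -> ['result for: Frames benchmark SOTA']
```

The payoff of this design is that the model composes tool calls with ordinary control flow (loops, filters, retries) instead of emitting one rigid function call at a time.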

Quick Bites

Google has unveiled Genie 3, a world model that lets you not just watch, but explore AI-generated worlds in real time at 720p and 24fps. Environments remain largely consistent over several minutes, with visual memory extending as far as a minute into the past. Beyond academic and entertainment uses, this is great for training embodied agents, where an AI agent can be placed in these worlds to accomplish a task. Early access is limited to select academics and creators.

In an unexpected release, Alibaba Qwen just dropped Qwen-Image, an open-source 20B MMDiT image generation model with native text rendering. The model can generate crisp, readable text directly within images, from multi-line English layouts to Chinese calligraphy with the precision of a professional typesetter, and delivers SOTA performance across generation and editing benchmarks. Try it now at Qwen Chat. Available to download on Hugging Face and ModelScope.

ElevenLabs just released Eleven Music, their first foray into AI music generation, and it's hitting studio-grade quality with full structural control over verses, choruses, and bridges. The model handles multilingual vocals across any genre. You can even edit individual sections after generation, essentially giving you a complete music production suite behind a text interface. Try it at 50% off through August and test how far you can push its creative range.

Daily reminder that while you slept, a Chinese AI company released an open-source model that somehow performs better than American frontier models. MetaStone has released XBai o4, a 32B model that excels at complex reasoning. In medium mode, it outperforms o3-mini across AIME24, AIME25, and LiveCodeBench. The model was trained on a single network to both generate reasoning steps and evaluate their quality, leading to faster and higher-quality outputs. It is available to download on Hugging Face.

Tools of the Trade

  1. Stagewise: The first frontend coding agent you can show and tell what you want changed. Just select webpage elements, comment on the change you want, and it passes the DOM context along with screenshots to the agent for code modifications. It lives right inside your browser, makes changes in your local codebase, and works with all kinds of frameworks.

  2. AutoRL: Train a custom model for any single-turn task using just a description - no labeled data required! Just describe what you want the model to learn in plain English, and AutoRL will automatically generate training inputs, create an appropriate system prompt, and train your model.

  3. CCFlare: Proxy server for Claude API that automatically distributes requests across multiple Anthropic accounts to bypass rate limits. Get detailed analytics, logging, and conversation sessions.

  4. AgentHub: Creates realistic simulation environments to test AI agents at scale before deployment. It provides evaluation, tracing, and grading for computer-use, browser, conversational, and tool-use agents across multi-step workflows.

  5. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. Google DeepMind is destined to win the AGI race. ~
    Ashutosh Shrivastava

  2. There are things that regular code will always do better than AI.


    In fact, most software will still rely on ol’ plain regular code, not LLMs.


    I remember when OOP became a thing. Everyone and their mother wanted to write classes instead of structured functions.


    It’s hard to grasp how much bloated codebases became because OOP purists didn’t want to do anything else.


    We are now living through the same, but instead of OOP we are now dealing with AI-first code (whatever that means).


    There will be a time when we will collectively look back and laugh at the stupidity of so many people using LLMs to do things that regular if/else/for/while statements can do 100000x better and more efficiently. ~
    Santiago

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
