OpenAI Hack to Cut Costs by 33%

PLUS: Google's open-source on-device multimodal model, Rent an autonomous Linux computer

Today’s top AI Highlights:

  1. Make OpenAI transcriptions faster and cheaper. Just speed up your audio!

  2. Rent an autonomous Linux computer that thinks and works for you

  3. This AI video agent replaces your scriptwriter, editor, and director

  4. Google cooked again - Natively multimodal model running entirely on-device

  5. Shared memory you can carry across ChatGPT, Claude, Perplexity, Gemini & more

& so much more!

Read time: 3 mins

AI Tutorial

We've been stuck in text-based AI interfaces for too long. Sure, they work, but they're not the most natural way humans communicate. Now, with OpenAI's new Agents SDK and their recent text-to-speech models, we can build voice applications without drowning in complexity or code.

In this tutorial, we'll build a Multi-agent Voice RAG system that speaks its answers aloud. We'll create a multi-agent workflow where specialized AI agents handle different parts of the process - one agent focuses on processing documentation content, another optimizes responses for natural speech, and OpenAI's text-to-speech model delivers the answer in a human-like voice.

Our RAG app uses OpenAI Agents SDK to create and orchestrate these agents that handle different stages of the workflow.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Your OpenAI transcription bills are about to get a lot lighter with this ridiculously simple trick.

A developer just discovered that speeding up audio files 2-3x before sending them to OpenAI's API cuts costs by up to 33% while maintaining nearly identical transcription quality. The technique works because OpenAI charges based on audio duration (for whisper-1) or audio tokens (for gpt-4o-transcribe), so faster playback means fewer billable units.

George Mandis stumbled upon this method while trying to transcribe a 40-minute Andrej Karpathy talk that exceeded the 25-minute limit for the newer gpt-4o-transcribe model.

Key Highlights:

  1. Significant cost savings - Speeding up audio to 3x reduces input token costs by 33% compared to 2x speed, with a 40-minute file costing $0.07 at 3x versus $0.09 at 2x speed.

  2. Quality preservation - Both 2x and 3x speeds produced identical output token counts (2,048 tokens), suggesting the AI maintains the same level of comprehension and summarization capability.

  3. Technical implementation - The process uses a simple ffmpeg command to adjust audio tempo while reducing bitrate and converting to mono, making it accessible for any developer.

  4. Sweet spot limits - Testing showed 4x speed produces unusable transcriptions with bizarre repetitions, making 2x-3x the optimal range for balancing cost savings with accuracy.
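The trick above boils down to simple arithmetic plus one ffmpeg invocation. Here's a minimal sketch: it builds the ffmpeg command (ffmpeg's `atempo` filter caps at 2.0x per stage, so higher speeds need a chain) and estimates the bill under duration-based pricing. The $0.006/min rate assumes whisper-1's listed price; the file names and exact flags are illustrative, not taken from the article.

```python
def ffmpeg_speedup_cmd(src: str, dst: str, speedup: float = 3.0) -> list[str]:
    """Build an ffmpeg argv that speeds audio up, downmixes to mono,
    and drops the bitrate. atempo only accepts 0.5-2.0, so chain
    filters for speedups above 2x (e.g. 3x = 2.0 * 1.5)."""
    filters = []
    remaining = speedup
    while remaining > 2.0:
        filters.append("atempo=2.0")
        remaining /= 2.0
    filters.append(f"atempo={remaining:g}")
    return [
        "ffmpeg", "-i", src,
        "-filter:a", ",".join(filters),  # "atempo=2.0,atempo=1.5" for 3x
        "-ac", "1",                      # mono
        "-b:a", "64k",                   # reduced bitrate
        dst,
    ]

def transcription_cost(duration_min: float, speedup: float,
                       rate_per_min: float = 0.006) -> float:
    """Duration-based billing (whisper-1 style): a faster file is a
    shorter file, so the billable minutes shrink proportionally."""
    return round(duration_min / speedup * rate_per_min, 4)

cmd = ffmpeg_speedup_cmd("talk.mp3", "talk_3x.mp3", 3.0)
cost_2x = transcription_cost(40, 2.0)  # 40-min talk billed as 20 min
cost_3x = transcription_cost(40, 3.0)  # billed as ~13.3 min, a third less than 2x
```

To actually run the conversion before uploading, pass the list to `subprocess.run(cmd, check=True)`.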

Stop Asking AI Questions and Start Building Personal AI Software.

Transform your AI skills in just 5 days through this free email course. Whatever your starting point, by Day 5 you'll be building working software without writing code.

Each day delivers actionable techniques and real-world examples straight to your inbox. No technical skills required, just knowledge you can apply immediately.

Computers are learning to use themselves, and you can rent one right now on the cloud.

Instead of wrestling with installations and setups, you can now rent a fully autonomous Linux machine that thinks, clicks, and codes for you.

Simular, the team behind the open-source Agent S2 computer-use framework that beat OpenAI's Operator, has launched Simular Cloud, a fully-managed service that puts their state-of-the-art AI agent directly into a cloud computer you can access instantly. No setup, no code, just ask and watch as Agent S installs Minesweeper on Linux, plays poker games, generates Mandelbrot visualizations, or scrapes Zillow data into spreadsheets.

Key Highlights:

  1. Zero Setup, Maximum Power - Simular Cloud provides a fully managed Linux computer controlled entirely by Agent S, with none of the installation headaches of the open-source version. You get instant access to the same AI that outperformed major tech giants on desktop automation benchmarks.

  2. Beyond Basic Automation - Agent S handles complex workflows like installing software, playing interactive games with strategic decision-making, writing Python programs for data visualization, and performing system maintenance tasks like cleaning redundant files.

  3. Real Computer, Real Results - Unlike browser-only automation tools, Simular Cloud gives you access to a full desktop environment where Agent S can navigate web pages, extract data into LibreOffice Calc, manage files, and run any Linux software you need.

  4. Flexible Pricing - The service offers a free tier with shared computer access (10-minute sessions), a $49.90/month premium plan with dedicated computers and faster agents, and a $499/month pro tier with full privacy and always-on availability.

Quick Bites

Video creation just collapsed from a week-long process into a minutes-long conversation with AI. HeyGen has released Video Agent, a complete creative operating system that takes uploaded footage, documents, or even basic prompts and outputs an entire campaign-ready video.
One upload, one prompt, and minutes later, you have broadcast-quality content. This isn't AI-assisted editing but rather an AI creative team bundled together that handles directing, scripting, casting, and editing autonomously. You can join the waitlist.

MCP servers are powerful but have been painful to set up, until now. You can now install local MCP servers with one click on Claude Desktop using the new Desktop Extensions format. The .dxt files bundle everything - server code, dependencies, configuration - into one installable package that works across all Claude Desktop plan types. Anthropic has open-sourced the .dxt format so you can use it in your own MCP clients and contribute to making it work better for your use case.

While image generation went open-source, editing stayed locked behind APIs. Black Forest Labs just changed this by releasing FLUX.1 Kontext [dev], the developer version of FLUX.1 Kontext [pro], which delivers proprietary-level image editing performance in a 12B parameter model that you can run on consumer hardware.
Available as an open-weight model under a non-commercial license, it supports popular inference frameworks like ComfyUI, HuggingFace Diffusers, and TensorRT.

Google just dropped Gemma 3n models, mobile-first multimodal AI that handles text, images, audio, and video natively on edge devices. The models come with a memory-efficiency trick that lets the 5B and 8B parameter models run with the footprint of 2B and 4B models. And that's not all: the E4B (raw 8B) model is the first sub-10B parameter model to crack 1300 on LMArena.

You can try the models on Google AI Studio and deploy directly to Cloud Run from AI Studio. The model weights are available to download on Hugging Face and Kaggle. You can also run them locally via Google AI Edge Gallery, LM Studio, Ollama, etc.

Tools of the Trade

  1. OpenMemory: Chrome extension that keeps your memory and context consistent across ChatGPT, Claude, Perplexity, Grok, etc., so each one picks up right where the last left off. It automatically detects and saves key information from your chats, which you can edit, add to, or delete in an intuitive dashboard.

  2. Stock Market MCP Server: MCP server for AI agents to access real-time and historical financial market data. 100% remote and supports SSE. It offers tools for retrieving stock prices, financial statements, company information, and SEC filings.

  3. CodeRunner: MCP server that executes AI-generated code in a sandboxed environment on your Mac using Apple's native Containers. It runs locally on your machine to analyze, transform, or process your files.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. LLMs will achieve AGI by making humans so stupid that surpassing us is trivial ~ Daniel


  2. Whoever owns the AI-first browser will win AI memory long-term. The browser is the closest approximation of humanity's memory that we have. ~ Suhail

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
