
Long Running AI Agents with Zero Infrastructure

Today’s top AI Highlights:

  • Long Running AI Agents with Zero Infrastructure
  • xAI releases Grok 4 Fast with 2M context window

& so much more!

Read time: 3 mins

AI Tutorial

Learn OpenAI Agents SDK from zero to production-ready!

We have created a comprehensive crash course that takes you through 11 hands-on tutorials covering everything from basic agent creation to advanced multi-agent workflows using the OpenAI Agents SDK.

What you'll learn and build:

  • Starter agents with structured outputs using Pydantic

  • Tool-integrated agents with custom functions and built-in capabilities

  • Multi-agent systems with handoffs and delegation

  • Production-ready agents with tracing, guardrails, and sessions

  • Voice agents with real-time conversation capabilities

Each tutorial includes working code, interactive web interfaces, and real-world examples.

The course covers the complete agent development lifecycle: orchestration, tool integration, memory management, and deployment strategies.

Everything is 100% open-source.
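
The course itself is built around Python (hence Pydantic), but OpenAI also ships a TypeScript flavor of the Agents SDK (@openai/agents) if that's your stack. Here's a minimal sketch of the core agent loop it teaches, assuming an OPENAI_API_KEY in your environment; the agent name and instructions are illustrative placeholders, not course material:

```typescript
import { Agent, run } from "@openai/agents";

// A starter agent: the name and instructions are placeholders.
const agent = new Agent({
  name: "Tutor",
  instructions: "Explain AI concepts concisely, with one example each.",
});

// run() executes the agent loop and resolves with the final answer.
const result = await run(agent, "What is a multi-agent handoff?");
console.log(result.finalOutput);
```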

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Your async code runs for hours without breaking, your AI agents never time out mid-conversation, and your background jobs scale automatically while you sleep.

Trigger.dev is an open-source platform that lets you write normal async TypeScript code and deploy it as production-grade background jobs with zero infrastructure management. It is specifically designed to handle long-running async tasks with built-in queuing, automatic retries, and real-time monitoring.

Companies like Icon.com (AI video generation) and Scrapybara (real-time computer-use) rely on this platform because traditional serverless functions simply can't handle their compute-intensive AI workloads.

Key Highlights:

  1. Invincible Execution - Tasks automatically survive infrastructure failures through checkpoint-restore technology that preserves CPU and memory state across server migrations.

  2. Native AI Integration - Purpose-built for AI workloads with reliable API calling, automatic retries, stream forwarding, and support for frameworks like Mastra and LangChain.

  3. Advanced Observability - Get full trace views of every task run with advanced filtering, bulk actions, atomic versioning, and custom alerts via email, Slack, or webhooks.

  4. Flexibility - Deploy to their managed cloud or self-host with full control over build processes, container images, and runtime customizations, including Python scripts, Prisma, Puppeteer, and FFmpeg.
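
For a feel of the developer experience, here's a minimal sketch of a Trigger.dev v3 task; the task id, payload shape, retry policy, and summarization step are illustrative, so check their docs for the current API:

```typescript
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical long-running job: fetch a document, then summarize it.
// The id, payload shape, and retry policy are illustrative placeholders.
export const summarizeDocument = task({
  id: "summarize-document",
  retry: { maxAttempts: 3 }, // automatic retries on transient failures
  run: async (payload: { url: string }) => {
    const res = await fetch(payload.url);
    const text = await res.text();
    // ...call your LLM of choice here; the task can run for hours
    // without hitting a serverless timeout.
    return { characters: text.length };
  },
});
```

Because the platform checkpoints execution state, a task like this survives deploys and server migrations mid-run.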

A frontier model that runs at 344 tokens per second with a 2M context window, offering the best performance on the cost-vs-intelligence frontier.

xAI has shipped another reasoning model, Grok 4 Fast, that achieves state-of-the-art cost efficiency. The team distilled their learnings from Grok 4 into a dramatically more efficient architecture that uses 40% fewer tokens and costs up to 98% less to match Grok 4's performance on frontier benchmarks.

The model dynamically switches between reasoning and non-reasoning modes using the same weights, controlled by system prompts. It particularly excels at agentic search, seamlessly browsing web and X content while ingesting multimedia and synthesizing findings in real time.

It's available free on all Grok platforms and, for a limited time, on OpenRouter and Vercel AI Gateway; it's also accessible via xAI's API at competitive pricing.

Key Highlights:

  1. Search Dominance - Claims #1 position on LMArena's Search Arena with 1163 Elo, beating o3-search by 17 points, while ranking #8 in Text Arena among all models.

  2. Token Efficiency - Achieves 40% better token efficiency than Grok 4 through reinforcement learning optimization, resulting in 98% cost reduction for equivalent performance.

  3. Native Tool Integration - Trained end-to-end with tool-use RL for seamless web browsing, code execution, and real-time data synthesis across platforms, including X.

  4. Access - For the first time, all users, including the free tier, get unrestricted access to xAI's latest frontier model across web, mobile, and API platforms. It's rolled out in the API as two models: grok-4-fast-reasoning and grok-4-fast-non-reasoning.
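
Calling either model from the API takes a few lines. A minimal sketch, assuming xAI's OpenAI-compatible chat completions endpoint and an XAI_API_KEY environment variable (verify both against xAI's docs):

```typescript
// Minimal sketch: calling Grok 4 Fast through xAI's API.
// Assumes an OpenAI-compatible /chat/completions endpoint and an
// XAI_API_KEY environment variable; verify both against xAI's docs.
async function askGrok(prompt: string): Promise<string> {
  const res = await fetch("https://api.x.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.XAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "grok-4-fast-reasoning", // or "grok-4-fast-non-reasoning"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```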

Fact-based news without bias awaits. Make 1440 your choice today.

Overwhelmed by biased news? Cut through the clutter and get straight facts with your daily 1440 digest. From politics to sports, join millions who start their day informed.

Quick Bites

ChatGPT has become more of a lifestyle assistant than a work tool
OpenAI ran a first-of-its-kind study on how people are using ChatGPT. The research reveals that 70% of queries are now personal rather than work-related, with "Practical Guidance," "Seeking Information," and "Writing" dominating user behavior. Interestingly, coding represents just 4.2% of conversations despite all the developer hype. The real opportunity lies in building AI systems that enhance human judgment across everyday decisions, not just professional tasks. Read the paper not just for the insights but as a business opportunity: it points to precise niches for AI applications that target exactly how people actually want to use AI.

Gemini can now see and work across your Chrome tabs
Google has integrated Gemini into the Chrome browser on desktop, bringing contextual AI assistance directly to your browsing experience. The integration lets Gemini see all your open tabs, compare data across them, and help you find answers on the spot. Gemini can also work with Google apps like Calendar and Maps, and can even watch YouTube videos. Soon, Gemini in Chrome will be able to take actions across other websites, like booking a haircut or ordering your weekly groceries.


GPT-5 and Claude Opus 4.1 crash to 23% on this new SWE benchmark
That existential dread about AI replacing developers? This new benchmark suggests it's premature. Scale AI just dropped SWE-Bench Pro, a significantly tougher coding benchmark that evaluates LLMs on enterprise-grade problems: long-horizon tasks that could take a professional SWE hours to days to complete. While GPT-5 and Claude Opus 4.1 cruised past 70% on the original SWE-Bench Verified, they struggle to hit 23% on Pro's multi-file repos. The massive performance drop hints that the models were likely recalling memorized solutions rather than genuinely solving problems.

How long do you want GPT-5 to think?
You can now set how much time GPT-5 spends thinking before replying in ChatGPT. With a new toggle, Plus, Pro, and Business users can pick between faster replies and deeper reasoning, with Pro getting the widest range, from Light to Heavy. Your choice sticks until you change it.

Tools of the Trade

  1. DeepContext MCP - Provides symbol-aware semantic search for coding agents via MCP. It uses AST parsing plus hybrid search with reranking to find semantically relevant code chunks rather than relying on text-based grep searches. Currently supports TypeScript and Python.

  2. MCP Pointer - A Chrome extension and MCP server that enables you to Option+Click any DOM element in your browser to capture its complete context for AI agents. This allows it to analyze specific webpage elements you've visually selected rather than working with abstract descriptions.

  3. Cactus - An inference engine for running LLMs, VLMs, and TTS models locally on smartphones and low-end ARM devices. It achieves 16-70 tokens/sec on typical phones, depending on model size and device, faster than llama.cpp.

  4. Vectroid - A serverless vector search solution that delivers exceptional accuracy and low latency in a cost-effective package. It splits system components (write, read, index) so they can scale separately, and stores data in layers with on-demand loading.

  5. Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
    (Now accepting GitHub sponsorships)

Hot Takes

  1. The AI cycle is like:

    GPT sucks, I'm switching to Claude

    Wow Claude sucks, I'm switching to GPT

    They're both so over, I'm switching to Gemini

    Folks, have a little patience and be glad that we have 3 great labs competing with each other. ~
    Peter Yang


  2. the AI agent hype fades really really quickly once you build your own

    even with SOTA models, memory, tool calling etc. it's so hard to get good results ~
    Klaas

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
