Async Claude Code on Web + DeepSeek OCR delivers 97% accuracy @ 10x compression
Today’s top AI Highlights:
& so much more!
Read time: 3 mins
AI Tutorial
Imagine uploading a photo of your outdated kitchen and instantly getting a photorealistic rendering of what it could look like after renovation, complete with budget breakdowns, timelines, and contractor recommendations. That's exactly what we're building today.
In this tutorial, you'll create a sophisticated multi-agent home renovation planner using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash Image (aka Nano Banana).
It analyzes photos of your current space, understands your style preferences from inspiration images, and generates stunning visualizations of your renovated room while keeping your budget in mind.
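The pipeline above can be sketched as a chain of agents. Here's a minimal, plain-Python illustration of how the stages compose (all function names and data are hypothetical stand-ins; the actual tutorial wires these stages up as ADK agents backed by Gemini 2.5 Flash Image):

```python
# Toy sketch of the three-stage renovation pipeline: analyze the
# current space, extract style preferences, then plan within budget.
def analyze_space(photo):
    # Agent 1: assess the current room from a photo.
    return {"room": "kitchen", "condition": "outdated", "photo": photo}

def extract_style(inspiration_images):
    # Agent 2: infer style preferences from inspiration images.
    return {"style": "modern", "palette": "warm neutrals"}

def plan_renovation(space, style, budget):
    # Agent 3: merge the analysis, style, and budget into one plan.
    return {
        "room": space["room"],
        "style": style["style"],
        "budget": budget,
        "steps": ["demolition", "cabinets", "countertops", "paint"],
    }

plan = plan_renovation(
    analyze_space("kitchen.jpg"),
    extract_style(["inspo1.jpg", "inspo2.jpg"]),
    budget=15_000,
)
print(plan["room"], plan["style"], plan["budget"])
```

In the real tutorial each stage is an ADK agent and the photorealistic rendering happens inside the visualization step; this sketch only shows the orchestration shape.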
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.
Latest Developments
Claude Code doesn't live in your terminal anymore.
Anthropic just dropped an asynchronous web version of Claude Code in research preview for Pro and Max subscribers that lets you spin up multiple coding tasks across different repos, all running in parallel on Anthropic's cloud infrastructure.
Connect your GitHub repositories, tell Claude what needs fixing, and it handles the rest - writing code, running tests, and opening PRs while you do literally anything else. Each task gets its own isolated sandbox with real-time progress tracking, and you can jump in to redirect Claude if it's heading down the wrong path.
This works particularly well for clearing out bug backlogs, knocking out routine fixes, or handling backend changes where tests can validate everything. There's even an iOS version now, so you can delegate coding work from your phone.
Simon Willison has written a very nice blog post about his experience with Claude Code on the web, and he points out something important: agents that need constant approval are painfully slow, but this runs in "YOLO mode" with proper sandboxing, making it actually useful.
Key Highlights:
Parallel Execution - Kick off multiple tasks simultaneously across different repos from one interface, each running independently with its own progress tracking and automatic PR creation.
Sandboxed Security - Every task runs isolated with filesystem and network restrictions, using a secure proxy for Git that only touches authorized repos without exposing your credentials.
Easy Handoff - The "teleport" feature copies your chat and edited files to your local CLI if you want to finish up on your machine.
Turns out Anthropic is leaning hard into containerization, and the results show. They've also released a /sandbox command for the Claude Code CLI that uses OS-level primitives to guard your local filesystem and network access. Define your boundaries once (which directories Claude can write to, which domains it can reach) and it stops interrupting you with permission prompts while staying locked within those limits.
Want to get the most out of ChatGPT?
ChatGPT is a superpower if you know how to use it correctly.
Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.
Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.
The big whale is back and this time they released something wild!
Your AI models are choking on long documents because processing text is computationally expensive. DeepSeek's solution? Turn the text into an image first, then let the AI read the compressed version. They built a new OCR model as a proof of concept, and it actually works: you can stuff 1,000 words into roughly 100 visual tokens and get them back with 97% accuracy.
This isn't your typical OCR tool. DeepSeek-OCR is designed around a core insight: a picture containing text requires far fewer tokens to represent than the raw text itself. The team built a specialized vision encoder that processes high-resolution images (up to 1280x1280) while keeping the token count remarkably low. For comparison, while competing models need thousands of vision tokens per page, DeepSeek-OCR uses 256-800 tokens depending on the mode, and still beats them on accuracy benchmarks.
Key Highlights:
Compression efficiency - The model achieves 97% OCR accuracy at 10x compression (100 vision tokens for ~1,000 text tokens) and 60% accuracy at 20x compression, opening doors for aggressive context compression in future LLM applications.
Flexible document handling - It outputs both markdown with layout information (paragraph positions, bounding boxes) and clean text extraction, plus it can parse embedded charts, formulas, and geometry figures through a unified interface.
Production-ready speed - The system can process 200,000+ pages per day on a single A100-40G GPU, making it viable for generating training data at scale for both language and vision-language models.
Forgetting mechanism potential - The compression approach naturally mimics human memory decay: recent conversations stay high-resolution while older context gets progressively compressed into lower-resolution images, reducing token usage over time.
Open-source - DeepSeek released everything under the MIT license - model weights, training code, and the complete DeepEncoder implementation, so you can download it, run it locally, or adapt it for your specific document processing needs.
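The compression figures above are easy to sanity-check with a little arithmetic (the numbers come from the announcement; the helper function is just illustration):

```python
# Ratio of text tokens represented per vision token consumed.
def compression_ratio(text_tokens, vision_tokens):
    return text_tokens / vision_tokens

# ~1,000 text tokens in ~100 vision tokens -> 10x (reported ~97% accuracy)
assert compression_ratio(1000, 100) == 10.0
# Halving the vision-token budget doubles compression -> 20x (~60% accuracy)
assert compression_ratio(1000, 50) == 20.0

# At the 10x setting, each per-page mode's vision-token budget
# implies how many text tokens one page can carry:
for vision_tokens in (256, 800):
    print(f"{vision_tokens} vision tokens ~ {vision_tokens * 10} text tokens at 10x")
```

The trade-off is visible in the numbers: pushing past 10x saves tokens but discards detail, which is exactly why accuracy falls to 60% at 20x.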
Quick Bites
Agentic coding is now free for everyone
Amp Code now has a free tier: in exchange for unlimited agentic coding with frontier models, your code becomes training data and you see targeted ads. The other catch is that you can't choose which model you get (they're sourcing discounted tokens across providers). But if you're fine with that, you get unconstrained access to what's currently one of the sharpest coding agents around.
Fast agentic context retrieval models by Cognition AI/Windsurf
Cognition AI is rolling out SWE-grep and SWE-grep-mini, first-of-their-kind agentic search models designed to make context retrieval blazingly fast. These models power Windsurf's Fast Context subagent, which surfaces the right files in seconds, handling 8 parallel tool calls per turn and running 4 turns in under 3 seconds. That's over 20x faster than typical embedding-based searches, without sacrificing accuracy. You can try it in Windsurf now or test it in their playground.
Run interactive commands like vim and git rebase within Gemini CLI
Google's Gemini CLI now supports interactive commands through pseudo-terminal integration, meaning you can run vim, top, or interactive git rebases directly within the CLI without jumping to a separate terminal. The upgrade uses PTY support to keep everything in Gemini's context, streaming terminal snapshots in real-time with full two-way communication for input and window resizing. Available now in v0.9.
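The PTY mechanism here is plain POSIX rather than anything Gemini-specific: a pseudo-terminal makes the child process believe it's attached to a real terminal, which is what interactive programs check for before enabling full-screen behavior. A minimal, POSIX-only Python sketch:

```python
import os
import pty

# Spawn a child attached to a pseudo-terminal, so the child's
# stdin/stdout are a real TTY (isatty() is true) instead of a pipe.
def run_in_pty(argv):
    pid, master_fd = pty.fork()
    if pid == 0:  # child: exec the command with the PTY as its stdio
        os.execvp(argv[0], argv)
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # Linux raises EIO once the child closes the slave
            break
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode(errors="replace")

# The child checks whether its stdin is a terminal.
out = run_in_pty(["sh", "-c", "test -t 0 && echo tty || echo pipe"])
print(out)
```

Run the same child through ordinary pipes (e.g. subprocess with stdout=PIPE) and it prints "pipe" instead, which is why programs like vim refuse to start or fall back to dumb output without a PTY.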
Use Google Maps data on 250M+ places via Gemini API
Google just made its Maps data available through the Gemini API, letting developers ground AI responses in information from 250+ million places. You can use it solo or combine it with Search grounding for dual-mode context (Maps for place details, Search for temporal info like event schedules), and the API returns embeddable widgets complete with user reviews and photos.
Tools of the Trade
Agentic Context Engine - This is an open-source Python implementation of Stanford’s viral research paper on Agentic Context Engineering. It includes the Generator, Reflector, and Curator agents that execute tasks, reflect on what worked/failed, and curate a "playbook" of strategies. All from execution feedback - no training data needed.
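The Generator → Reflector → Curator loop is simple enough to sketch in a few lines of plain Python (the role names come from the paper; everything else here is hypothetical stand-in logic, since the real implementation uses LLM agents for each role):

```python
# Toy Generator/Reflector/Curator loop: execute, reflect, curate.
def generator(task):
    # Generator: execute the task and record what happened.
    return {"task": task["name"], "success": task["answer"] == task["attempt"]}

def reflector(result):
    # Reflector: turn raw execution feedback into a concrete lesson.
    verdict = "worked" if result["success"] else "failed"
    return f"{result['task']}: approach {verdict}"

def curator(lesson, playbook):
    # Curator: accumulate deduplicated lessons into the playbook.
    if lesson not in playbook:
        playbook.append(lesson)
    return playbook

playbook = []
tasks = [
    {"name": "add", "answer": 4, "attempt": 4},
    {"name": "multiply", "answer": 9, "attempt": 8},
]
for task in tasks:
    playbook = curator(reflector(generator(task)), playbook)

print(playbook)
```

The point is the dataflow: lessons come purely from execution feedback and accumulate in the playbook, with no gradient updates or labeled training data anywhere in the loop.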
Strawberry - A browser where you can create your AI companions that automate repetitive web tasks by operating directly in browser tabs. The agents run in parallel across multiple tabs, use your cookies for authentication, and request approval before taking sensitive actions.
HuggingChat Omni - Open-source chat interface that automatically selects from 115 AI models across 15 providers based on your query type. It uses a policy-based routing system (powered by katanemo/Arch-Router-1.5B) to match requests with appropriate models, though you can also manually choose specific models.
MineContext - Open-source desktop app that continuously captures screenshots and other digital context, then uses AI to automatically surface daily summaries, todos, insights, and activity logs. It lets you query your accumulated context, stores everything locally, and supports multiple LLM providers, including local models. It’s like context-aware ChatGPT Pulse locally.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
Hot Takes
It's hard to fathom how much innovation has occurred in image/video generation over the last year.
Yet somehow, there's been almost zero progress on AI background removal
~ Theo - t3.gg

If Grok 5 turns out to be better at AI engineering than Andrej Karpathy, I’ll call it — that’s AGI.
Zuck doesn’t need billions to hire an AI researcher anymore.
Andrej mentioned his repos (like nanochat) were entirely handwritten, and claude/codex agents were net unhelpful. Current LLMs struggle with out-of-distribution code, the code that isn’t boilerplate or common patterns.
The code written by people like Andrej, Linus, or Geohot is out-of-distribution. No LLMs today can match their ability to craft large, coherent, solid systems in a unique personal style.
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉