Unwind AI

/goal in Claude Code, Codex, and Hermes Agent

Today’s top AI Highlights:

- /goal in Claude Code, Codex, and Hermes Agent
- OpenClaw creator open-sourced a Mac automation tool
- & so much more!
Read time: 3 mins
AI Tutorial
In this tutorial, you'll build a fully working multimodal agentic RAG app where text, URLs, PDFs, images, audio, and video all share a single 768-dimension embedding space, and a small Google Agent Development Kit (ADK) coordinator turns the retrieved evidence into a grounded, cited answer.
The two pieces doing the heavy lifting are Gemini Embedding 2, which embeds every modality into the same vector space, and Google ADK, which wraps the retrieval call in an agent that inspects the workspace, calls the retrieval tool, and writes the answer.
You'll see exactly how those two pieces compose without any extra orchestration framework.
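The core retrieval idea is simple once every modality lives in one vector space: embed the query, rank stored vectors by cosine similarity, and hand the top hits to the agent. Here's a minimal sketch of that ranking step with NumPy; the 768-dim vectors are random stand-ins, and none of these names come from the Gemini or ADK APIs.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank stored vectors by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]

# Toy workspace: pretend each row is a 768-dim embedding of a text chunk,
# a PDF page, or an audio transcript -- all in the same space.
rng = np.random.default_rng(0)
workspace = rng.normal(size=(10, 768))
query = workspace[4] + 0.01 * rng.normal(size=768)  # query near doc 4

idx, scores = cosine_top_k(query, workspace, k=3)
print(idx[0])  # the nearest chunk should be doc 4
```

In the real app, the ADK agent wraps this lookup as a tool call and cites the returned chunks in its answer.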
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Still running computer-use agents through the screenshot → click → reason → hallucinate → repeat loop?
Peter Steinberger open-sourced Peekaboo, a macOS-native toolkit that hands agents the accessibility tree directly, so clicks land on real elements with IDs, not pixel coordinates that drift every time the window moves.
Beyond click and type, it covers the stuff every other agent fails at, like Spaces switching, Dock right-clicks, menu bar extras, file dialogs, drag-to-Trash.
Use it with Claude Code, Codex, OpenClaw, Hermes Agent, or whichever agent harness you like, via CLI or MCP server. Bring any model, like Claude, GPT-5.1, Grok 4-fast, or Ollama for fully local runs. MIT-licensed.
Key Highlights:
Native, not virtualized: Runs as a real macOS process with Screen Recording + Accessibility permissions. It can drive any app you can, including ones that block automation inside browsers or VMs.
Structured menu discovery: peekaboo menu returns the full menu tree as JSON, so agents navigate "File → Export → PDF…" by name instead of pattern-matching on screenshots.
Multi-screen and multi-Space aware: First-class support for moving windows between Spaces, switching desktops, and targeting elements on specific displays.
Drop into OpenClaw: Lives in the repo as skills/peekaboo-cli, so you can install it as an OpenClaw skill alongside the 5,400+ others.
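Navigating "File → Export → PDF…" from a JSON menu tree is just a tree walk by title. A quick sketch of that walk; the JSON shape below is invented for illustration and may not match what `peekaboo menu` actually emits.

```python
import json

def find_menu_path(tree, path):
    """Walk a nested menu tree, matching each path segment by title.
    Returns the final menu node, or None if any segment is missing."""
    node = {"children": tree}
    for name in path:
        node = next((c for c in node.get("children", []) if c["title"] == name), None)
        if node is None:
            return None
    return node

# Illustrative payload -- the real `peekaboo menu` output shape may differ.
menu_json = json.loads("""
[{"title": "File",
  "children": [{"title": "Export",
                "children": [{"title": "PDF…", "id": "menu-42"}]}]}]
""")

node = find_menu_path(menu_json, ["File", "Export", "PDF…"])
print(node["id"])  # -> menu-42
```

The point of the structured tree is exactly this: the agent resolves a stable element ID by name, then acts on that ID instead of guessing at pixels.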
When it comes to voice agents, naturalness is a key factor. Voxtral TTS outperforms ElevenLabs Flash v2.5 on naturalness and matches ElevenLabs v3 quality with emotion-steering support. Lightweight at 4B parameters, built for production.
Key Highlights:
Wins 58.3% of flagship voice preference tests: In side-by-side human evaluations against ElevenLabs Flash v2.5, Voxtral TTS wins 58.3% of comparisons on naturalness across flagship voices and 68.4% on voice customization.
Emotion-aware output: Contextual understanding of the text steers delivery (neutral, happy, sarcastic, and more), which is what separates considered-sounding speech from robotic output.
70ms model latency, ~9.7x real-time factor: Streams natively and integrates into any existing STT and LLM stack.
Voice cloning from 3 seconds of audio: Adapts to tone, personality, rhythm, and intonation. Zero-shot, no fine-tuning required.
Open weights under CC BY-NC 4.0: Deploy on your own infrastructure, extend to your own voice library.

The creator of Redis just built a dedicated inference engine for a quasi-frontier model. It might be the most opinionated piece of AI infrastructure released this year.
Salvatore Sanfilippo (antirez) released DwarfStar4, a purpose-built C + Metal engine that runs DeepSeek V4 Flash, a 284B parameter open-source model with a 1M token context window, locally on a 128GB MacBook at ~27 tokens/second.
No generic runtime or framework. Just raw C + Metal doing one thing really well. The engine uses an asymmetric 2-bit quantization that fits the entire model in ~81GB, ships with a disk-based KV cache that's a lifesaver for agent workflows, and exposes built-in APIs that plug straight into agents like OpenClaw, Hermes, Claude Code, Opencode, and Pi.
Key Highlights:
One Model, Maximum Optimization: DS4 isn't a generic model runner. It's a dedicated engine built exclusively for DeepSeek V4 Flash, squeezing out performance that general-purpose tools can't match.
Runs on a MacBook: A specialized 2-bit quantization compresses the 284B model to ~81GB while keeping code generation and tool calling quality intact, making it genuinely usable on 128GB Macs.
Disk KV Cache for Agents: Saves session state to your SSD so agent clients that resend large system prompts every request can skip the expensive prefill after the first run. That’s a massive time saver.
Agent-Ready Out of the Box: Ships with OpenAI and Anthropic-compatible server APIs plus ready-to-use configs for Claude Code, opencode, and Pi.
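DS4's exact quantization scheme isn't spelled out here, but the numbers check out with a generic asymmetric 2-bit round-trip: each group of weights stores 2-bit codes plus a scale and zero-point, which is where the gap between the raw 71GB and the shipped ~81GB comes from. A hedged sketch of the idea:

```python
import numpy as np

def quantize_2bit(w, group_size=64):
    """Asymmetric 2-bit quantization: per-group scale and zero-point,
    weights rounded into the codes {0, 1, 2, 3}."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0
    q = np.clip(np.round((w - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q, scale, lo):
    """Reconstruct approximate weights from codes + per-group metadata."""
    return q * scale + lo

w = np.random.default_rng(1).normal(size=(4096,)).astype(np.float32)
q, scale, lo = quantize_2bit(w)
w_hat = dequantize_2bit(q, scale, lo)

# Back-of-envelope: 2 bits/weight on 284B params is exactly 71 GB; the
# per-group scales and zero-points push the effective rate to ~2.3 bits,
# which lands in the ~81GB the engine reports.
print(284e9 * 2 / 8 / 1e9)  # 71.0 GB of raw 2-bit codes
print(81e9 * 8 / 284e9)     # ~2.28 effective bits per weight
```

The per-group metadata is the "asymmetric" part: each group gets its own offset (`lo`) rather than assuming weights are centered on zero, which matters a lot at only four quantization levels.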
Quick Bites
OpenAI enters serious cyber defense with Daybreak
OpenAI just shipped Daybreak, a cyber defense stack built on GPT-5.5 and Codex Security.
The idea: Codex ingests your repo, builds a threat model specific to your codebase, then maps attack paths and validates real vulnerabilities in sandboxed environments. It generates patches, runs them, and sends audit-ready evidence back into your existing security stack. Access works in tiers for now: standard GPT-5.5 stays general-purpose, Trusted Access unlocks vuln triage and malware analysis for verified defenders, and GPT-5.5-Cyber is reserved for authorized red teaming, pen testing, and controlled validation.
Thinking Machines shows what they’re building with $2B funding
Thinking Machines finally demoed what they’re working on: "interaction models." At first glance, it feels a lot like the GPT-4o demo from two years ago: real-time, audio-video-text. The interesting part, though, is underneath: a 276B MoE “interaction model” (12B active, 0.40s latency) handles the live conversation, while a separate background model runs reasoning, searches, and tool calls mid-chat, then feeds results back in. Full-duplex isn't new (hi Moshi from Kyutai Labs), but the architectural design is interesting, and the early benchmarks on latency and quality are solid.
Claude Agents can now dream between sessions
Anthropic's Claude Managed Agents (their hosted agent runtime, launched last month) got a solid update: dreaming, outcomes, and multi-agent orchestration. Dreaming is the one worth paying attention to! It reviews past agent sessions between runs, surfaces recurring mistakes and workflow patterns, and folds them back into memory automatically. Outcomes lets you define a success rubric evaluated by a separate grader in its own context window, looping the agent back until output clears the bar. Multi-agent orchestration does what you'd expect: a lead agent delegates to specialist subagents, each with their own model and tools, running in parallel.
What the Hermes Agent community is actually building
If you’re still wondering what people are using Hermes Agent for, here’s a wall of 200+ of them to inspire you! The team used Hermes Agent itself to scrape the entire internet for these use cases and added them to their docs. And if you've found an interesting use case, you can submit your own!
/goal in Codex CLI, Hermes Agent, and Claude Code
ICYMI: if you've been manually re-prompting your coding agents with "keep going," that's now a solved problem across the board. /goal gives the agent a durable objective with a clear done-condition. It keeps looping (planning, editing, running, and verifying) until that condition is actually met or you tell it to stop. Codex CLI shipped it first, Hermes Agent picked it up in v0.13.0, and Claude Code now has its own native version. And here’s an interesting workflow we discovered: use Hermes Agent as an orchestration layer to fire /goal across Codex CLI and Claude Code simultaneously, and track all the running objectives on Hermes's Kanban board.
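Under the hood, the /goal pattern is just a verify-driven loop: act, check the done-condition, repeat until it holds or a budget runs out. A toy sketch with an invented stand-in for the agent step (none of this is any vendor's actual implementation):

```python
def run_goal(step, done, max_iters=20):
    """Loop an agent step (plan/edit/run) until a done-condition holds
    or the iteration budget runs out. Returns (met, iterations used)."""
    for i in range(1, max_iters + 1):
        step()
        if done():
            return True, i
    return False, max_iters

# Stand-in for a real agent: each "step" fixes one failing test, and the
# done-condition is the verification pass (here, zero failing tests).
state = {"failing_tests": 3}
met, iters = run_goal(
    step=lambda: state.update(failing_tests=state["failing_tests"] - 1),
    done=lambda: state["failing_tests"] == 0,
)
print(met, iters)  # True 3
```

The durable part is that the objective and done-condition outlive any single model response, which is exactly what manual "keep going" prompting was papering over.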
Tools of the Trade
Sangria: An open-source SDK that lets you put a paywall on any API endpoint so AI agents can pay per request automatically via the x402 protocol and USDC on Base. Drops into Express/Fastify/Hono/FastAPI with minimal code.
agent-harness-kit: An open-source TypeScript CLI that scaffolds a structured multi-agent workflow into any codebase. One command sets up multiple agents (lead, explorer, builder, reviewer), a SQLite task backlog, and a health gate that runs before any agent can start or close work. It's provider-agnostic (Claude Code, OpenCode) and ships its own local MCP server.
/orchestrate: Skill that decomposes large tasks into a tree of parallel cloud agents: planners, workers, and verifiers. It runs on the Cursor SDK's cloud runtime, so each agent gets an isolated VM, and the whole tree reconciles back through git and structured handoffs.
Awesome LLM Apps: A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
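For the curious, the x402 handshake Sangria builds on is a two-round trip: the server answers an unpaid request with HTTP 402 plus the payment terms, and the agent retries with a signed payment attached. A simulated sketch (plain dicts standing in for HTTP; the real SDK handles signing and on-chain USDC settlement, which this skips entirely):

```python
def paywalled_endpoint(request, price="0.01 USDC"):
    """Simulated x402-style endpoint: dicts stand in for HTTP messages."""
    payment = request.get("X-PAYMENT")
    if payment is None:
        # No payment attached: respond 402 with the terms we accept.
        return {"status": 402, "accepts": {"amount": price, "network": "base"}}
    # A real server would verify the payment with a facilitator here.
    return {"status": 200, "body": {"result": "premium data"}}

# Agent flow: call, read the 402 terms, retry with payment attached.
first = paywalled_endpoint({})
assert first["status"] == 402
retry = paywalled_endpoint({"X-PAYMENT": f"signed:{first['accepts']['amount']}"})
print(retry["status"])  # 200
```

The appeal for agents is that the whole negotiation is machine-readable, so paying for an API call needs no account signup or human in the loop.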
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉