Every Software Just Became Agent-Native
+ Self-Evolving Agent Skills by Microsoft
Today’s top AI Highlights:
CLI-Anything: Every Software Just Became Agent-Native
Shared AI agents for teams in Slack
Microsoft SkillOpt: Train the Skill, not the model
Inference price war is here (and it’s starting from China)
LangChain gives agents a code layer between tool calls
& so much more!
Read time: 3 mins
AI Tutorial
HTTP is a primitive. JSON is a primitive. /goal is becoming one for coding agents.
A few weeks ago, OpenAI's Codex CLI added /goal as a way to give the coding worker a job with a defined done state. Claude Code added it this week.
Hermes Agent, the orchestrator I run on a Mac Mini to coordinate work between coding workers, has had /goal built in for a while.
This guide walks through what /goal actually is, the three roles in a multi-agent setup, a real end-to-end run, the verification rule, and how to run goals in parallel without workers stepping on each other.
Latest Developments
Your agent can write code, search the web, and manage files. But ask it to edit a Blender scene, export a MuseScore sheet, or automate Rekordbox, and it hits a wall. The software doesn't speak agent.
CLI-Anything from the HKUDS lab fixes this by generating full CLI harnesses for any software, turning GUI-only apps into agent-controllable tools. One command analyzes the target app's source code, architects a CLI, implements it with tests, and publishes it to PATH. The project ships with a growing registry of 50+ ready-made CLIs covering GIMP, Blender, LibreOffice, OBS, Obsidian, Kdenlive, QGIS, and more.
The idea is simple, but the implications are huge: if the CLI is the universal interface that both humans and LLMs already speak, then wrapping every piece of software in a CLI makes the entire software ecosystem agent-accessible overnight.
Key Highlights:
CLI-Hub package manager: pip install cli-anything-hub, then browse, search, and install any harness with cli-hub install <name>. Supports pip, npm, brew, and system tools.
7-phase generation pipeline: Point it at a repo or app and it runs through analyze, design, implement, plan tests, write tests, document, and publish, fully automated by your coding agent.
Works with every major agent: Claude Code plugin, Pi extension, OpenCode commands, Codex, and OpenClaw skill. Each gets a native integration path.
Skills baked in: Every generated CLI ships with a SKILL.md so agents can discover and use it autonomously, no manual wiring needed.
Try it now: Install from PyPI or clone the repo. The CLI-Hub web registry is live at clianything.cc.
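To make the "CLI as universal interface" point concrete, here is a minimal Python sketch of how an agent runtime might call a command-line harness and read structured output. It is an illustration under assumptions, not CLI-Anything's actual interface: the run_cli helper is hypothetical, the stand-in command is just the Python interpreter printing JSON, and whether a given generated harness emits JSON should be confirmed against that harness's own documentation.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_cli(argv: List[str], timeout: float = 30.0) -> Any:
    """Run a command, fail loudly on a non-zero exit, and parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Stand-in for a generated harness (hypothetical): a Python one-liner that
# prints a JSON object, so the sketch runs without any extra software.
fake_harness = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'status': 'ok', 'layers': 3}))",
]

result = run_cli(fake_harness)
print(result["status"], result["layers"])
```

The design point is that the agent only needs a process boundary, an exit code, and parseable output; nothing about the target application's GUI leaks into the agent loop.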
We love our personal Hermes and OpenClaws. But how many of us have been running them for our professional work, with our teams, cross-functionally?
It’s a completely different set of problems: surviving laptop closures, handling real credentials securely, running for hours or days, and being reachable where the team actually works.
Centaur is the self-hosted runtime Paradigm and Tempo have been running internally since January, now open-sourced under Apache 2.0. It's a Slack-native multiplayer agent system where:
every thread gets its own isolated Kubernetes sandbox,
tools you add are instantly available to every conversation, and
a credential firewall injects secrets in-flight so agents can never exfiltrate raw keys.
Tools are plain Python drop-ins that hot-reload across your org, workflows checkpoint to Postgres and resume exactly where they left off after a crash, and every night the system reviews its own performance and ships fixes to its own skills (super interesting!).
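One way to picture a credential firewall is as a proxy that swaps placeholders for real secrets only as a request leaves the sandbox. The Python sketch below illustrates that general idea; it is not Centaur's implementation, and the CredentialFirewall class, the {{secret:NAME}} placeholder syntax, and the example token are all hypothetical.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax the agent is allowed to see and emit.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Z0-9_]+)\}\}")


class CredentialFirewall:
    """Sits between the agent sandbox and the network (illustrative only)."""

    def __init__(self, vault: Dict[str, str]) -> None:
        self._vault = vault  # raw secrets live here, outside the sandbox

    def inject(self, headers: Dict[str, str]) -> Dict[str, str]:
        """Return a copy of the headers with placeholders replaced by secrets."""

        def lookup(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in self._vault:
                raise KeyError(f"unknown secret: {name}")
            return self._vault[name]

        return {key: PLACEHOLDER.sub(lookup, value) for key, value in headers.items()}

    def redact(self, text: str) -> str:
        """Mask raw secret values before a response is handed back to the agent."""
        for name, value in self._vault.items():
            text = text.replace(value, f"{{{{secret:{name}}}}}")
        return text


firewall = CredentialFirewall({"GITHUB_TOKEN": "ghp_example123"})

# What the agent writes: it only ever handles the placeholder.
agent_headers = {"Authorization": "Bearer {{secret:GITHUB_TOKEN}}"}

outbound = firewall.inject(agent_headers)  # what actually goes on the wire
echoed = firewall.redact("server echoed: Bearer ghp_example123")
print(outbound["Authorization"])
print(echoed)
```

Because substitution happens outside the sandbox and responses are redacted on the way back, the agent's context holds the placeholder rather than the key itself.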
Clone from GitHub or visit centaur.run to get started.
What if you could train agent skills the same way you train neural networks, with learning rates, mini-batches, epochs, and momentum, but entirely in text space?
SkillOpt from Microsoft Research does exactly that. Instead of fine-tuning model weights, it treats SKILL.md as a trainable external parameter. The frozen target model executes tasks, records scored trajectories, and a separate optimizer model proposes structured edits to the skill. Edits are accepted only when held-out validation performance improves.
The whole thing mirrors a training loop: rollouts are forward passes, reflection is a backward pass, and a textual edit budget acts as a learning rate to prevent destructive rewrites.
Evaluated across 6 benchmarks and 7 models, including real agent execution loops with Codex and Claude Code, SkillOpt achieves best or tied-best results in all 52 settings tested.
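The loop described above can be sketched in a few lines of Python. This is a toy illustration of the accept-only-if-validation-improves idea with a character-level edit budget, not SkillOpt's actual code: the Edit type, apply_edits, optimize_skill, the keyword-matching score function, and the rule-appending propose function are all hypothetical stand-ins for the frozen target model and the optimizer model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    old: str  # text to replace; empty string means "append"
    new: str  # replacement (or appended) text


def apply_edits(skill: str, edits: List[Edit], budget_chars: int) -> str:
    """Apply edits in order, stopping before the edit budget is exceeded."""
    spent = 0
    for edit in edits:
        cost = len(edit.old) + len(edit.new)
        if spent + cost > budget_chars:
            break  # the budget plays the role of a learning rate
        skill = skill.replace(edit.old, edit.new, 1) if edit.old else skill + edit.new
        spent += cost
    return skill


def optimize_skill(
    skill: str,
    train_tasks: List[str],
    val_tasks: List[str],
    score: Callable[[str, str], float],
    propose: Callable[[str, List[Tuple[str, float]]], List[Edit]],
    epochs: int = 3,
    batch_size: int = 2,
    budget_chars: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    rng = random.Random(seed)

    def mean_score(s: str, tasks: List[str]) -> float:
        return sum(score(s, t) for t in tasks) / len(tasks)

    best, best_val = skill, mean_score(skill, val_tasks)
    for _ in range(epochs):
        tasks = train_tasks[:]
        rng.shuffle(tasks)
        for i in range(0, len(tasks), batch_size):
            batch = tasks[i : i + batch_size]
            # "Forward pass": run tasks with the current skill, record scores.
            trajectories = [(t, score(best, t)) for t in batch]
            # "Backward pass": reflect on the trajectories and propose edits.
            candidate = apply_edits(best, propose(best, trajectories), budget_chars)
            # Accept only if held-out validation strictly improves.
            cand_val = mean_score(candidate, val_tasks)
            if cand_val > best_val:
                best, best_val = candidate, cand_val
    return best, best_val


# Toy stand-ins: a task "succeeds" when the skill text mentions it.
def score(skill: str, task: str) -> float:
    return 1.0 if task in skill else 0.0


def propose(skill: str, trajectories: List[Tuple[str, float]]) -> List[Edit]:
    return [Edit("", f"- handle {task}\n") for task, s in trajectories if s == 0.0]


best, best_val = optimize_skill(
    "# Skill\n", ["csv", "json", "yaml", "xml"], ["csv", "json"], score, propose
)
print(best_val)
```

Note that edits which do not move the held-out score are discarded, so the exported skill only grows when validation says the change helped.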
Key Highlights:
Improvement with Claude Code: With GPT-5.5 as the target model running through Claude Code, SkillOpt skills boosted performance by an average of 18.6 points across benchmarks, with Spreadsheet tasks jumping +58.3%.
Cross-model and cross-harness transfer: A skill trained with Codex transfers directly into Claude Code and gains +31.8% on SpreadsheetBench. A skill trained on GPT-5.4 transfers to GPT-5.4-nano and still gains +15.2%.
Self-optimizer mode works: Even when the target model is its own optimizer, the constrained, validated update loop still discovers useful edits.
Exports a single file: The whole optimization produces one best_skill.md file. The target model at deployment never sees the optimizer memory, rejected edits, or training state.
Open-source: The whole thing is open-sourced under MIT license. Go and try it out!
Quick Bites
The inference price war is here
DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today. But this isn't a loss-leader race to the bottom. At a 1M-token context, V4-Pro's hybrid attention architecture compresses its KV cache to 10% of V3.2's and cuts single-token inference FLOPs to 27% of V3.2's. V4-Pro now sits at $0.87 per million output tokens. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. These are frontier-class reasoners.
ElevenLabs Launches Music v2
ElevenLabs just shipped Music v2 with better vocals, instrumentation, and arrangement across every genre, plus improved multilingual support and capabilities that weren't possible before. If you've used their v1 for music generation, this is a huge upgrade.
Cohere Drops Command A+: 218B MoE Under Apache 2.0
Cohere just open-sourced Command A+, a 218B parameter MoE model with only 25B active per token. It unifies all previous Command A variants (reasoning, vision, translation) into a single model, supports 48 languages, and runs on as little as two H100s at W4A4 quantization. On τ²-Bench Telecom, it scores 85%, up from 37% for Command A Reasoning. Apache 2.0, available on Hugging Face in BF16, FP8, and W4A4.
Microsoft Open-Sources Fara1.5 Browser Agents
Microsoft Research just dropped Fara1.5, a family of three open computer use agent models (4B, 9B, 27B) built on Qwen3.5 for browser automation. The 27B variant hits 72% on Online-Mind2Web, outperforming OpenAI Operator, Gemini 2.5 Computer Use, and Yutori Navigator n1. Even the 9B model at 63.4% beats every proprietary competitor. Available on the Microsoft Foundry now.
OpenAI Launches Secure MCP Tunnels
Your private MCP servers can now stay inside your network while ChatGPT, Codex, and the Responses API connect through outbound-only HTTPS. No inbound ports, no public endpoints, no VPN. If you've been holding off on connecting internal tools to OpenAI products because of network security concerns, this removes that blocker.
LangChain Gives Agents a Code Layer Between Tool Calls
Your agent calls a tool, reads the result, reasons, calls the next tool, reads, reasons, repeat. Every step is a model round trip. LangChain's Deep Agents now ships with interpreters — small QuickJS runtimes where the agent writes code that coordinates multiple tool calls, keeps intermediate state in the runtime, and returns only what matters. Early testing showed up to 35% fewer tokens on some tasks. Available in both Python and TypeScript.
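The pattern can be illustrated with a small Python sketch: the model writes one script, the runtime executes every tool call inside it, and only the final result goes back to the model. Everything here is hypothetical (the tools, the run_script helper, the order data), and Python's exec stands in for the QuickJS runtime purely for illustration; it is not a security sandbox and not Deep Agents' API.

```python
from typing import Any, Callable, Dict, List

calls: List[str] = []  # log of tool invocations, to show how many happened


# Hypothetical tools backed by canned data.
def search_orders(customer: str) -> List[Dict[str, Any]]:
    calls.append("search_orders")
    return [{"id": 1}, {"id": 2}, {"id": 3}]


def get_shipping_status(order_id: int) -> str:
    calls.append("get_shipping_status")
    return {1: "ok", 2: "late", 3: "late"}[order_id]


def run_script(script: str, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run agent-written code with the tools in scope and return only `result`.

    WARNING: exec() is NOT a sandbox; this only illustrates the control flow.
    """
    scope: Dict[str, Any] = {"__builtins__": {"len": len}, **tools}
    exec(script, scope)
    return scope.get("result")


# Code the agent might write in a single model turn.
agent_script = """
orders = search_orders("acme")
late = [o for o in orders if get_shipping_status(o["id"]) == "late"]
result = {"late_count": len(late), "late_ids": [o["id"] for o in late]}
"""

result = run_script(
    agent_script,
    {"search_orders": search_orders, "get_shipping_status": get_shipping_status},
)
print(result)      # the only data handed back to the model
print(len(calls))  # tool calls that ran inside the runtime
```

In this toy run, four tool calls happen inside the runtime, but the intermediate order list and per-order statuses never enter the model's context.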
Tools of the Trade
Codegraph: Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and more. Agents query symbol relationships and call graphs instead of scanning files, cutting cost by 35% and tool calls by 70% on average. 100% local, MIT licensed.
Bumblebee: Perplexity's open-source supply chain scanner for developer machines. A single Go binary that checks lockfiles, package metadata, extension manifests, and MCP configs against exposure catalogs. Apache 2.0.
Sieve: macOS app that scans your Claude Code, Cursor, Copilot, Windsurf, and Codex chat history for accidentally leaked API keys, tokens, and passwords. Ships with an MCP server so Claude can check for exposed secrets itself. $9.99.
Awesome LLM Apps (111k+ 🌟 ) - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉



