Opus 4.8-level model now runs locally for FREE
+ OpenRouter Fusion, GLM-5.2 locally, Loop Engineering
Today’s top AI Highlights:
Google Cloud turns scattered knowledge into agent-readable files
Vibe is here: one agent for work and code
Run GLM 5.2 locally
Telegram bots can now talk to other bots
Open-source alternative to Loom, Granola, and Wisprflow
& a lot more!
Read time: 3 mins
AI Tutorial
The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped.
That's over.
The interfaces shipping in 2026 are drawn partly by the agent itself, in real time, from what the user actually asked for. Ask for a table, get a table. Not a paragraph describing one.
Generative UI is the layer that lets agents stop describing and start showing.
This guide walks you through three patterns that have emerged for building it. The differences between them matter more than most teams realize.
Latest Developments
Every team building internal agents eventually hits the same wall: the model is smart, but the context is scattered everywhere.
Part of it lives in data catalogs. Part of it lives in wikis. Part of it lives in code comments, dashboards, tribal knowledge, and that one senior engineer's brain.
Google Cloud just introduced Open Knowledge Format (OKF) to make that context portable. It is a vendor-neutral spec that turns enterprise knowledge into Markdown files with YAML frontmatter, so agents can read it, search it, version it, and move it between tools without another custom integration.
The nice part is how boring the format is. Just Markdown. Just files. Just a small set of structured fields like type, title, description, resource, tags, and timestamp.
That is exactly why it could work. Agents do not need another complex metadata platform; they need context they can actually open, inspect, and use.
Key Highlights:
Markdown-first: OKF represents context as human-readable Markdown, so engineers can review it in normal editors and agents can index it without special tooling.
Structured enough for agents: YAML frontmatter adds queryable fields like type, title, resource, tags, and timestamp without turning the whole thing into a heavy schema project.
Portable by default: OKF bundles can live in Git, ship as files, mount on a filesystem, or move across tools without locking context inside one vendor's catalog.
Reference implementations included: Google shipped examples including BigQuery enrichment, a static HTML visualizer, sample bundles, and Knowledge Catalog ingestion support.
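To make "just Markdown with YAML frontmatter" concrete, here is a minimal sketch of how an agent-side script might read one such file. The field names come from the list above; the sample values, the `bigquery://` resource string, and the `parse_frontmatter` helper are all made up for illustration and are not taken from the OKF spec. The parser only handles flat `key: value` lines and inline lists, so a real project would likely reach for a proper YAML library.

```python
from typing import Dict, List, Tuple, Union

FrontmatterValue = Union[str, List[str]]

# Hypothetical OKF-style file: field names are from the article, values are invented.
SAMPLE_DOC = """---
type: table
title: Orders
description: One row per customer order
resource: bigquery://example-project.sales.orders
tags: [sales, orders]
timestamp: 2026-01-15T00:00:00Z
---
# Orders

Each row is one order. Join to customers on customer_id.
"""


def parse_frontmatter(text: str) -> Tuple[Dict[str, FrontmatterValue], str]:
    """Split a document into (metadata, body).

    Only flat `key: value` lines and inline `[a, b]` lists are supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated block: treat everything as body
    meta: Dict[str, FrontmatterValue] = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_frontmatter(SAMPLE_DOC)
print(meta["title"], meta["tags"])
```

The point of the sketch is the one the article makes: the metadata is queryable with a few lines of code, and the body stays plain Markdown that a human can review in any editor.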
Meet Vibe by Mistral, one agent and one licence across work and code. Vibe takes on long-running, multi-step work: catching up across your inbox and calendar, running deep research, drafting deliverables, and taking coding work from request to merged change, across the web app, your editor, and your terminal.
Key Highlights:
Work Mode for complex, multi-stage tasks: Maps out a plan, gets your sign-off, then works across your connectors to carry it through. Every tool call and reasoning step is visible and expandable as it runs.
Code Mode for remote coding sessions: Connect to GitHub, start sessions, and see them through to a pull request. Sessions run in an isolated sandbox, persist while your machine is off, and can run in parallel.
VS Code extension: Vibe now works across your whole project inside VS Code. It reads, edits, and runs commands in a side panel. Open files attach automatically, and @ mentions pull in context from anywhere in your repo.
CLI updates: Skills become / commands. Permissions are session-scoped. /teleport moves a live session between your terminal and the cloud, history and approvals intact.
The most annoying part of using multiple models is that you usually have to become the router yourself.
You ask one model, compare it with another, maybe try a third, then manually decide which answer is right.
OpenRouter's new Fusion API turns that pattern into a single call. You send a prompt to Fusion, it dispatches the task to a panel of models in parallel, gives them web search and web fetch, then uses a judge model to compare the answers before producing the final response.
The results are worth paying attention to: Fable 5 + GPT-5.5 fused together scored 69.0% on DRACO, beating every individual model in OpenRouter's test, including Fable 5 alone at 65.3% and GPT-5.5 alone at 60.0%. That matters because Fable 5 is Anthropic's strongest model, and Fusion still found extra lift by pairing it with another frontier model instead of treating one model as the ceiling.
OpenRouter also tested a budget panel with Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro. That panel scored 64.7%, beating GPT-5.5 and Claude Opus 4.8 individually, coming within about one point of Fable 5 alone, and doing it at roughly half the cost.
The real story is not just the benchmark. It is that model diversity is becoming a product primitive. Instead of picking one model and hoping it is the right one, builders can start treating models like a small research team.
You can use it through the normal OpenRouter API. Just call openrouter/fusion directly or configure the Fusion plugin with your own analysis models and judge model.
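The fan-out-then-judge pattern described above is easy to prototype locally. The sketch below is not OpenRouter's implementation or its API: the `fuse` function, the stand-in models, and the `majority_judge` are all hypothetical, and the judge here is a simple majority vote swapped in for the judge model that Fusion uses. It only shows the shape of the idea: dispatch in parallel, collect the answers, let one arbiter decide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Model = Callable[[str], str]              # hypothetical: prompt in, answer out
Judge = Callable[[str, Dict[str, str]], str]


def fuse(prompt: str, panel: Dict[str, Model], judge: Judge) -> str:
    """Send the prompt to every panel model in parallel, then let the judge decide."""
    if not panel:
        raise ValueError("panel must contain at least one model")
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        answers = {name: future.result() for name, future in futures.items()}
    return judge(prompt, answers)


# Stand-in models so the sketch runs offline; real ones would call an API.
def model_a(prompt: str) -> str:
    return "Paris"


def model_b(prompt: str) -> str:
    return "Paris"


def model_c(prompt: str) -> str:
    return "Lyon"


def majority_judge(prompt: str, answers: Dict[str, str]) -> str:
    """Assumption: a majority vote stands in for an LLM judge."""
    return Counter(answers.values()).most_common(1)[0][0]


panel = {"a": model_a, "b": model_b, "c": model_c}
print(fuse("What is the capital of France?", panel, majority_judge))
```

Swapping `majority_judge` for a function that asks another model to compare the answers gets closer to what the article describes, at the cost of one more model call per request.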
Quick Bites
Everyone is talking about loop engineering, but Addy's version makes it usable
Addy Osmani's piece is useful because it turns the phrase into an actual operating model. The shift is from "I prompt the agent" to "I design the loop that finds work, hands it to agents, checks the result, records state, and decides the next step."
That is a better frame for where coding agents are going. The prompt is no longer the main artifact. The loop is. If you are building agent workflows, this gives you a cleaner way to think about retries, memory, evaluation, escalation, and token cost before you wire everything together.
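Here is one way the loop described above could look in code. This is a rough sketch, not Addy's implementation: `run_loop`, `LoopState`, and the toy `find_work`, `agent`, and `check` functions are hypothetical names chosen for illustration. The loop finds work, hands it to an agent, checks the result, records state, and either retries or escalates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    done: List[str] = field(default_factory=list)
    escalated: List[str] = field(default_factory=list)
    attempts: int = 0  # rough proxy for token cost


def run_loop(
    find_work: Callable[[LoopState], Optional[str]],
    agent: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_retries: int = 2,
) -> LoopState:
    """Find work, hand it to the agent, check the result, record state, decide next step."""
    state = LoopState()
    while True:
        task = find_work(state)
        if task is None:
            return state  # nothing left to do
        for _ in range(max_retries + 1):
            state.attempts += 1
            result = agent(task)
            if check(task, result):
                state.done.append(task)
                break
        else:
            # Every attempt failed the check: hand the task to a human.
            state.escalated.append(task)


# Toy stand-ins so the sketch runs on its own.
BACKLOG = ["fix-typo", "flaky-test"]


def find_work(state: LoopState) -> Optional[str]:
    handled = set(state.done) | set(state.escalated)
    return next((t for t in BACKLOG if t not in handled), None)


def agent(task: str) -> str:
    return "ok" if task == "fix-typo" else "fail"


def check(task: str, result: str) -> bool:
    return result == "ok"


state = run_loop(find_work, agent, check)
print(state)
```

Notice that the prompt never appears in `run_loop`. Retries, escalation, and the attempt counter all live in the loop, which is the artifact you end up tuning.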
Sakana Fugu explores orchestration as the model
Sakana's Fugu is interesting because it is less about launching another standalone model and more about coordinating multiple models into a stronger system. The bet is that intelligence can come from routing, combining, challenging, and arbitrating models, not only from scaling one model in isolation.
That makes it rhyme with the Fusion story, but from a research direction rather than an API product. The useful takeaway for builders is simple: the next frontier may be systems that know which model to use, when to ask for disagreement, and how to merge partial answers without making the user manage the whole process.
Telegram bots can now talk to other bots
Telegram's latest bot update lets bots respond to other bots, not just humans. That sounds small, but it changes what Telegram can be used for: not just a chat UI, but a lightweight coordination layer for agent workflows.
You could mention one bot, that bot could hand work to another bot, and the whole exchange stays visible in a normal chat thread.
Run GLM-5.2 locally with Unsloth
Unsloth just published a guide for running GLM-5.2 with Dynamic GGUFs, including 1-bit and 2-bit quant options, llama.cpp instructions, and Unsloth Studio support. The 2-bit build is still huge at around 239GB, but that is dramatically smaller than the full 1.51TB model.
This is not casual laptop territory, but it is meaningful for local-agent builders with serious memory available. A 744B-parameter open model with 40B active parameters and a 1M context window is already being squeezed into setups that advanced users can actually experiment with.
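A quick back-of-the-envelope check puts those sizes in perspective. The sketch below uses only the figures quoted above and assumes decimal units (1GB = 10^9 bytes, 1TB = 10^12 bytes); the helper name `bits_per_param` is made up for illustration.

```python
PARAMS = 744e9        # total parameters, from the article
FULL_BYTES = 1.51e12  # full model, assuming decimal terabytes
QUANT_BYTES = 239e9   # 2-bit Dynamic GGUF, assuming decimal gigabytes


def bits_per_param(size_bytes: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_bytes * 8 / params


full_bits = bits_per_param(FULL_BYTES)
quant_bits = bits_per_param(QUANT_BYTES)
shrink = FULL_BYTES / QUANT_BYTES

print(f"full model:  {full_bits:.2f} bits per parameter")
print(f"2-bit build: {quant_bits:.2f} bits per parameter")
print(f"size ratio:  {shrink:.1f}x smaller")
```

Under those assumptions the full model works out to roughly 16 bits per parameter and the 2-bit build to about 2.6, a little over 6x smaller. The average landing above 2 bits would be consistent with some layers being kept at higher precision, though the article does not spell that out.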
Tools of the Trade
Birdclaw: A local-first Twitter workspace that imports your X archive, syncs timeline/bookmarks/mentions, and stores everything in SQLite. The useful part is that your X memory becomes searchable and agent-readable: you can full-text search old likes and bookmarks, triage mentions with AI ranking, generate local digests, and keep a Git-friendly backup instead of losing everything inside the platform UI.
Agent-Native Clips: An open-source Loom + Granola + Wisprflow-style app for screen recordings, meeting notes, and dictation. Every clip gets transcripts, summaries, timestamped frames, and searchable history, so an agent can understand what happened in a video or meeting without needing raw audio/video ingestion.
Stripe Directory: A public-preview Stripe CLI directory for discovering businesses and services on the Stripe network. Developers and agents can search providers by keyword, then get structured results for Stripe Apps, Projects.dev providers, machine-payment endpoints, and business profiles instead of manually hunting across docs and marketplaces.
Awesome LLM Apps (113k+ 🌟): A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
That’s all for today! See you tomorrow with more AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉




