LLM with 12M Context Window + Free Web Search and Fetch for your Hermes and Claws
Today’s top AI Highlights:
& so much more!
Read time: 3 mins
AI Tutorial
Your agent has a 200k token context window.
The 400 tokens of instructions it actually needs are buried under tool definitions, reference docs, and brand guides it never asked for. So it ignores them.
This is the most common reason agents fail in production. It's not a model problem or a framework problem; it's a context problem.
In this blog, you'll learn the anatomy of Agent Skills: why the first two lines of SKILL.md are the most important writing you'll do, and how the LLM itself routes queries to the right skill without embeddings or retrieval layers.
Read on to learn the five parts that make skills work, then pick one workflow you do every week and ship your first skill today.
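To make the anatomy concrete, here is a minimal sketch of what a SKILL.md might look like. The frontmatter fields follow the Agent Skills convention (name plus a description the model uses for routing); the specific skill name and steps below are illustrative, not from the tutorial.

```markdown
---
name: weekly-report
description: Drafts the weekly status report from merged PRs and closed tickets. Use when the user asks for a weekly summary or status update.
---

# Weekly Report

1. Gather merged PRs and closed tickets from the past 7 days.
2. Group them by project.
3. Draft the report using the sections: Shipped, In Progress, Blocked.
```

Those first two frontmatter lines are what the model reads when deciding whether to load the skill, which is why they carry so much weight.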
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
12 million tokens of context in a single pass, and it does it at roughly 1/1000th the attention compute of current frontier models.
Meet SubQ, the first large language model built on a fully subquadratic sparse attention (SSA) architecture, where compute scales linearly with context length instead of quadratically.
Transformers compare every token to every other token, which means doubling input length quadruples the compute. SubQ's architecture focuses only on the token relationships that actually matter, making million-token workloads fast and cheap enough to be practical.
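The scaling difference is easy to see with a toy count of token-pair comparisons. This is not SubQ's actual algorithm, just an illustration of quadratic versus linear growth under an assumed fixed per-token attention budget:

```python
# Toy illustration (not SubQ's actual SSA algorithm): count how many
# token-pair comparisons full attention vs. a fixed-budget sparse
# attention would make as context length grows.

def full_attention_pairs(n_tokens: int) -> int:
    # Every token attends to every other token: O(n^2).
    return n_tokens * n_tokens

def sparse_attention_pairs(n_tokens: int, budget: int = 512) -> int:
    # Each token attends to at most `budget` tokens: O(n).
    return n_tokens * min(budget, n_tokens)

for n in (1_000, 10_000, 100_000, 1_000_000):
    ratio = full_attention_pairs(n) / sparse_attention_pairs(n)
    print(f"{n:>9} tokens: full/sparse compute ratio ~ {ratio:,.0f}x")
```

Doubling the input doubles the sparse cost but quadruples the full-attention cost, which is why the gap widens to roughly three orders of magnitude at million-token scale.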
The model is entering private beta today with an API, a coding agent called SubQ Code, and a long-context search tool called SubQ Search.
Key Highlights:
Benchmark Performance: SubQ 1M-Preview scores 95% on RULER 128K and 81.8% on SWE-Bench Verified, putting it on par with or ahead of Opus 4.6 and DeepSeek V4 Pro on both long-context accuracy and code tasks.
Speed & Efficiency: Its sparse attention runs 52x faster than FlashAttention at 1M tokens while requiring 63% less compute.
5% the Cost of Opus 4.7: Pricing isn't out yet, but the team claims it runs at less than 5% of Opus's cost at scale, with a full RULER 128K run costing $8 vs ~$2,600. Take this with a grain of salt for now!
Private Beta: All three products (API, Code, Search) are available via early access at subq.ai.
TinyFish just made their Web Search and Fetch endpoints free with generous rate limits. Forever. No credit card, no "7-day trial." Just sign up and grab your API key.
Search returns structured JSON for agents. Fetch renders any URL in a real browser (full JavaScript, SPAs, anti-bot, all of it), strips the unnecessary content, and returns clean markdown.
Everything runs on TinyFish's own custom Chromium fleet. Owning the stack end-to-end makes their Search and Fetch both free and fast.
Works with Claude Code, OpenClaw, Hermes Agent, Cursor, Codex, and any agent framework
Available via API, MCP, Python + TypeScript SDKs, CLI, and Skills
One API key. No credit card.
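For a feel of how an agent might wire this up, here is a hedged sketch that assembles requests against the two endpoints. The host, paths, and parameter names below are assumptions for illustration, not TinyFish's documented API; check their docs for the real shapes.

```python
# Hypothetical request-builder sketch. BASE_URL, endpoint paths, and
# parameter names are placeholders, NOT TinyFish's documented API.
import urllib.parse

BASE_URL = "https://api.tinyfish.example"  # placeholder host

def build_search_request(query: str, api_key: str) -> dict:
    """Assemble an HTTP GET for the (assumed) Search endpoint,
    which returns structured JSON results."""
    return {
        "method": "GET",
        "url": f"{BASE_URL}/search?" + urllib.parse.urlencode({"q": query}),
        "headers": {"Authorization": f"Bearer {api_key}"},
    }

def build_fetch_request(url: str, api_key: str) -> dict:
    """Assemble an HTTP GET for the (assumed) Fetch endpoint,
    which returns the page rendered down to clean markdown."""
    return {
        "method": "GET",
        "url": f"{BASE_URL}/fetch?" + urllib.parse.urlencode({"url": url}),
        "headers": {"Authorization": f"Bearer {api_key}"},
    }

req = build_search_request("subquadratic attention", api_key="YOUR_KEY")
print(req["url"])
```

The same bearer-token pattern works whether you call it directly, through the SDKs, or behind an MCP server.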
Your company's best knowledge is rotting in Slack threads nobody will ever search again.
And no, RAG is not the solution. The index is always stale. The chunks land at the wrong boundaries.
Turns out, coding agents already cracked this. They don't search, they grep.
Scout, an open-source context agent from Agno, borrows the trick and navigates your information sources live. It connects to Slack, Google Drive, Linear, MCP servers, and more, walking each source's native API at query time to assemble real answers with real citations.
As it works, it builds its own wiki and CRM. Say "Josh from Anthropic shared a paper on RLMs" and Scout files Josh as a contact, parses the paper into a wiki page, and links them together.
The whole thing is open-source, ready to fork and customize.
Key Highlights:
Context Providers: Instead of exposing dozens of API-specific tools to the main agent, Scout wraps each source behind a thin sub-agent layer. The main agent sees query_slack, not Slack's twelve endpoints, keeping context clean.
Navigation over Search: Scout queries live APIs at request time, so a Slack message sent thirty seconds ago is immediately available, and citations always point to real, openable paths.
Self-building CRM and wiki: It populates a Postgres-backed CRM and a knowledge wiki as it learns. It even creates new database tables on demand.
Ready to clone and use: Ships with Docker Compose, connects to Agno's AgentOS for multi-user sessions and scheduled tasks, and plugs into Slack with full thread history. More connectors coming soon!
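The context-provider idea above can be sketched in a few lines. The class and function names here are illustrative, not Agno's actual API; the point is the shape: the main agent gets one callable per source, and the many real endpoints live behind it.

```python
# Minimal sketch of the "context provider" pattern: the main agent
# sees one tool per source (e.g. query_slack), and a thin layer behind
# it decides which real API calls to make. Names are illustrative,
# not Agno's actual API.
from typing import Callable

class ContextProvider:
    """Wraps a source's many endpoints behind a single query tool."""

    def __init__(self, name: str, endpoints: dict[str, Callable[[str], list[str]]]):
        self.name = name
        self.endpoints = endpoints

    def query(self, question: str) -> list[str]:
        # A real sub-agent would use an LLM to pick endpoints; this
        # sketch just fans out to every endpoint and merges results.
        results: list[str] = []
        for call in self.endpoints.values():
            results.extend(call(question))
        return results

# Fake Slack endpoints standing in for the real twelve.
slack = ContextProvider("slack", {
    "search_messages": lambda q: [f"slack message matching {q!r}"],
    "list_threads":    lambda q: [f"thread mentioning {q!r}"],
})

# The main agent's tool surface: one callable, not twelve.
query_slack = slack.query
print(query_slack("RLM paper"))
```

Because the sub-agent layer absorbs the endpoint sprawl, the main agent's context stays small no matter how many sources you plug in.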
Quick Bites
Codex gets a Tamagotchi
OpenAI shipped pets for Codex. These are animated companions that double as a persistent status overlay for your coding agent. The pet visually maps to whether Codex is actively working, waiting for input, or flagging something for review. Think of it as the most adorable process monitor you never asked for.
Just another day of the Claude team shipping
Anthropic just dropped ten agent templates built specifically for finance work, like pitchbooks, KYC screening, month-end close, valuation checks, the whole grind. They plug into Cowork and Claude Code or run autonomously as Managed Agents, and they come wired to data sources like Moody's, Third Bridge, and S&P Capital IQ. Oh, and Claude now works inside Excel, PowerPoint, Word, and Outlook with context that follows you across apps, so yes, your comps model can become a deck without explaining everything twice. Install them as plugins in Cowork and Claude Code.
Make your Hermes Agent your video editor with one Skill
Hermes Agents can now spin up full videos, courtesy of the HyperFrames Agent Skill by HeyGen. Just run hermes skills install hyperframes, and your agent becomes a video editor that treats HTML as the source of truth for video. Feed it an X post, a PDF, or a GitHub repo, and it'll script, animate with GSAP, lay captions over TTS narration, and render a finished MP4, all orchestrated end-to-end by the agent itself.
Tools of the Trade
Designlang: Extract any website's complete design system with one command. It reads the design system off the live DOM and emits 17+ files: DTCG tokens, Tailwind config, shadcn theme, Figma variables, motion tokens, typed component anatomy, brand voice, page-intent labels, and a paste-ready prompt pack for v0 / Lovable / Cursor / Claude Artifacts.
Interact AI: Replaces your static website with an adaptive, conversational interface that recomposes itself per visitor in real time. A founder sees compliance content, a CISO sees security controls, all generated on the fly from your data. It's not a chatbot widget in the corner; the conversation is the page, and everything the visitor says carries through into signup and product onboarding.
fireworks-tech-graph: A skill that turns plain descriptions of your system into polished SVG + PNG technical diagrams. It ships with 5 visual styles, 8 diagram types, and built-in knowledge of AI/agent patterns like RAG pipelines, Mem0 memory layers, and multi-agent flows.
Awesome LLM Apps: A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉