Build and Run Custom AI Coding Agents in Terminal
PLUS: Connect local LLMs to MCP servers, Vision-first AI browser agent
Today’s top AI Highlights:
Build and run custom AI coding agents in the Terminal
Stop drawing boxes around buttons - this agent sees screens like humans do
Connect local LLMs to MCP servers right on your computer
Report idling vehicles in NYC with AI and get a cut of the fines
& so much more!
Read time: 3 mins
AI Blog
Last month, we at Unwind AI wrapped up the Global Open Source AI Agent Hackathon, a focused build sprint for serious developers. No gimmicks, no wrappers. Just useful agent workflows, RAG systems, and tool-use pipelines built with frameworks like Agno, Firecrawl, and Browser Use. With $25K+ in prizes and dozens of submissions, we saw a lot of interesting takes.
Now that the dust has settled, we’re featuring 6 standout projects that made it to our Awesome LLM Apps repo. We looked for working demos, solid logic, useful outputs, and clean implementation. Here are our picks:
🌐 Likeminds - Agentic Multi-Social Semantic Network
🎤 AI Speech Trainer: Multimodal Public Speaking Coach
📁 Windows-Use: GUI-Level Automation for Windows OS
🧩 Beifong: Personalized Information & Podcast Generator
🎥 TubeWarden: AI That Curates Your YouTube
✈️ TripCraft AI: Travel Planning That Just Works
100% open-source code with demos.
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Browser agents that draw numbered boxes around page elements are about as reliable as a paper umbrella in a thunderstorm. These DOM-dependent automations break the moment a website changes its layout.
Then there are agents that follow the "high-level prompt + tools = work until done" recipe: they look great in demos but fall apart when you need actual production reliability.
Magnitude takes a vision-first approach that lets AI agents see and interact with web pages exactly like humans do - by looking at the screen and clicking where it makes sense.
Built on Playwright, this open-source framework promises to end the cycle of automations breaking every time a website changes its layout. Whether you're extracting data, running tests, or building your own browser agents, Magnitude focuses on repeatability and control rather than flashy demos that fall apart in production.
Key Highlights:
Vision-first - Unlike traditional agents that rely on numbered boxes around elements, Magnitude uses visually grounded LLMs to specify precise pixel coordinates, making automations resilient to website changes and future-proof for desktop apps.
Flexible abstraction levels - Supports both high-level tasks like "log in to the app" and granular actions like "click the submit button," with custom prompting and data injection for real-world workflows.
Built-in test runner - Includes deterministic runs via native caching (in progress) and powerful visual assertions, designed specifically for CI/CD integration rather than just flashy demonstrations.
AI-native Playwright - Maintains access to the full browser context and page objects for cookie manipulation, network mocking, and low-level operations, while keeping natural-language simplicity.
Compatible LLMs - Magnitude requires an LLM that is both very good at instruction following and planning, and able to understand precise coordinates in an image to interact with the browser accurately. Only a few LLMs meet these criteria, such as Claude 4 Sonnet and Qwen 2.5 VL 72B.
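To make the vision-first idea concrete, here is a minimal sketch of that loop using plain Playwright. This is not Magnitude's API: the locateOnScreen helper is a hypothetical stand-in for a call to a visually grounded model such as Claude 4 Sonnet or Qwen 2.5 VL, which would return pixel coordinates from a screenshot.

```typescript
// Illustrative sketch only - not Magnitude's actual API.
// Vision-first loop: screenshot -> grounded vision LLM picks pixel
// coordinates -> click, instead of relying on DOM selectors.
import { chromium } from "playwright";

// Hypothetical helper: in a real system this would send the screenshot and
// instruction to a visually grounded LLM and parse coordinates from its reply.
async function locateOnScreen(
  screenshot: Buffer,
  instruction: string
): Promise<{ x: number; y: number }> {
  console.log(`Locating "${instruction}" in a ${screenshot.length}-byte screenshot`);
  return { x: 640, y: 400 }; // stub coordinates - replace with a real model call
}

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // No selectors: just look at the screen and click where it makes sense.
  const shot = await page.screenshot();
  const { x, y } = await locateOnScreen(shot, "the 'More information' link");
  await page.mouse.click(x, y);

  await browser.close();
}

main().catch(console.error);
```

Because actions are grounded in what is visible rather than in DOM structure, the same instruction keeps working after a redesign, which is the property Magnitude is built around.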
Create How-to Videos in Seconds with AI
Stop wasting time on repetitive explanations. Guidde’s AI creates stunning video guides in seconds—11x faster.
Turn boring docs into visual masterpieces
Save hours with AI-powered automation
Share or embed your guide anywhere
How it works: Click capture on the browser extension, and Guidde auto-generates step-by-step video guides with visuals, voiceover, and a call to action.
This is officially the year CLI coding agents take over software development.
OpenAI's Codex, Google's Gemini CLI, and Anthropic's Claude Code are proving that the Terminal could be the perfect home for AI agents. They can generate code, navigate large codebases, manage PRs, resolve issues, and more.
How about taking it a step further and automating your entire software development lifecycle?
Agentic AI platform Qodo just released Qodo Gen CLI, which lets you build the exact agents you need rather than settling for what someone else thinks you want. It treats agents as programmable, configurable automation that you can customize for your specific workflows, trigger conditions, and business logic. And it’s truly model-flexible - use leading LLMs like GPT or Claude in the same interface.
Key Highlights:
Workflow-wide automation - Qodo Gen CLI lets you deploy agents anywhere in your SDLC with multiple execution modes, including CI mode for build pipelines, webhook mode for external triggers, and MCP mode for orchestration frameworks. Agents can automate everything from code reviews to release documentation.
Custom agents - Each agent uses a simple TOML configuration defining triggers, inputs, actions, and results. You can version, share, and integrate these configs into existing workflows just like any other development artifact (see the hypothetical sketch after this list).
IDE-agnostic - Qodo Gen CLI turns any IDE into an agentic environment by running through the built-in terminal. Whether you're using VS Code, IntelliJ, Vim, or anything else, you get access to the same workflows without changing your environment.
Model flexibility - Supporting all leading LLMs, including Claude, GPT, Gemini, DeepSeek, and Llama models, the platform offers SaaS, single-tenant, multi-tenant, and upcoming on-premise deployment options.
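To give a rough feel for what such a config could look like, here is a hypothetical sketch. The field names below are illustrative assumptions, not Qodo's documented schema; check the Qodo Gen CLI docs for the real format.

```toml
# Hypothetical agent config - illustrative only, not Qodo's actual schema.
# The idea: declare the trigger, inputs, actions, and expected result as data,
# then version and share the file like any other development artifact.

[agent]
name = "pr-reviewer"
description = "Review every pull request for style and security issues"

[trigger]
on = "pull_request"            # e.g. a webhook or CI event

[inputs]
diff = "{{ pr.diff }}"         # data injected into the agent's context

[actions]
steps = ["analyze-diff", "post-review-comments"]

[result]
output = "review-summary.md"   # artifact the agent is expected to produce
```

The point is less the exact fields and more that the agent's behavior lives in a plain file you can review, diff, and reuse across projects.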
Quick Bites
Local LLM enthusiasts can finally tap into the broader MCP ecosystem via LM Studio. You can now connect your locally running LLMs to MCP servers from services like Stripe, GitHub, or Notion, and it adds a sensible safety measure: tool calls require explicit confirmation before they run.
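LM Studio's MCP integration is configured through an mcp.json file that follows the same mcpServers notation as other MCP clients. The snippet below is a generic example of that format using the reference GitHub server (the token is a placeholder), not LM Studio-specific documentation; check LM Studio's docs for where the file lives and the exact fields it accepts.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    }
  }
}
```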
While Silicon Valley focuses on scaling up, Chinese researchers are mastering the art of selective activation. Tencent just dropped an MoE model that punches way above its weight class, using only 13B active parameters from its 80B total to match o1 and DeepSeek on key benchmarks. Hunyuan-A13B features hybrid "fast and slow" reasoning modes, native 256K context support, and particularly strong agent capabilities, outperforming o1 on multiple evaluations.
The model is available open-source with quantized versions and deployment support for TensorRT-LLM, vLLM, and SGLang.
Tools of the Trade
Branching: AI coding agents now code in parallel, but syncing their changes is a mess with regular Git. Branching fixes this by auto-syncing edits across agents and editors (Cursor, VS Code, Lovable, etc.) in real time, proposing merges when changes overlap. It writes clean Git commits to GitHub without you touching Git commands.
Gitprobe: Converts GitHub repositories into structured summaries that show code architecture and function relationships via call graph visualization. You access it by replacing 'hub' with 'probe' in any GitHub URL to get LLM-ready analysis of the codebase.
Idle Reporter: Report idling vehicles in NYC with AI and get a cut of the fines. The app records timestamped videos of idling commercial vehicles, then uses AI to automatically extract license plate numbers, addresses, and other required information to fill out complaint forms. You can submit completed reports directly through the app.
Merlin: AI chief of staff that integrates with Gmail, Google Calendar, and Slack to automatically prioritize your most important emails, meetings, and tasks based on impact and urgency. It can execute actions like drafting replies, scheduling meetings, and completing tasks with a single click.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
There’s a deep malaise in tech right now.
—New grads can’t find jobs
—BigTech middle managers are trying to justify their existence
—Everyone not in AI wants to be in AI
—Founders struggling with their startup for years see Roy rewrite the rules
—Comp insecurity is at an all time high with the Meta offers (“why am I working so hard?”)
Tech, net of AI, is just not as sexy a job as it used to be 10 years ago. ~
Deedy Das
All the technical language around AI obscures the fact that there are two paths to being good with AI:
1) Deeply understanding LLMs
2) Deeply understanding how you give people instructions & information they can act on.
LLMs aren’t people but they operate enough like it to work ~
Ethan Mollick
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉