Chinese AI Model Outperforms Claude 4 Opus and o4-mini
PLUS: Microsoft tries Perplexity Comet-style browser, Visual programming in your codebase
Today’s top AI Highlights:
China’s open-source model outperforms Claude 4 Opus and o4-mini
Visual programming like n8n, but in your codebase
The 100-line AI agent that solves GitHub issues and more
Everyone's building agentic AI browsers; Microsoft tried too
Route Claude Code requests to different models
& so much more!
Read time: 3 mins
AI Tutorial
Integrating travel services as a developer often means wrestling with a patchwork of inconsistent APIs. Each API - whether for maps, weather, bookings, or calendars - brings its own implementation quirks, auth scheme, and maintenance burden. The travel industry's fragmented tech landscape creates unnecessary complexity that distracts from building great user experiences.
In this tutorial, we’ll build a multi-agent AI travel planner using MCP servers as universal connectors. By using MCP as a standardized layer, we can focus on creating intelligent agent behaviors rather than API-specific quirks. Our application will orchestrate specialized AI agents that handle different aspects of travel planning while using external services via MCP.
We'll use the Agno framework to create a team of specialized AI agents that collaborate to create comprehensive travel plans, with each agent handling a specific aspect of travel planning - maps, weather, accommodations, and calendar events.
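The pattern can be sketched in plain Python: specialized agents share one uniform tool-calling layer instead of bespoke API clients. Everything below is a hypothetical illustration of that idea - it is not Agno's or MCP's actual API, and all names (`ToolServer`, `Agent`, the tools) are made up.

```python
# Hypothetical sketch: agents calling external services through one
# uniform interface, the role MCP servers play in the real tutorial.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ToolServer:
    """Stand-in for an MCP server: a named bag of callable tools."""
    name: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, tool: str, **kwargs) -> str:
        return self.tools[tool](**kwargs)

@dataclass
class Agent:
    """One specialized agent, bound to one server."""
    role: str
    server: ToolServer

    def plan(self, **kwargs) -> str:
        # A real agent would let an LLM pick the tool; here we call the
        # server's first tool directly to show the uniform interface.
        tool = next(iter(self.server.tools))
        return f"[{self.role}] {self.server.call(tool, **kwargs)}"

weather = ToolServer("weather", {"forecast": lambda city: f"Sunny in {city}"})
maps = ToolServer("maps", {"route": lambda city: f"Walking route in {city}"})

team = [Agent("weather-agent", weather), Agent("maps-agent", maps)]
itinerary = [agent.plan(city="Lisbon") for agent in team]
print(itinerary)
```

The point of the design: adding a new travel service means registering one more server behind the same interface, not writing another one-off API client.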
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Open-source models from China are overshadowing closed models from Silicon Valley: OpenAI's open model is delayed, and Meta has shelved Llama 4 Behemoth.
Now, China’s Z.ai (formerly Zhipu AI) is throwing down the gauntlet with GLM-4.5, a model that doesn't just compete with the big players - it unifies reasoning, coding, and agentic capabilities into one powerhouse.
Built with 355B total parameters and 32B active parameters using a mixture-of-experts (MoE) architecture, GLM-4.5 joins the growing trend of Chinese models adopting MoE designs. The model operates in both a "thinking" mode for complex reasoning and a "non-thinking" mode for instant responses. What makes it particularly compelling is the pricing: significantly cheaper than comparable models, while delivering performance that ranks third globally across agentic, reasoning, and coding benchmarks. The model matches Claude 4 Sonnet's performance on function calling and outperforms Claude 4 Opus on web browsing tasks.
Key Highlights:
API Costs - At $0.60 per million input tokens and $2.20 per million output tokens, GLM-4.5 delivers third-place global performance across benchmarks while undercutting major competitors on cost, making advanced AI capabilities more accessible.
Agentic Performance - The model matches Claude 4 Sonnet on function calling benchmarks and achieves 26.4% accuracy on BrowseComp web browsing tasks, outperforming Claude 4 Opus (18.8%) with superior tool-use capabilities.
Full-Stack Capabilities - The model excels at autonomous web browsing, complete application development from frontend to backend, artifact generation, and presentation creation, with a 90.6% tool-calling success rate.
Availability - Accessible via Z.ai chat interface and OpenAI-compatible API, supports local hosting through HuggingFace/ModelScope, and integrates directly with Claude Code for enhanced development workflows.
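Because the API is OpenAI-compatible, a chat request can be sketched with nothing but the standard library. The base URL, model name, and `thinking` flag below are assumptions for illustration; check Z.ai's documentation for the real values before use.

```python
# Sketch of a request to an OpenAI-compatible chat endpoint using only
# the standard library. Endpoint and parameter names are assumptions.
import json
import urllib.request

BASE_URL = "https://api.z.ai/api/paas/v4"  # assumed endpoint
payload = {
    "model": "glm-4.5",
    "messages": [{"role": "user", "content": "Summarize MoE in one line."}],
    "thinking": {"type": "enabled"},  # hypothetical flag for "thinking" mode
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment with a real API key
print(req.full_url)
```

Any OpenAI SDK should also work by pointing its `base_url` at the provider's endpoint, which is what "OpenAI-compatible" buys you.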
Visual flow builders like n8n are great until you need them to actually work with your existing codebase.
Flyde brings visual flow-based programming directly into your TypeScript codebase, running inside VS Code and integrating seamlessly with your existing functions and frameworks.
This open-source toolkit lets you build complex backend logic, AI agents, and prompt chains through visual flows while maintaining full access to your runtime code. Unlike standalone automation platforms that operate in isolation, Flyde works as a library within your project, giving you the collaborative benefits of visual programming without sacrificing the power and control of traditional coding. Built specifically for modern AI-heavy workflows, Flyde integrates smoothly with AI coding tools like Cursor and Windsurf, enhancing rather than replacing your development process.
Key Highlights:
In-codebase integration - Runs directly within your TypeScript/JavaScript projects as a library, providing access to runtime code and existing frameworks while maintaining your current CI/CD pipelines.
Visual AI workflow development - Build and iterate on AI agents, prompt chains, and complex backend logic through an intuitive visual interface without losing code-level control.
Cross-team collaboration - Enables non-developers like product managers and designers to understand and contribute to backend flows while keeping everything in a familiar development environment.
AI coding tool compatibility - Works seamlessly with modern AI development tools like Cursor, Windsurf, and Claude Code, augmenting your existing workflow rather than disrupting it.
What if the most effective coding agent could fit in just 100 lines of Python?
The Princeton team behind SWE-bench and SWE-agent just released mini-SWE-agent, proving that sometimes less really is more. This radically simplified agent ditches fancy tools, complex configurations, and bloated dependencies while still achieving an impressive 65% success rate on SWE-bench Verified with Claude Sonnet 4.
Mini focuses entirely on letting the language model use bash to its full potential. The agent maintains a completely linear history where every step simply appends to the message chain, making debugging and fine-tuning straightforward. It's built for researchers doing benchmarking and fine-tuning work, but practical enough that developers are already using it as their go-to command-line coding assistant.
Key Highlights:
Minimal Architecture - Just 100 lines of core Python code with zero special tools beyond bash, making it compatible with any language model and trivial to deploy in sandboxed environments.
Linear History Design - Every agent step simply appends to the message chain, creating perfect alignment between trajectories and LM prompts for seamless debugging and fine-tuning workflows.
Stateless Execution - Each command runs as an independent subprocess rather than maintaining persistent shell sessions, dramatically improving stability and enabling seamless sandbox deployment.
Multi-Interface Support - Ships with both a Claude-code style terminal interface for daily use and a high-performance batch mode for running large-scale evaluations.
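The linear-history, bash-only loop described above fits in a few lines. The scripted `toy_model` below stands in for a real LLM call; this is a sketch of the design, not mini-SWE-agent's actual code.

```python
# Toy version of the loop: the "model" proposes one shell command per
# turn, each command runs as an independent subprocess (no persistent
# shell), and every step appends to a single linear message list.
import subprocess

def toy_model(messages):
    # Scripted stand-in for an LLM: emit one command, then stop.
    return "echo hello from the agent" if len(messages) == 1 else "exit"

messages = [{"role": "system", "content": "Solve the task using bash."}]
while True:
    command = toy_model(messages)
    if command == "exit":
        break
    # Stateless execution: a fresh subprocess per command.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    messages.append({"role": "assistant", "content": command})
    messages.append({"role": "user", "content": result.stdout})

print(messages[-1]["content"].strip())  # → hello from the agent
```

Because the trajectory and the LM prompt are the same list, a saved run can be replayed or fine-tuned on directly, which is the point of the linear-history design.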
Quick Bites
Claude Code’s getting new weekly limits starting August 28. Anthropic is adding usage caps to prevent abuse like 24/7 background runs and account sharing - issues that have been straining system performance. Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits. Heavy Opus users with large codebases or those running multiple Claude Code instances in parallel will hit their limits sooner. The 5-hour rolling limit stays, but new weekly caps kick in for both Sonnet 4 and Opus 4.
Microsoft has released Copilot Mode in the Edge browser as everyone races to build an AI-native browser. It sits somewhere between Gemini in Chrome and a ground-up AI browser like Perplexity's Comet. Copilot Mode adds a smart layer to Edge: it reads context across your open tabs, answers questions using that context, and can take actions in the browser, like booking a restaurant table, once you grant access to your history and saved details. The feature is free for now, entirely opt-in, and you can flip it off any time.
Google just dropped Opal, a tool that turns your AI prompts into mini apps - no code, just natural language and a visual editor. You can chain together prompts, model calls, and tools into full workflows, tweak steps on the fly, and share your creations instantly. It's perfect for testing ideas, building tools for work, or just exploring AI capabilities. Opal is now in public beta in the US.
Tools of the Trade
Claude Code Router: Routes Claude Code requests to different models based on task, using cheaper models for routing and tool invocation and larger models for coding and reasoning. Great to reduce Anthropic API cost while maintaining Claude Code's interface.
Tinyio: A ~200-line Python event loop that replaces asyncio's complex error handling with a crash-everything approach when exceptions occur.
Dyad: A free, local, open-source alternative to Lovable, v0, Bolt, and Replit to build full-stack applications with any AI model using your own API keys. Build apps with auth, database, and server functions.
Aiko: A unified control center for MCP server configurations to manage API credentials, selectively enable tools, and switch between different environments through profile management.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
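As an aside, the "crash everything on any exception" idea behind the Tinyio entry above can be shown with a toy generator-based event loop. The code below is a hypothetical sketch of the concept, not Tinyio's actual API.

```python
# Toy cooperative event loop: coroutines are plain generators that yield
# to hand control back. On any exception, close everything and re-raise
# instead of attempting asyncio-style partial recovery.
def run(*coros):
    queue = list(coros)
    while queue:
        coro = queue.pop(0)
        try:
            coro.send(None)   # advance to the next yield point
        except StopIteration:
            continue          # this coroutine finished cleanly
        except Exception:
            for other in queue:
                other.close() # crash-everything: tear down the rest
            raise
        queue.append(coro)    # still running: reschedule at the back

def worker(name, log):
    for i in range(2):
        log.append(f"{name}:{i}")
        yield

log = []
run(worker("a", log), worker("b", log))
print(log)  # → ['a:0', 'b:0', 'a:1', 'b:1']
```

The interleaved output shows the cooperative scheduling; the single `raise` path is the whole error-handling story, which is what keeps such loops tiny.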
Hot Takes
You can't outlearn a system that keeps evolving while you sleep.
We're not in the race anymore. ~
Ashutosh Shrivastava
I'm impressed by the quality of AI tools we already have available to us just 5 years after the release of GPT-3.
But unfortunately, we're still in that phase where the TOOLS are great & our capacity to effectively USE them isn't there yet
100s of millions of ChatGPT users will look at "agent" and think to themselves, "what do i do with this?" We're still in that sort of early Internet phase where the possibilities are endless -- our imaginations, though, are pretty limited. We have to be shown what they can do. We have to be prompted to think bigger, come up with more interesting use cases. The basics just won't inspire people the way we might hope. ~
sporadicalia
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉