Kimi K2 Goes Full Agent Mode
+ Meta's open-weight coding model, Perplexity Search API
Today’s top AI Highlights:
Kimi K2 goes full agent mode with OK Computer
Meta's open-weight Code World Model learns how code executes
Perplexity opens its Search API to developers
& so much more!
Read time: 3 mins
AI Tutorial
Learn OpenAI Agents SDK from zero to production-ready!
We have created a comprehensive crash course that takes you through 11 hands-on tutorials covering everything from basic agent creation to advanced multi-agent workflows using OpenAI Agents SDK.
What you'll learn and build:
Starter agents with structured outputs using Pydantic
Tool-integrated agents with custom functions and built-in capabilities
Multi-agent systems with handoffs and delegation
Production-ready agents with tracing, guardrails, and sessions
Voice agents with real-time conversation capabilities
Each tutorial includes working code, interactive web interfaces, and real-world examples.
The course covers the complete agent development lifecycle: orchestration, tool integration, memory management, and deployment strategies.
Everything is 100% open-source.
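As a taste of the first tutorial, here's a minimal sketch of a starter agent with Pydantic-validated structured output, assuming the current openai-agents package and an OPENAI_API_KEY in your environment:

```python
from pydantic import BaseModel
from agents import Agent, Runner  # pip install openai-agents

class Summary(BaseModel):
    title: str
    key_points: list[str]

agent = Agent(
    name="Summarizer",
    instructions="Summarize the user's text into a title and key points.",
    output_type=Summary,  # the SDK validates the model's output against this schema
)

result = Runner.run_sync(agent, "The OpenAI Agents SDK wires models to tools, handoffs, and guardrails.")
print(result.final_output)  # a validated Summary instance
```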
We share hands-on tutorials like this every week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.
Latest Developments
Building complete products used to require entire teams and months of work - now it takes one prompt and minutes.
China’s Moonshot AI just released OK Computer, an agent mode that transforms their flagship Kimi K2 model into an agentic system that doesn't just chat but actually creates, designs, and builds. Think of this as giving K2 hands - or rather a virtual computer where it can independently self-scope projects, navigate tools, write code, design interfaces, analyze massive datasets, and deliver end-to-end production-ready outputs.
The setup works much like a human would: K2 first parses your prompt into a to-do list of subtasks - scoping requirements, surveying data, designing elements, and engineering solutions. It then spins up a virtual environment with natively trained tools like a file system, browser, and terminal to execute these subtasks autonomously, step by step, iterating as needed for refinement. Everything is then compiled into the final deliverable.
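Moonshot hasn't published OK Computer's internals, so purely as an illustration, the plan-then-execute pattern described above boils down to a loop like this (every name here is hypothetical, not Moonshot's API):

```python
# Illustrative plan-then-execute loop -- NOT Moonshot's code; the toy
# stand-ins below just sketch the workflow the post describes.
from typing import Callable

def run_agent(plan, act, tools: dict[str, Callable[[str], str]], prompt: str) -> list[str]:
    results = []
    for task in plan(prompt):                    # 1. parse prompt into subtasks
        tool_name, args = act(task)              # 2. pick a trained tool
        results.append(tools[tool_name](args))   # 3. execute step by step
    return results                               # 4. compiled into the final output

out = run_agent(
    plan=lambda p: [f"scope: {p}", "build", "review"],
    act=lambda task: ("terminal", f"echo {task}"),
    tools={"terminal": lambda cmd: f"$ {cmd}"},
    prompt="build a landing page",
)
print(out)
```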
Key Highlights:
True Agent Architecture - Operates independently with trained access to file systems, browsers, and terminals, enabling self-directed task completion without external tool dependencies.
End-to-End Development - Creates complete multi-page websites with mobile-responsive designs, handling everything from UI wireframes to functional code implementation in a single workflow.
Data Analysis - Processes massive datasets up to 1 million rows, generating interactive visualizations and business insights with automated cleaning and querying capabilities.
Professional Content Creation - Produces high-quality, editable slide presentations and documents with multimedia integration, following structured proposal and design workflows.
While Meta shelved the much-hyped Llama 4 Behemoth after lukewarm reception, they've been cooking something more targeted in their labs.
Meta's FAIR CodeGen team just dropped Code World Model (CWM), a 32B parameter coding and reasoning model that takes a completely different approach to code generation by learning how code actually executes.
Unlike traditional models that predict code tokens based on patterns, CWM was trained on massive datasets of Python execution traces and Docker environment interactions, teaching it to understand what code does when it runs. The model achieves 65.8% on SWE-bench Verified with test-time scaling, and shows strong performance across coding benchmarks. The team’s betting that understanding code semantics, not just syntax, is the key to better AI programmers.
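To get a feel for what an "execution trace" is (the generic concept - this isn't Meta's exact training format), Python can record one natively:

```python
import sys

def tracer(frame, event, arg):
    # Log each executed line with the local variable state at that point -
    # the kind of (line, state) sequence a code world model learns from.
    if event == "line":
        print(f"line {frame.f_lineno}: locals={frame.f_locals}")
    return tracer

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

sys.settrace(tracer)
result = gcd(12, 8)
sys.settrace(None)
print(result)  # 4, plus a line-by-line trace of how it got there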
Key Highlights:
World Model Training - Trained on 5 trillion tokens of Python execution traces and agentic Docker interactions to understand code semantics beyond syntax
Strong Benchmarks - Hits 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024, outperforming similar-sized open models and approaching Claude Sonnet 4 and Gemini 2.5 Flash-level performance.
Execution Prediction - Can simulate Python code execution step-by-step and predict program outputs without actually running the code
Research License - Released under a non-commercial research license with full model weights and intermediate checkpoints on Hugging Face for community exploration.
The Gold standard for AI news
AI will eliminate 300 million jobs in the next 5 years.
Yours doesn't have to be one of them.
Here's how to future-proof your career:
Join the Superhuman AI newsletter - read by 1M+ professionals
Learn AI skills in 3 mins a day
Become the AI expert on your team
Quick Bites
ChatGPT now works proactively while you sleep
OpenAI launched ChatGPT Pulse, a proactive assistant feature in ChatGPT that researches topics overnight and delivers personalized morning briefings through visual cards. It synthesizes your chat history, memory, and connected apps like Gmail and Calendar to proactively surface relevant updates like meeting agendas, follow-ups on topics you discuss often, or recommendations for upcoming trips. Currently rolling out to Pro subscribers ($200/month) with Plus users getting access soon.
Perplexity opens its Search Engine to developers
Perplexity has launched its Search API, giving you access to the same global-scale infrastructure that powers its public answer engine, with an index covering hundreds of billions of webpages. The API is optimized for AI applications, with fine-grained document indexing that surfaces the most relevant snippets already ranked. The team has also released an open-source evaluation framework showing that it outperforms competitors like Exa, Brave, and SERP on both output quality and latency. The API also ships with an SDK offering type safety and async support.
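Calling it looks roughly like this; the endpoint is from the launch announcement, but treat the exact payload and response fields as assumptions and check Perplexity's API reference:

```python
import requests

# Payload and response field names are assumed -- verify against the docs.
resp = requests.post(
    "https://api.perplexity.ai/search",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"query": "open-weight coding models 2025", "max_results": 5},
    timeout=30,
)
for hit in resp.json()["results"]:
    print(hit["title"], hit["url"])
```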
Gemini 2.5 Flash shows 5% uptick on SWE-bench Verified with fewer tokens
Google has released updated versions of their Gemini 2.5 Flash and Flash Lite models, available on Google AI Studio and Vertex AI, though these previews aren't intended to become stable releases. The new Flash model is now significantly better at agentic tool-use, improving its performance on complex, agentic, and multi-step applications, along with a 5% improvement in SWE-bench Verified. On the other hand, the Flash-Lite model is now better at instruction following, verbosity, and multimodal & translation capabilities.
Also, no need to keep track of the long string names - the team has also introduced "-latest" aliases (gemini-flash-latest, gemini-flash-lite-latest) that automatically point to the newest model versions.
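With the google-genai SDK, switching to the alias is a one-line change:

```python
from google import genai  # pip install google-genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
resp = client.models.generate_content(
    model="gemini-flash-latest",  # alias always points at the newest Flash
    contents="In one line, what changed in the latest Gemini 2.5 Flash update?",
)
print(resp.text)
```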
Ollama released web search in the API and MCP server
Ollama launched web search and fetch APIs that work two ways: as standard REST endpoints you can call directly, or as an MCP server for integration with MCP-compatible tools like Claude Code and Codex. The service includes both broad web search and targeted page fetching, making it straightforward to build search agents that can both query and retrieve detailed information from the web.
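Usage per the launch post looks like this, assuming a recent ollama Python package and an OLLAMA_API_KEY set (treat the response field names as assumptions):

```python
import ollama  # pip install -U ollama

# Broad search: returns ranked results with title, url, and content.
results = ollama.web_search("Gemini 2.5 Flash latest update")
for r in results.results:
    print(r.title, r.url)

# Targeted fetch: pull one page's content for an agent to read.
page = ollama.web_fetch("https://ollama.com/blog/web-search")
print(page.content[:300])
```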
Exa indexed 1B+ documentation pages to cut down hallucinated AI-generated code
Vibe coding is great until the AI coding agent starts hallucinating when working with unfamiliar libraries and APIs. Exa has launched exa-code, a web-scale context tool to tackle exactly this. Instead of dumping long docs, it searches across 1B+ pages, extracts a few hundred highly relevant tokens - mostly code snippets - from across the web, returning only what matters. It does appear effective: their benchmarks show exa-code eliminates more coding hallucinations than other context tools across many libraries and SDKs.
GitHub decided your CLI needed one more coding agent
GitHub too has joined the CLI agent crowd with Copilot CLI, now in public preview, because apparently having OpenAI, Google, Anthropic, and half the tech universe already camping in your command line wasn't crowded enough. Copilot CLI ships the usual checklist of features: MCP support, GitHub integration, AGENTS.md files. First an MCP Registry that still has under 50 servers, and now this - GitHub's agent push gives off an "oh right, we should probably do this too" vibe. But you might want to check it out!
Tools of the Trade
Webhound - AI research agent that automatically crawls websites and extracts structured data into spreadsheets on any topic. It uses a multi-agent system with Gemini 2.5 Flash to plan a schema and search paths first, then executes asynchronous agents to fetch and validate the data. You can try it out with this no-signup link.
Archestra - A local AI client with ChatGPT-like UI that lets you use AI models (both local and cloud) along with MCP servers with built-in security sandboxing and authentication management. It currently supports 850+ open-source MCP servers and runs them in isolated containers to prevent supply chain attacks.
Helium Browser - Open-source Chromium-based browser that blocks ads, trackers, and analytics by default while supporting all Chrome extensions. It includes features like native !bangs for direct site navigation and split view for side-by-side browsing.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer. (Now accepting GitHub sponsorships)
Hot Takes
I’m tired of being “absolutely right!” when coding with an agent ~ Shreya Shankar
Agentic coding is a skill that scales with your technical knowledge. The best engineers I know are way better than me at using Claude Code too. ~ Thariq
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉