Fully-Managed RAG in Gemini API

+ Open-source agentic Kimi K2 Thinking model, Codemaps in Windsurf

Shubham Saboo & Gargi Gupta
November 07, 2025

Today’s top AI Highlights:

& so much more!

Read time: 3 mins

AI Tutorial

Build an AI SEO Audit Team with Gemini and Google ADK

SEO is both critical and time-consuming for teams building businesses. Manually auditing pages, researching competitors, and synthesizing actionable recommendations can eat up hours that you'd rather spend strategizing.

In this tutorial, we'll build an AI SEO Audit Team using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash. This multi-agent system autonomously crawls any webpage, researches live search results, and delivers a polished optimization report through a clean web interface that traces every step of the workflow.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build an AI SEO Audit Team with Gemini

Multi-agent app using Google ADK and Gemini 2.5 Flash (100% open source)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

Fully-Managed RAG Now Built into Gemini API 📖 🔦

Forget stitching together vector databases, embedding models, and retrieval logic. Google's File Search Tool wraps the entire RAG stack into a single API call in Gemini.

The pricing model is insane too: free storage and query embeddings, with a one-time indexing cost of $0.15 per million tokens.

The tool handles everything from chunking to embedding to context injection, so you can skip building yet another retrieval pipeline from scratch. It's built directly into the existing generateContent API, which means no new endpoints to learn or infrastructure to spin up. File Search uses Google's latest Gemini Embedding model for vector search that understands query intent, not just keyword matches. Plus, every response includes automatic citations pointing back to source documents, and you can throw in PDFs, DOCX, TXT, JSON, and most programming file formats without preprocessing.
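To make "built directly into the existing generateContent API" a bit more concrete, here is a minimal sketch that only builds the JSON request body and sends nothing over the network. The `file_search` and `file_search_store_names` field names and the store name are assumptions for illustration, not confirmed by this post, so check them against the current Gemini API reference before relying on them.

```python
import json


def build_file_search_request(question: str, store_names: list[str]) -> dict:
    """Build a generateContent-style request body with a File Search tool attached.

    The field names under "tools" are assumptions; verify them in the API docs.
    """
    if not store_names:
        raise ValueError("at least one File Search store name is required")
    return {
        # The user question goes in as ordinary prompt content.
        "contents": [{"parts": [{"text": question}]}],
        # Retrieval is requested as one extra tool entry, not a new endpoint.
        "tools": [{"file_search": {"file_search_store_names": list(store_names)}}],
    }


# Hypothetical store name, for illustration only.
body = build_file_search_request(
    "What does the onboarding guide say about VPN access?",
    ["fileSearchStores/example-store"],
)
print(json.dumps(body, indent=2))
```

The point of the sketch is the shape: the prompt stays the same, and retrieval shows up as one more entry in the tools list.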

Key Highlights:

Zero storage costs – Unlike traditional RAG setups where vector storage scales with your data, File Search makes storage and query-time embedding generation free. You pay only during initial indexing at $0.15 per million tokens.
Auto-managed retrieval pipeline – The system handles chunking strategies, embedding generation, and dynamic context injection automatically. No need to tune chunk sizes or manage vector databases separately.
Built-in source tracking – Responses include citations that map back to specific parts of your documents. This eliminates manual verification work when you need to trace where information came from.
Quick integration path – Works within the existing generateContent API as an additional parameter. Try it immediately with the demo app in Google AI Studio using your API key.
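For a feel of what the one-time indexing charge means in practice, here is a quick back-of-the-envelope estimate. The $0.15 per million tokens rate comes from the pricing above; the corpus size is a made-up example.

```python
INDEXING_USD_PER_MILLION_TOKENS = 0.15  # one-time indexing rate quoted above


def indexing_cost_usd(total_tokens: int) -> float:
    """Estimate the one-time cost of indexing a corpus of the given token count."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * INDEXING_USD_PER_MILLION_TOKENS


# Hypothetical corpus: 2,000 documents at roughly 5,000 tokens each.
corpus_tokens = 2_000 * 5_000
print(f"${indexing_cost_usd(corpus_tokens):.2f}")  # prints $1.50
```

Storage and query-time embeddings are free under this pricing, so they do not appear in the estimate.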

Open-source Kimi K2 Thinking Agentic Model from China 🏆🔓

China's new open-source model outperforms GPT-5 and Claude Sonnet 4.5 in reasoning, agentic search, and coding, while being ~90% cheaper.

Meet Kimi K2 Thinking, an agentic thinking model with 1 trillion total parameters that activates 32B parameters at inference.

What makes this different is that it can execute up to 200-300 sequential tool calls autonomously, while maintaining stable reasoning, compared to the 30-50 step limit where other models start degrading.

The model comes with native INT4 quantization using QAT that doubles generation speed without sacrificing accuracy, a 256K context window, and full deployment support for vLLM, SGLang, and KTransformers under a Modified MIT license.
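To put the parameter counts and INT4 quantization in perspective, here is a rough weights-only estimate. It assumes every parameter is stored at the stated bit width and ignores quantization scales, any layers kept at higher precision, activations, and the KV cache, so treat the numbers as a lower bound rather than real deployment requirements.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000    # 32B parameters activated at inference


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_param // 8


int4_gb = weight_bytes(TOTAL_PARAMS, 4) / 1e9
bf16_gb = weight_bytes(TOTAL_PARAMS, 16) / 1e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"INT4 weights:     ~{int4_gb:,.0f} GB")
print(f"16-bit weights:   ~{bf16_gb:,.0f} GB")
print(f"Active per token: {active_fraction:.1%}")
```

Under these assumptions, 4-bit storage cuts the weight footprint to a quarter of a 16-bit copy, and only about 3.2% of the parameters are active for any given token.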

Key Highlights:

Deep Multi-Step Reasoning - End-to-end trained to alternate between chain-of-thought reasoning and function calling, supporting autonomous workflows that span hundreds of steps for research, coding, and writing tasks without coherence drift.
State-of-the-art Performance - Matches or outperforms GPT-5 and Claude Sonnet 4.5 on reasoning (44.9% on HLE), math (99.1 on AIME 2025), and agentic tasks, including coding and search.
Production-Ready Quantization - Ships with native INT4 weights trained using QAT during the post-training phase, delivering lossless 2x speed improvements in generation while maintaining SOTA performance.
Open-source and Cost-Effective - API (OpenAI-compatible) pricing at $0.15 input/$2.50 output per 1M tokens makes it 80-90% cheaper than GPT-5 and Claude Sonnet 4.5. The model weights and code are also open-sourced under a modified MIT license.
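The "hundreds of sequential tool calls" idea boils down to a loop that alternates model turns with tool executions under a step budget. Here is a minimal sketch of that loop with a stub standing in for the model. Everything in it (`run_agent`, `stub_model`, the `add` tool, the action dictionary format) is hypothetical and is not Kimi's actual API.

```python
def run_agent(model_step, tools, task, max_tool_calls=300):
    """Alternate model turns and tool calls until a final answer or the budget runs out."""
    transcript = [("user", task)]
    calls = 0
    while True:
        action = model_step(transcript)  # the model decides: call a tool or answer
        if action["type"] == "final":
            return action["content"], calls
        if calls >= max_tool_calls:
            raise RuntimeError("tool-call budget exhausted")
        result = tools[action["name"]](**action["args"])
        # Feed the tool result back so the next model turn can reason over it.
        transcript.append(("tool", f'{action["name"]} -> {result}'))
        calls += 1


def stub_model(transcript):
    """Stand-in for a real model: keeps calling `add` until the total reaches 5."""
    tool_results = [text for role, text in transcript if role == "tool"]
    total = int(tool_results[-1].split(" -> ")[1]) if tool_results else 0
    if total >= 5:
        return {"type": "final", "content": f"total={total}"}
    return {"type": "tool", "name": "add", "args": {"a": total, "b": 1}}


tools = {"add": lambda a, b: a + b}
answer, calls = run_agent(stub_model, tools, "Count to five")
print(answer, calls)  # prints: total=5 5
```

In a real setup the stub would be replaced by calls to the model's chat API, and the budget is what separates a 30-step agent from a 300-step one. The hard part is keeping the reasoning coherent as the transcript grows.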

Quick Bites

Codemaps in Windsurf to understand code before vibing
Cognition just shipped Codemaps in Windsurf: AI-generated structural maps of your codebase that show you where things are, so you can understand the code instead of just vibing through generated output. The feature uses SWE-1.5 or Claude Sonnet 4.5 to trace data flows, group related logic, and link explanations directly to specific lines. The company plans to open up the .codemap format as a protocol for other tools and agents.

Anthropic shows AI models have something on their mind
New Anthropic research demonstrates that Claude models possess rudimentary introspective awareness. They can identify concepts that researchers inject into their internal representations, and they distinguish between intentional and accidental outputs by checking their own prior neural states. The reliability is abysmal (20% success rate), but there's a clear progression: their most capable models, Opus 4 and 4.1, perform best at these introspective tasks, hinting that self-awareness might emerge as a natural byproduct of increasing intelligence rather than requiring explicit training.

OpenAI’s new benchmark for Indian languages and cultural context
OpenAI just dropped IndQA, a benchmark that actually tests whether AI models understand Indian culture, not just translate words. The benchmark covers 12 languages, including Hinglish, and spans domains from Punjabi music to Kannada linguistics. It is filtered to keep only questions that stumped GPT-4o and o3. The interesting bit is that even their best models still have significant room to improve. The Indian market is huge, especially for voice AI, and we're still early in making AI that truly works across cultures.

Tools of the Trade

OpenSpec - A spec-driven development framework that forces humans and AI coding agents to agree on specifications before writing code. It creates a structured folder system separating current specs from proposed changes. Has built-in commands for tools like Claude Code and Cursor that track proposals, tasks, and spec deltas in one place.
SkillsMP - A community marketplace hosting 1132+ skills that extend Claude's capabilities beyond its base functionality. Browse, install, and run specialized tools without writing your own.
sudocode - A lightweight context management system for coding agents that lives in your repo. It captures user intent as durable specs and tracks agent activity as issues, all version-controlled with Git. This "context-as-code" approach reduces agent amnesia and accelerates development on long-horizon tasks.
SemTools - Adds semantic search capabilities to CLI coding agents. It provides two commands: parse (converts PDFs/docs to markdown via LlamaParse) and search (performs local semantic search using multilingual embeddings). This lets coding agents semantically query large document collections instead of relying solely on grep.
Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
(Now accepting GitHub sponsorships)

Hot Takes

ai is ozempic for corporations.
~ signüll
There will be no federal bailout for AI. The U.S. has at least 5 major frontier model companies. If one fails, others will take its place.
~ David Sacks

That’s all for today! See you tomorrow with more such AI-filled content.

PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
