
Vision RAG - No OCR, No Database

+ NotebookLM gets 8x context window increase, LangChain DeepAgents CLI


Today’s top AI Highlights:

  • Vision-based RAG with no OCR, no vectors, and no database
  • Google’s NotebookLM gets an 8x larger context window
  • LangChain’s DeepAgents framework comes to the CLI
  • Hugging Face’s 200+ page guide to training your own models

& so much more!

Read time: 3 mins

AI Tutorial

SEO optimization is both critical and time-consuming for teams building businesses. Manually auditing pages, researching competitors, and synthesizing actionable recommendations can eat up hours that you'd rather spend strategizing.

In this tutorial, we'll build an AI SEO Audit Team using Google's Agent Development Kit (ADK) and Gemini 2.5 Flash. This multi-agent system autonomously crawls any webpage, researches live search results, and delivers a polished optimization report through a clean web interface that traces every step of the workflow.
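The multi-agent hand-off can be sketched in plain Python. This is a simplified stand-in, not the tutorial's actual ADK code; the agent names and the fake audit data are illustrative only:

```python
# A simplified stand-in for the multi-agent flow described above (crawl ->
# research -> report), written in plain Python rather than Google's ADK.
# Agent names and the fake audit data are illustrative, not the tutorial's code.

def crawler(url):
    """Stand-in: a real agent would fetch and parse the page."""
    return {"url": url, "title_length": 70, "missing_meta": ["description"]}

def researcher(findings):
    """Stand-in: a real agent would query live search results for competitors."""
    return {"competitor_avg_title_length": 55}

def reporter(findings, research):
    """Synthesize both agents' outputs into actionable recommendations."""
    recs = []
    if findings["title_length"] > research["competitor_avg_title_length"]:
        recs.append("Shorten the title tag")
    recs += [f"Add missing meta tag: {m}" for m in findings["missing_meta"]]
    return recs

def seo_audit_team(url):
    # Sequential hand-off: each agent's output feeds the next.
    findings = crawler(url)
    research = researcher(findings)
    return reporter(findings, research)

print(seo_audit_team("https://example.com"))
# ['Shorten the title tag', 'Add missing meta tag: description']
```

In the real tutorial, each function becomes an ADK agent backed by Gemini 2.5 Flash, but the sequential structure stays the same.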

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!

Latest Developments

DeepSeek OCR is powerful, but do you even need OCR models for RAG?

OCR models first map the layout of a document, and then convert the text along with 2D visual elements like tables and figures into 1D text sequences. This inevitably loses spatial relationships, contextual information, and visual hierarchies.

If vision models can already process document images and queries together, why do we even need OCR?

PageIndex takes a different approach with vision-based RAG that mimics how humans actually read documents: reasoning over a hierarchical table-of-contents structure to identify relevant pages, then processing those pages as images with VLMs like GPT-4.1 for visual understanding and answer generation.

No OCR, no vectors, no embeddings, no database. Just pure reasoning.

Key Highlights:

  1. Vectorless Retrieval - Zero vector embeddings or database setup required. Uses LLM reasoning over a hierarchical tree structure instead of approximate semantic-similarity search, which often misses truly relevant content.

  2. Information-Preserving Pipeline - Processes PDF pages directly as images without OCR, preserving spatial layout and visual context that gets lost when documents are flattened into text sequences.

  3. No Chunking Needed - Documents organized into natural sections based on actual structure, eliminating arbitrary chunk boundaries that break context and require manual parameter tuning.

  4. Pure Reasoning-Based Navigation - Identifies relevant pages through multi-step thinking over document hierarchy.

  5. Try It Now - Complete implementation available as open-source notebook with step-by-step code for building vision-based QA systems using PageIndex.
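The retrieval loop can be sketched as follows. This is a rough, generic sketch of tree-based retrieval under our own assumptions, not PageIndex's actual API; the LLM reasoning step is stubbed with a keyword heuristic so the flow runs end to end:

```python
# A rough sketch of tree-based, vectorless retrieval (not PageIndex's actual
# API). The LLM reasoning step is stubbed with a keyword heuristic so the flow
# runs end to end; in practice you'd prompt a reasoning model with the ToC,
# then hand the selected pages to a VLM as images for answer generation.
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    pages: tuple                # (start, end) page range of this section
    children: list = field(default_factory=list)

def select_nodes(query, nodes):
    """Stand-in for the LLM: keep sections whose titles share a query word."""
    q = set(query.lower().split())
    return [n for n in nodes if q & set(n.title.lower().split())]

def retrieve_pages(root, query):
    """Walk the ToC tree top-down, pruning branches judged irrelevant,
    and collect the page ranges of the matching leaf sections."""
    frontier, pages = [root], []
    while frontier:
        node = frontier.pop()
        if not node.children:
            pages.append(node.pages)
        else:
            frontier.extend(select_nodes(query, node.children))
    return sorted(pages)

toc = Node("Annual Report", (1, 40), [
    Node("Financial Results", (2, 15), [
        Node("Revenue Breakdown", (2, 8)),
        Node("Operating Costs", (9, 15)),
    ]),
    Node("Risk Factors", (16, 30)),
])

print(retrieve_pages(toc, "financial revenue breakdown"))  # [(2, 8)]
```

The key property: no embeddings are ever computed, and the selected pages go to the VLM as images, so no visual information is lost along the way.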

Turn AI into Your Income Engine

Ready to transform artificial intelligence from a buzzword into your personal revenue generator?

HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.

Inside you'll discover:

  • A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential

  • Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background

  • Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve

Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.

Declarative AI workflows you can read, write, and trust - like Dockerfile or SQL but for multi-step LLM pipelines.

Pipelex gives you a DSL and Python runtime for repeatable AI workflows: you declare what happens at each step, and any model or provider can run it.

It's declarative (tell your intent, runtime handles how), agent-first (natural-language context that LLMs parse directly), open under MIT (language spec, runtime, API server, MCP server, n8n node, VS Code extension), and composable (pipes call other pipes you build or grab from the community).

Why invent a new language instead of using Python or visual tools? Because prompts can't give you determinism, and traditional code hides logic in variable names. A DSL can preserve context in structured syntax that both humans and AI understand.

The team even went meta: they built a Pipelex workflow that writes Pipelex workflows, so pipelex build pipe "description" generates validated .plx files you can run instantly or refine with coding agents.

Key Highlights:

  1. Context baked into syntax - Unlike code where intent lives in comments, every pipe explicitly states its purpose, inputs, outputs, and semantic meaning in natural language that both humans and LLMs parse natively.

  2. Workflows that write workflows - The team dogfooded their DSL to build a workflow generator, meaning you describe what you need and get a complete, executable pipeline with typed concepts and validation built in.

  3. Deploy anywhere - Run locally via CLI, integrate with Python apps, deploy as FastAPI/Docker containers, use as n8n nodes, or let agents use them as tools via MCP servers. No platform lock-in, just portable .plx files.

  4. When to use - Pipelex is the right choice when you need repeatable, deterministic AI workflows for knowledge work. When you're processing invoices, analyzing contracts, or generating reports, you need consistent results every time. It's not for creative exploration or open-ended tasks.
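To make the declarative idea concrete, here is a tiny generic illustration in Python. This is NOT Pipelex's actual .plx syntax; it only shows the general pattern of declaring intent, inputs, and outputs as data while a runtime handles execution:

```python
# A generic illustration of the declarative idea, NOT Pipelex's actual .plx
# syntax: each step declares its intent, inputs, and output as data, and a
# tiny runtime wires them together. Step bodies are stand-ins for LLM calls.

PIPELINE = [
    {"name": "extract", "intent": "Pull line items from an invoice",
     "inputs": ["invoice_text"], "output": "line_items",
     "run": lambda invoice_text: invoice_text.split(";")},
    {"name": "total", "intent": "Sum the amounts across all line items",
     "inputs": ["line_items"], "output": "total",
     "run": lambda items: sum(float(i.split(":")[1]) for i in items)},
]

def execute(pipeline, context):
    """Tiny runtime: feed each step its declared inputs, store its output."""
    for step in pipeline:
        args = [context[k] for k in step["inputs"]]
        context[step["output"]] = step["run"](*args)
    return context

result = execute(PIPELINE, {"invoice_text": "widgets:20;gaskets:5"})
print(result["total"])  # 25.0
```

Because the intent strings travel with the steps as data, both humans and LLMs can read, validate, or generate the pipeline without executing it, which is the property that makes workflow-writing workflows possible.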

Quick Bites

Google’s NotebookLM gets an 8x larger context window
Google has made some fundamental backend upgrades to NotebookLM. First, NotebookLM chat now uses Gemini’s full 1-million-token context window across all plans, an 8x increase. They have also increased multi-turn conversation capacity by 6x. On top of that, there are quality-of-life upgrades across all plans, like saved conversation history and personalized notebook chats.

Debug your app within Chrome using Gemini
You can now debug an application’s full performance trace within Chrome DevTools using Gemini. After recording a trace, you can chat with Gemini about the entire trace, related Performance insights, and even connected field data, all without needing to select specific context beforehand.

LangChain DeepAgents framework now in CLI
LangChain just shipped DeepAgents CLI, bringing their DeepAgents framework straight to your terminal. Install it with pip install deepagents-cli, and you get an agent that can edit files, run shell commands, search the web, and even retain information across sessions by writing memories locally: API patterns, project conventions, and context from previous conversations. You can spin up specialized agents for different projects, and all file edits require human approval before execution (though there's an auto-accept flag if you're feeling brave).

Hugging Face’s 200-page guide to train your own models
Hugging Face just dropped their "Smol Training Playbook," a 200+ page deep dive into building their SmolLM3 model from scratch. The team documents the complete pipeline: pretraining, post-training, and infrastructure, sharing what worked, what failed, and how to keep training runs stable. Think of it as the field notes from training a competitive 3B-parameter model, minus the usual vendor mystique. And it’s completely free.

The only Claude Code guide you’ll need
If you're still treating Claude Code like a chat interface, this learning path and guide by Daniel Avila might change that. This five-level guide moves from basic CLI commands through configuration and extensions to programmatic automation and enterprise deployment. It includes practical examples like building custom subagents for code reviews, integrating external tools via MCP, and setting up automated documentation pipelines with GitHub Actions. From newbies to seasoned Claude Code users, everyone can pick up at least something from this guide.

Tools of the Trade

  1. cmux - Run Claude Code, Codex CLI, Cursor CLI, Gemini CLI, Amp, Opencode, and other CLI agents in parallel across multiple tasks. Every run spins up an isolated VS Code workspace either in the cloud or in a local Docker container with the git diff view, terminal, and dev server preview ready.

  2. llms - Lightweight CLI, API, and ChatGPT-like alternative to Open WebUI that routes requests across multiple LLM providers (both local and cloud), with automatic failover between providers. Consists of a single Python file with one dependency.

  3. vibe-coding-prompt-template - Turn your idea into a working app in under 3 hours with market research, PRD, and agent instructions. This repo has prompt templates that you can just copy-paste into your preferred AI model/agent and get a working MVP without touching code.

  4. VT Code - Rust-based terminal coding agent that uses tree-sitter and ast-grep for semantic code understanding. Integrates with all major LLM providers, supports MCP servers, and has extensive configuration options for controlling agent behavior and workspace boundaries.

  5. Awesome LLM Apps - A curated collection of LLM apps with RAG, AI Agents, multi-agent teams, MCP, voice agents, and more. The apps use models from OpenAI, Anthropic, Google, and open-source models like DeepSeek, Qwen, and Llama that you can run locally on your computer.
    (Now accepting GitHub sponsorships)

Hot Takes

  1. AI founder talking about why it's so hard to hire in SF today:

    "in SF, the AI labs have tremendously skewed engineers' perspectives. Every engineer either wants a package that is absolutely massive relative to their skills and experience, OR they want to be a founder."

    ~ Gokul Rajaram

  2. "if i asked the people what they wanted, they would’ve said bigger models"

    ~ Aidan McLaughlin

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
