
Build a Local Agentic RAG App with Google EmbeddingGemma

Fully local agentic RAG app using Google's EmbeddingGemma model (100% open source)

Privacy-first AI applications are no longer a luxury; they're a necessity. We understand the importance of keeping sensitive data secure while still delivering powerful AI experiences. The recent release of Google's EmbeddingGemma model changes the game for on-device AI applications, making it possible to build sophisticated RAG systems that run entirely offline.

In this tutorial, we'll build a local agentic RAG system using Google's newly released EmbeddingGemma model paired with Meta’s Llama 3.2. This system will intelligently search through your documents and provide contextual answers - all running 100% locally with no internet connection required.

What is EmbeddingGemma?

Google's EmbeddingGemma is a 308M-parameter embedding model designed specifically for on-device AI. It's the highest-ranking open multilingual text embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB), delivering state-of-the-art performance while using less than 200MB of RAM.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

What We’re Building

This application implements a fully local agentic RAG system that combines Google's EmbeddingGemma with Llama 3.2 to create an intelligent document assistant. The system can process PDF documents from URLs, create semantic embeddings, and answer questions based on the content, all running on your local machine.

Features:

  • 100% Local Processing: No external API calls or internet dependency during operation

  • EmbeddingGemma Integration: Uses Google's latest embedding model for high-quality vector representations

  • Agentic Architecture: An intelligent agent that decides when and how to search the knowledge base

  • Dynamic Knowledge Base: Add PDF documents via URLs and query them instantly

  • Streaming Responses: Real-time answer generation with visible tool usage

  • Privacy-First Design: All document processing happens on your device

How The App Works

  1. Document Ingestion: When you add a PDF URL, the system downloads and processes the document, breaking it into chunks for optimal retrieval.

  2. Embedding Generation: EmbeddingGemma creates high-quality vector representations of the document chunks, capturing semantic meaning in 768-dimensional vectors.

  3. Vector Storage: LanceDB stores these embeddings locally, creating a searchable knowledge base on your machine.

  4. Intelligent Querying: When you ask a question, the agent embeds your query using EmbeddingGemma and performs a similarity search to find the most relevant document sections (see the minimal sketch after this list).

  5. Contextual Response: Llama 3.2 generates answers based on the retrieved context, providing accurate and relevant responses.

  6. Agentic Behavior: The system intelligently decides when to search the knowledge base and how to structure responses for maximum clarity.
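
To make steps 2-4 concrete, here is a minimal sketch of the embed-and-retrieve flow outside the app. It assumes agno's OllamaEmbedder exposes a get_embedding method (check your agno version) and that Ollama is running locally with EmbeddingGemma pulled; the chunks and query are purely illustrative:

# Minimal sketch of steps 2-4: embed chunks, then retrieve by cosine similarity.
# Assumes OllamaEmbedder.get_embedding exists in your agno version and that
# Ollama is running locally with embeddinggemma pulled.
import numpy as np
from agno.embedder.ollama import OllamaEmbedder

embedder = OllamaEmbedder(id="embeddinggemma:latest", dimensions=768)

# Step 2: one 768-dimensional vector per document chunk (illustrative chunks)
chunks = [
    "EmbeddingGemma is a 308M-parameter embedding model from Google.",
    "LanceDB stores vector embeddings locally on disk.",
]
chunk_vectors = np.array([embedder.get_embedding(c) for c in chunks])

# Step 4: embed the query and rank the chunks by cosine similarity
query_vector = np.array(embedder.get_embedding("Where are the vectors stored?"))
scores = chunk_vectors @ query_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(chunks[int(scores.argmax())])  # the most relevant chunk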

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Ollama downloaded and installed on your system

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Code Walkthrough

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the agentic_rag_embedding_gemma folder and install the dependencies:

cd rag_tutorials/agentic_rag_embedding_gemma
pip install -r requirements.txt
  3. Ensure Ollama is installed and running with the required models:

    • Pull the models: ollama pull embeddinggemma:latest and ollama pull llama3.2:latest

    • Start the Ollama server if it isn't already running (an optional verification sketch follows these steps)
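
Before moving on, you can optionally verify that the server is reachable and both models are pulled. The snippet below uses only the Python standard library and Ollama's local REST endpoint (default http://localhost:11434); adjust the URL if your setup differs:

# Optional sanity check: confirm the Ollama server is running and both models are pulled.
import json
import urllib.request

required = {"embeddinggemma:latest", "llama3.2:latest"}

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)  # /api/tags lists locally available models

available = {m["name"] for m in tags.get("models", [])}
missing = required - available
if missing:
    print("Missing models, run: " + " && ".join(f"ollama pull {m}" for m in sorted(missing)))
else:
    print("Ollama is running and both models are available.")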

Creating the Streamlit App

Let's create our app. Create a new file agentic_rag_embeddinggemma.py and add the following code:

  1. Import necessary libraries:

import streamlit as st
from agno.agent import Agent
from agno.embedder.ollama import OllamaEmbedder
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.models.ollama import Ollama
from agno.vectordb.lancedb import LanceDb, SearchType
  2. Configure the Streamlit App:

st.set_page_config(
    page_title="Agentic RAG with Google's EmbeddingGemma",
    page_icon="🔥",
    layout="wide"
)
  3. Create Knowledge Base with EmbeddingGemma:

@st.cache_resource
def load_knowledge_base(urls):
    knowledge_base = PDFUrlKnowledgeBase(
        urls=urls,
        vector_db=LanceDb(
            table_name="recipes",
            uri="tmp/lancedb",
            search_type=SearchType.vector,
            embedder=OllamaEmbedder(id="embeddinggemma:latest", dimensions=768),
        ),
    )
    knowledge_base.load()
    return knowledge_base
  4. Initialize Session State and Knowledge Base:

if 'urls' not in st.session_state:
    st.session_state.urls = []

kb = load_knowledge_base(st.session_state.urls)
  5. Create the Agentic RAG Agent:

agent = Agent(
    model=Ollama(id="llama3.2:latest"),
    knowledge=kb,
    instructions=[
        "Search the knowledge base for relevant information and base your answers on it.",
        "Be clear, and generate well-structured answers.",
        "Use clear headings, bullet points, or numbered lists where appropriate.",
    ],
    search_knowledge=True,
    show_tool_calls=False,
    markdown=True,
)
  6. Build the Sidebar for Knowledge Management:

with st.sidebar:
    col1, col2, col3 = st.columns(3)
    with col1:
        st.image("google.png")
    with col2:
        st.image("ollama.png")
    with col3:
        st.image("agno.png")
    
    st.header("🌐 Add Knowledge Sources")
    new_url = st.text_input(
        "Add URL",
        placeholder="https://example.com/sample.pdf",
        help="Enter a PDF URL to add to the knowledge base",
    )
    
    if st.button("➕ Add URL", type="primary"):
        if new_url:
            kb.urls.append(new_url)
            with st.spinner("📥 Adding new URL..."):
                kb.load(recreate=False, upsert=True)
            st.success(f"✅ Added: {new_url}")
        else:
            st.error("Please enter a URL")
  7. Create the Main Interface:

st.title("🔥 Agentic RAG with EmbeddingGemma (100% local)")
st.markdown(
    """
This app demonstrates an agentic RAG system using local models via [Ollama](https://ollama.com/):

- **EmbeddingGemma** for creating vector embeddings
- **LanceDB** as the local vector database

Add PDF URLs in the sidebar to start and ask questions about the content.
    """
)
  8. Implement Query Processing and Response Generation:

query = st.text_input("Enter your question:")

if st.button("🚀 Get Answer", type="primary"):
    if not query:
        st.error("Please enter a question")
    else:
        st.markdown("### 💡 Answer")
        
        with st.spinner("🔍 Searching knowledge and generating answer..."):
            try:
                response = ""
                resp_container = st.empty()
                gen = agent.run(query, stream=True)
                for resp_chunk in gen:
                    if resp_chunk.content is not None:
                        response += resp_chunk.content
                        resp_container.markdown(response)
            except Exception as e:
                st.error(f"Error: {e}")
  9. Add an Information Section:

with st.expander("📖 How This Works"):
    st.markdown(
        """
**This app uses the Agno framework to create an intelligent Q&A system:**

1. **Knowledge Loading**: PDF URLs are processed and stored in LanceDB vector database
2. **EmbeddingGemma as Embedder**: EmbeddingGemma generates local embeddings for semantic search
3. **Llama 3.2**: The Llama 3.2 model generates answers based on retrieved context

**Key Components:**
- `EmbeddingGemma` as the embedder
- `LanceDB` as the vector database
- `PDFUrlKnowledgeBase`: Manages document loading from PDF URLs
- `OllamaEmbedder`: Uses EmbeddingGemma for embeddings
- `Agno Agent`: Orchestrates everything to answer questions
        """
    )

Running the App

With our code in place, it's time to launch the app.

  1. Ensure Ollama is running with the required models.

  2. In your terminal, navigate to the project folder and run:

streamlit run agentic_rag_embeddinggemma.py
  3. Streamlit will provide a local URL (typically http://localhost:8501). Add PDF URLs in the sidebar to build your knowledge base, then start asking questions about your documents and experience the power of local AI!

Working Application Demo

Conclusion

You've successfully built a local agentic RAG system powered by Google's cutting-edge EmbeddingGemma model and Meta’s Llama 3.2 model.

This setup can now be expanded further:

  1. Multi-format Document Support: Extend the system to handle Word documents, text files, and web pages alongside PDFs.

  2. Advanced Query Capabilities: Add support for complex queries like comparisons, summaries, and multi-step reasoning.

  3. Custom Fine-tuning: Fine-tune EmbeddingGemma on your specific domain for even better performance on specialized documents.

  4. Conversation Memory: Implement conversation history to enable follow-up questions and context-aware interactions (a minimal sketch follows this list).

  5. Document Management: Add features to organize, categorize, and manage your document collection more effectively.
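
As a starting point for idea 4, here is a hypothetical sketch of lightweight conversation memory kept in Streamlit session state. The helper name and prompt format are illustrative (not from the repository), and it assumes a non-streaming agent.run(...) returns a response object with a .content attribute, mirroring the chunks in the streaming loop above:

# Hypothetical sketch for conversation memory (not part of the repo's code).
# Keeps the last few Q&A turns in Streamlit session state and prepends them
# to each new query so the agent can resolve follow-up questions.
if "history" not in st.session_state:
    st.session_state.history = []  # list of (question, answer) tuples

def ask_with_history(agent, query: str) -> str:
    previous = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in st.session_state.history[-3:]  # last 3 turns
    )
    prompt = (
        f"Previous conversation:\n{previous}\n\nNew question: {query}"
        if previous
        else query
    )
    result = agent.run(prompt)  # assumes result.content holds the full answer
    st.session_state.history.append((query, result.content))
    return result.content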

Keep experimenting with different configurations and features to build more sophisticated AI applications.

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads) to support us!
