Build an AI Research Agent with Google Interactions API & Gemini 3

Multi-phase AI research agent with Google Interactions API, Gemini Deep Research Agent, and Gemini 3 models (100% open source)

Shubham Saboo & Gargi Gupta
January 01, 2026

Google recently launched the Interactions API alongside Gemini Deep Research, an autonomous research agent that can conduct comprehensive multi-step investigations. This is a significant shift from traditional APIs - instead of stateless request-response cycles, you get server-side state management, background execution for long-running tasks, and seamless handoffs between different models and agents.

In this tutorial, we'll build an AI Research Planner & Executor Agent that demonstrates these capabilities in action. The system uses a three-phase workflow: Gemini 3 Flash creates research plans, Deep Research Agent executes comprehensive web investigations, and Gemini 3 Pro synthesizes findings into executive reports with auto-generated infographics.

What is Gemini Deep Research?

Gemini Deep Research is an autonomous research agent powered by Gemini 3 Pro that's accessible through the Interactions API. It doesn't just answer questions. It plans investigations, formulates search queries, reads results, identifies knowledge gaps, and searches again iteratively. The agent operates asynchronously, taking 2-5 minutes to browse hundreds of websites and synthesize findings.

What makes the Interactions API special?

Unlike traditional APIs where you send all context with every request, the Interactions API manages conversation history server-side. This enables stateful multi-turn workflows, background execution for tasks that exceed standard HTTP timeouts, and the ability to chain different models together while preserving full context. It's specifically designed for building production-ready agentic applications.
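To make the difference concrete, here is a toy in-memory model of server-side history. It only illustrates the idea and says nothing about how Google implements the Interactions API: `ToyInteractionServer`, its methods, and the `toy-N` ids are all invented for this sketch.

```python
import itertools


class ToyInteractionServer:
    """Toy stand-in for a stateful API: it stores each turn and resolves
    previous_interaction_id into the full history on the server side."""

    def __init__(self):
        self._turns = {}                  # interaction id -> (parent id, input text)
        self._ids = itertools.count(1)

    def create(self, input_text, previous_interaction_id=None):
        interaction_id = f"toy-{next(self._ids)}"
        self._turns[interaction_id] = (previous_interaction_id, input_text)
        return interaction_id

    def history(self, interaction_id):
        """Walk the parent links back to the first turn and return inputs oldest first."""
        chain = []
        while interaction_id is not None:
            parent, text = self._turns[interaction_id]
            chain.append(text)
            interaction_id = parent
        return list(reversed(chain))


server = ToyInteractionServer()
plan_id = server.create("Plan research on topic X")
research_id = server.create("Research tasks 1 and 3", previous_interaction_id=plan_id)
report_id = server.create("Write the executive report", previous_interaction_id=research_id)

# Each call sent one short input plus an id, yet the server can
# reconstruct everything that came before the final turn.
print(server.history(report_id))
```

The client never resends earlier turns; it only passes the id of the turn it wants to continue from.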

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

What We’re Building

This Streamlit application implements a sophisticated three-phase research workflow that demonstrates the power of Google's Interactions API. The system combines multiple Gemini models, each optimized for specific tasks, while maintaining stateful context across phases.

Features:

Multi-Phase Research Workflow:
- Phase 1: Uses Gemini 3 Flash to generate structured research plans
- Phase 2: Leverages Deep Research Agent for autonomous web investigation
- Phase 3: Employs Gemini 3 Pro for executive synthesis with auto-generated infographics
Stateful Conversation Management: Demonstrates previous_interaction_id to chain phases together while preserving full context
Background Execution: Async research with progress tracking for tasks that take 2-5 minutes
Auto-Generated Infographics: Creates whiteboard-style TL;DR summaries using Nano Banana Pro (the gemini-3-pro-image-preview model)
Interactive Task Selection: Choose specific research tasks to focus your investigation
Export Capabilities: Download comprehensive reports as markdown files

How It Works

This application orchestrates a sophisticated three-phase research workflow:

Phase 1 - Planning:

The system uses Gemini 3 Flash (optimized for speed) to break down your research goal into 5-8 specific, actionable tasks. The interaction is stored with store=True, and we capture the interaction.id for later reference.

Phase 2 - Research:

Users select which tasks to investigate. The app passes these to the Deep Research Agent using agent="deep-research-pro-preview-12-2025" (note: agents use the agent parameter, not model). Critically, we include previous_interaction_id=st.session_state.plan_id to give the agent full context from the planning phase. Since research takes 2-5 minutes, we use background=True for async execution and poll for completion.

Phase 3 - Synthesis:

Gemini 3 Pro (optimized for quality) creates an executive report. Again, we use previous_interaction_id to access the complete research findings. The infographic is generated with the standard generate_content API rather than the Interactions API, because it's a single-turn image generation task that needs no stored state.

Stateful Context Management:

The key innovation is how context flows between phases. Each phase creates an interaction that can be referenced by the next phase via previous_interaction_id. This server-side state management eliminates the need to manually pass megabytes of conversation history with each request.
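The three calls can be condensed into one function to show how only ids travel between phases. The parameters (`model`, `agent`, `previous_interaction_id`, `background`, `store`) mirror the ones used in the walkthrough below, but `run_three_phases` and the recording `StubInteractions` class are hypothetical names made up for this sketch. With the real SDK you would pass `client.interactions` and wait for the background research to finish before starting Phase 3.

```python
from types import SimpleNamespace


def run_three_phases(interactions, goal, selected_tasks):
    """Chain plan -> research -> synthesis, passing only ids between phases."""
    plan = interactions.create(
        model="gemini-3-flash-preview",
        input=f"Create a numbered research plan for: {goal}",
        store=True,
    )
    research = interactions.create(
        agent="deep-research-pro-preview-12-2025",   # agents use `agent`, not `model`
        input="Research these tasks thoroughly with sources:\n\n" + "\n\n".join(selected_tasks),
        previous_interaction_id=plan.id,
        background=True,
        store=True,
    )
    # In the real app, poll here until the background research has finished.
    report = interactions.create(
        model="gemini-3-pro-preview",
        input="Create executive report with Summary, Findings, Recommendations, Risks",
        previous_interaction_id=research.id,
        store=True,
    )
    return plan, research, report


class StubInteractions:
    """Records each create() call and returns a fake interaction with an id."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(id=f"stub-{len(self.calls)}")


stub = StubInteractions()
run_three_phases(stub, "B2B HR SaaS market in Germany", ["1. Key players", "2. Pricing models"])
for call in stub.calls:
    print(call.get("model") or call.get("agent"), "<-", call.get("previous_interaction_id"))
```

The printed lines show each phase pointing back at the id of the phase before it, which is the whole mechanism behind the context handoff.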

Prerequisites

Before we begin, make sure you have the following:

Python installed on your machine (version 3.12 is recommended)
Your Gemini API key for using Gemini models and the Interactions API
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming

Code Walkthrough

Setting Up the Environment

First, let's get our development environment ready:

Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git

🌟 Don't forget to star the open-source repo to show your support.

Go to the research_agent_gemini_interaction_api folder:

cd advanced_ai_agents/single_agent_apps/research_agent_gemini_interaction_api

Install the required dependencies:

pip install -r requirements.txt

Grab your Gemini API key from Google AI Studio.

Creating the App

Here’s the code walkthrough in the research_planner_executor_agent.py file:

Import libraries and set up helper functions:
- Streamlit for the UI
- Google GenAI for Interactions API
- Time and regex for progress tracking and task parsing

import re
import time

import streamlit as st
from google import genai

def get_text(outputs): 
    return "\n".join(o.text for o in (outputs or []) if hasattr(o, 'text') and o.text) or ""

def parse_tasks(text):
    return [{"num": m.group(1), "text": m.group(2).strip().replace('\n', ' ')} 
            for m in re.finditer(r'^(\d+)[\.\)\-]\s*(.+?)(?=\n\d+[\.\)\-]|\n\n|\Z)', text, re.MULTILINE | re.DOTALL)]
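A quick way to see what the two helpers do is to feed them made-up data. The snippet repeats both functions so it runs on its own; the `SimpleNamespace` objects are stand-ins for real output items and only assume, as `get_text` itself does, that an item may expose a `.text` attribute.

```python
import re
from types import SimpleNamespace


def get_text(outputs):
    # Join the text of every output item that has non-empty text.
    return "\n".join(o.text for o in (outputs or []) if hasattr(o, "text") and o.text) or ""


def parse_tasks(text):
    # A task starts with a number followed by ".", ")" or "-" at the start of a line
    # and runs until the next numbered line, a blank line, or the end of the text.
    return [{"num": m.group(1), "text": m.group(2).strip().replace("\n", " ")}
            for m in re.finditer(r'^(\d+)[\.\)\-]\s*(.+?)(?=\n\d+[\.\)\-]|\n\n|\Z)',
                                 text, re.MULTILINE | re.DOTALL)]


# Made-up outputs: a text item, an item without text, and another text item.
outputs = [
    SimpleNamespace(text="Here is the plan:\n\n1. Market size - Estimate TAM\nfor 2025"),
    SimpleNamespace(text=None),
    SimpleNamespace(text="2) Key players - List top vendors\n3. Regulations - GDPR impact"),
]
plan_text = get_text(outputs)
tasks = parse_tasks(plan_text)
for task in tasks:
    print(task["num"], "->", task["text"])
```

Note that the intro line is skipped because it does not start with a number, and the wrapped line in task 1 is folded into a single line of text.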

Create background execution handler:
- Polls for completion of long-running tasks
- Shows progress updates every 3 seconds
- On timeout, returns the latest interaction state so the caller can inspect its status

def wait_for_completion(client, iid, timeout=300):
    progress, status, elapsed = st.progress(0), st.empty(), 0
    while elapsed < timeout:
        interaction = client.interactions.get(iid)
        if interaction.status != "in_progress": 
            progress.progress(100)
            return interaction
        elapsed += 3
        progress.progress(min(90, int(elapsed/timeout*100)))
        status.text(f"⏳ {elapsed}s...")
        time.sleep(3)
    return client.interactions.get(iid)
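The polling logic can also be separated from the Streamlit widgets, which makes it easy to test without waiting or calling the API. The sketch below is a variant written for illustration, not code from the repository: `poll_until_done`, its `sleep` parameter, and `make_fake_fetcher` are hypothetical, and like the handler above it treats any status other than "in_progress" as finished.

```python
import time
from types import SimpleNamespace


def poll_until_done(get_interaction, interaction_id, timeout=300, interval=3, sleep=time.sleep):
    """Fetch the interaction every `interval` seconds until it leaves
    'in_progress' or `timeout` seconds of waiting have accumulated."""
    elapsed = 0
    while elapsed < timeout:
        interaction = get_interaction(interaction_id)
        if interaction.status != "in_progress":
            return interaction, elapsed
        sleep(interval)
        elapsed += interval
    # Timed out: return the latest state so the caller can inspect its status.
    return get_interaction(interaction_id), elapsed


def make_fake_fetcher(statuses):
    """Return a fetcher that yields the given statuses in order, repeating the last one."""
    remaining = list(statuses)

    def fetch(interaction_id):
        status = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return SimpleNamespace(id=interaction_id, status=status)

    return fetch


fetch = make_fake_fetcher(["in_progress", "in_progress", "completed"])
result, waited = poll_until_done(fetch, "demo-id", sleep=lambda seconds: None)
print(result.status, waited)
```

With the real client, `get_interaction` would be `client.interactions.get`. Because a timed-out run still reports "in_progress", the caller should check `result.status` before using the outputs.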

Initialize Streamlit app and session state:
- Configure page layout and title
- Set up session state variables for each phase
- Maintain context across user interactions, since Streamlit reruns the script on every click

st.set_page_config(page_title="Research Planner", page_icon="🔬", layout="wide")
st.title("🔬 AI Research Planner & Executor Agent (Gemini Interactions API) ✨")

for k in ["plan_id", "plan_text", "tasks", "research_id", "research_text", "synthesis_text", "infographic"]:
    if k not in st.session_state: 
        st.session_state[k] = [] if k == "tasks" else None
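The same key list and defaults are needed again when the Reset button in the next step clears the state, so one option is to keep them in a single place. This is a sketch rather than the repository's code: `STATE_KEYS`, `default_for`, `init_state`, and `reset_state` are hypothetical names, and a plain dict stands in for `st.session_state`, which supports the same item access.

```python
STATE_KEYS = ["plan_id", "plan_text", "tasks", "research_id",
              "research_text", "synthesis_text", "infographic"]


def default_for(key):
    # "tasks" holds a list of parsed tasks; every other key starts as None.
    return [] if key == "tasks" else None


def init_state(state):
    """Fill in missing keys only, so existing values survive reruns."""
    for key in STATE_KEYS:
        if key not in state:
            state[key] = default_for(key)


def reset_state(state):
    """Overwrite every key with its default to clear all phases."""
    for key in STATE_KEYS:
        state[key] = default_for(key)


state = {"plan_id": "abc123"}   # stand-in for st.session_state
init_state(state)               # keeps plan_id, adds the missing keys
print(state["plan_id"], state["tasks"])
reset_state(state)              # clears plan_id again
print(state["plan_id"], state["tasks"])
```

In the app, both helpers would be called with `st.session_state` instead of the dict.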

Create sidebar with API key input and instructions:
- Secure API key entry
- Reset functionality to clear all phases
- Helpful workflow explanation

with st.sidebar:
    api_key = st.text_input("🔑 Google API Key", type="password")
    if st.button("Reset"): 
        # Restore every phase variable to its initial value.
        for k in ["plan_id", "plan_text", "tasks", "research_id", "research_text", "synthesis_text", "infographic"]:
            st.session_state[k] = [] if k == "tasks" else None
        st.rerun()
    st.markdown("""
    ### How It Works
    1. **Plan** → Gemini 3 Flash creates research tasks
    2. **Select** → Choose which tasks to research  
    3. **Research** → Deep Research Agent investigates
    4. **Synthesize** → Gemini 3 Pro writes report + TL;DR infographic
    
    Each phase chains via `previous_interaction_id` for context.
    """)

Initialize Gemini client:
- Creates client with API key
- Stops the app until a key is entered (the key itself is only checked when the first request is sent)

client = genai.Client(api_key=api_key) if api_key else None
if not client: 
    st.info("👆 Enter API key to start")
    st.stop()

Phase 1: Generate Research Plan with Gemini 3 Flash:
- Takes user's research goal as input
- Uses Gemini 3 Flash for fast planning
- Stores interaction ID for stateful continuation
- Parses numbered tasks for selection

research_goal = st.text_area("📝 Research Goal", placeholder="e.g., Research B2B HR SaaS market in Germany")
if st.button("📋 Generate Plan", disabled=not research_goal, type="primary"):
    with st.spinner("Planning..."):
        try:
            i = client.interactions.create(
                model="gemini-3-flash-preview", 
                input=f"Create a numbered research plan for: {research_goal}\n\nFormat: 1. [Task] - [Details]\n\nInclude 5-8 specific tasks.", 
                tools=[{"type": "google_search"}], 
                store=True
            )
            st.session_state.plan_id = i.id
            st.session_state.plan_text = get_text(i.outputs)
            st.session_state.tasks = parse_tasks(get_text(i.outputs))
        except Exception as e: 
            st.error(f"Error: {e}")

Phase 2: Interactive Task Selection and Deep Research:
- Displays checkboxes for each planned task
- Users select which tasks to investigate
- Passes selected tasks to Deep Research Agent
- Uses previous_interaction_id to maintain context from planning phase
- Executes in background with progress tracking

if st.session_state.plan_text:
    st.divider()
    st.subheader("🔍 Select Tasks & Research")
    selected = [f"{t['num']}. {t['text']}" for t in st.session_state.tasks 
                if st.checkbox(f"**{t['num']}.** {t['text']}", True, key=f"t{t['num']}")]
    st.caption(f"✅ {len(selected)}/{len(st.session_state.tasks)} selected")
    
    if st.button("🚀 Start Deep Research", type="primary", disabled=not selected):
        with st.spinner("Researching (2-5 min)..."):
            try:
                i = client.interactions.create(
                    agent="deep-research-pro-preview-12-2025", 
                    input="Research these tasks thoroughly with sources:\n\n" + "\n\n".join(selected), 
                    previous_interaction_id=st.session_state.plan_id, 
                    background=True, 
                    store=True
                )
                i = wait_for_completion(client, i.id)
                st.session_state.research_id = i.id
                st.session_state.research_text = get_text(i.outputs) or f"Status: {i.status}"
                st.rerun()
            except Exception as e: 
                st.error(f"Error: {e}")
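The Phase 2 code stores `research_id` whatever the final status is, so a failed or timed-out run still unlocks the report button. A small classifier like the one below could separate the cases. It is a sketch under assumptions: `research_outcome` is a hypothetical helper, "in_progress" comes from the polling code above, and "completed" as the success status is an assumption that should be checked against the SDK version you use.

```python
from types import SimpleNamespace


def research_outcome(interaction, text):
    """Classify a poll result as ('ok' | 'pending' | 'error', message)."""
    if interaction.status == "completed" and text:   # assumed success status
        return "ok", text
    if interaction.status == "in_progress":
        return "pending", "Research is still running; try again in a minute."
    return "error", f"Research ended with status: {interaction.status}"


# Fake interactions standing in for what the polling helper returns.
done = SimpleNamespace(status="completed")
stuck = SimpleNamespace(status="in_progress")
broken = SimpleNamespace(status="failed")

print(research_outcome(done, "Findings with sources..."))
print(research_outcome(stuck, ""))
print(research_outcome(broken, ""))
```

In the app, only the "ok" branch would set `st.session_state.research_id`; the other two could be shown with `st.warning` or `st.error`.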

Display research results:
- Shows comprehensive findings with citations
- Formatted markdown output

if st.session_state.research_text:
    st.divider()
    st.subheader("📄 Research Results")
    st.markdown(st.session_state.research_text)

Phase 3: Synthesize Executive Report with Gemini 3 Pro:
- Creates structured report with key sections
- Uses previous_interaction_id to access full research context
- Generates whiteboard-style infographic using Gemini 3 Pro Image
- Combines text and visual synthesis

if st.session_state.research_id:
    if st.button("📊 Generate Executive Report", type="primary"):
        with st.spinner("Synthesizing report..."):
            try:
                i = client.interactions.create(
                    model="gemini-3-pro-preview", 
                    input=f"Create executive report with Summary, Findings, Recommendations, Risks:\n\n{st.session_state.research_text}", 
                    previous_interaction_id=st.session_state.research_id, 
                    store=True
                )
                st.session_state.synthesis_text = get_text(i.outputs)
            except Exception as e: 
                st.error(f"Error: {e}")
                st.stop()
        
        with st.spinner("Creating TL;DR infographic..."):
            try:
                response = client.models.generate_content(
                    model="gemini-3-pro-image-preview",
                    contents=f"Create a whiteboard summary infographic for the following: {st.session_state.synthesis_text}"
                )
                for part in response.candidates[0].content.parts:
                    if hasattr(part, 'inline_data') and part.inline_data:
                        st.session_state.infographic = part.inline_data.data
                        break
            except Exception as e: 
                st.warning(f"Infographic error: {e}")
        st.rerun()
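The loop that pulls the image out of the response can be isolated into a helper that tolerates missing parts. This is an illustrative sketch: `first_inline_image` is a hypothetical name, and the `SimpleNamespace` objects only imitate the `inline_data.data` shape that the code above reads.

```python
from types import SimpleNamespace


def first_inline_image(parts):
    """Return the bytes of the first part carrying inline data, or None."""
    for part in parts or []:
        inline = getattr(part, "inline_data", None)
        if inline is not None and getattr(inline, "data", None):
            return inline.data
    return None


# Fake response parts: a text part followed by an image part.
parts = [
    SimpleNamespace(text="Here is your infographic", inline_data=None),
    SimpleNamespace(text=None,
                    inline_data=SimpleNamespace(data=b"\x89PNG...", mime_type="image/png")),
]
image_bytes = first_inline_image(parts)
print(len(image_bytes) if image_bytes else "no image returned")
```

Returning `None` instead of raising lets the app show the text report even when the model sends back no image.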

Display final report with infographic and download option:
- Shows TL;DR infographic at the top
- Full executive report below
- Markdown download functionality

if st.session_state.synthesis_text:
    st.divider()
    st.markdown("## 📊 Executive Report")
    
    # TL;DR Infographic at the top
    if st.session_state.infographic:
        st.markdown("### 🎨 TL;DR")
        st.image(st.session_state.infographic, use_container_width=True)
        st.divider()
    
    st.markdown(st.session_state.synthesis_text)
    st.download_button("📥 Download Report", st.session_state.synthesis_text, "research_report.md", "text/markdown")

st.divider()
st.caption("[Gemini Interactions API](https://ai.google.dev/gemini-api/docs/interactions)")

Running the App

With our code in place, it's time to launch the app.

In your terminal, navigate to the project folder and run:

streamlit run research_planner_executor_agent.py

Streamlit will provide a local URL (typically http://localhost:8501). Open this in your web browser.
Enter your Google API key in the sidebar.
Try an example research goal:
- "Research the B2B HR SaaS market in Germany - key players, regulations, pricing models"
- "Analyze market opportunities for AI-powered customer support tools"
- "Investigate the competitive landscape for sustainable packaging in e-commerce"
Click "Generate Plan" and watch as Gemini 3 Flash creates a structured research plan.
Select the tasks you want to investigate (or keep them all selected).
Click "Start Deep Research" and wait a couple of minutes as the Deep Research Agent conducts comprehensive web research.
Review the research results, then click "Generate Executive Report" to synthesize findings with an auto-generated infographic.
Download your complete research report as a markdown file!

Conclusion

You've just built a multi-phase AI Research Agent that demonstrates the cutting-edge capabilities of Google's Interactions API. This isn't just a proof-of-concept; it's a production-ready system that combines stateful conversation management, background execution, model mixing, and autonomous research capabilities.

What makes this powerful is the seamless orchestration: Gemini 3 Flash for fast planning, Deep Research Agent for thorough investigation, and Gemini 3 Pro for synthesis - all connected through stateful interactions that maintain full context without manual history management.

For further enhancements, consider:

Custom Data Sources: Add the File Search tool to let Deep Research analyze your private documents alongside public web data.
Multi-Report Comparison: Store multiple research reports and create comparative analyses across different topics or time periods.
Collaborative Research: Enable team members to review and refine research plans before execution, with version tracking.
Automated Scheduling: Set up periodic research tasks that automatically investigate evolving topics and alert you to significant changes.
Custom Formatting: Provide explicit output formatting instructions to structure reports for different audiences (technical, executive, investor).
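As a starting point for the custom formatting idea, the synthesis prompt could be assembled from per-audience instructions. Everything here is hypothetical: `AUDIENCE_FORMATS`, `build_report_prompt`, and the section names are examples to adapt, not part of the app or the API.

```python
# Hypothetical formatting instructions per audience; adjust the wording freely.
AUDIENCE_FORMATS = {
    "executive": "Sections: Summary, Findings, Recommendations, Risks. Keep it to one page.",
    "technical": "Sections: Background, Methodology, Detailed Findings, Open Questions. Cite sources inline.",
    "investor": "Sections: Market Size, Competitive Landscape, Growth Drivers, Risks. Lead with numbers.",
}


def build_report_prompt(audience, research_text):
    """Prepend audience-specific formatting instructions to the research text."""
    if audience not in AUDIENCE_FORMATS:
        raise ValueError(f"Unknown audience: {audience!r}")
    return (
        f"Write the report for this audience: {audience}.\n"
        f"{AUDIENCE_FORMATS[audience]}\n\n"
        f"{research_text}"
    )


prompt = build_report_prompt("investor", "Findings from the research phase go here.")
print(prompt)
```

The resulting string would replace the fixed `input` of the Phase 3 call, with the audience chosen through a widget such as a select box.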

Keep experimenting with different configurations and features to build more sophisticated AI applications.

We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
