Build an AI Research Agent with Google Interactions API & Gemini 3
Multi-phase AI research agent with Google Interactions API, Gemini Deep Research Agent, and Gemini 3 models (100% open source)
Google recently launched the Interactions API alongside Gemini Deep Research, an autonomous research agent that can conduct comprehensive multi-step investigations. This is a significant shift from traditional APIs - instead of stateless request-response cycles, you get server-side state management, background execution for long-running tasks, and seamless handoffs between different models and agents.
In this tutorial, we'll build an AI Research Planner & Executor Agent that demonstrates these capabilities in action. The system uses a three-phase workflow: Gemini 3 Flash creates research plans, Deep Research Agent executes comprehensive web investigations, and Gemini 3 Pro synthesizes findings into executive reports with auto-generated infographics.
What is Gemini Deep Research?
Gemini Deep Research is an autonomous research agent powered by Gemini 3 Pro that's accessible through the Interactions API. It doesn't just answer questions. It plans investigations, formulates search queries, reads results, identifies knowledge gaps, and searches again iteratively. The agent operates asynchronously, taking 2-5 minutes to browse hundreds of websites and synthesize findings.
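The plan, search, read, find-gaps, search-again cycle described above can be pictured as a simple loop. The sketch below is a toy illustration only and makes no claim about how Deep Research is implemented: `toy_search` is a hypothetical stand-in for web search backed by a hard-coded dictionary, the findings are placeholder strings, and a "knowledge gap" is modelled as nothing more than a follow-up topic.

```python
# Hypothetical corpus: topic -> (placeholder finding, follow-up topics it raises).
TOY_CORPUS = {
    "hr saas germany": ("notes on the overall market", ["pricing", "regulation"]),
    "pricing": ("notes on pricing", []),
    "regulation": ("notes on regulation", ["data residency"]),
    "data residency": ("notes on data residency", []),
}


def toy_search(topic):
    """Stand-in for a web search; returns (finding, follow-up topics)."""
    return TOY_CORPUS.get(topic, ("no result", []))


def toy_research(goal, max_steps=10):
    """Search a topic, record the finding, queue newly discovered gaps, repeat."""
    findings, queue, seen = {}, [goal], set()
    steps = 0
    while queue and steps < max_steps:
        topic = queue.pop(0)
        if topic in seen:
            continue
        seen.add(topic)
        finding, gaps = toy_search(topic)
        findings[topic] = finding
        queue.extend(g for g in gaps if g not in seen)  # gaps drive new searches
        steps += 1
    return findings


report = toy_research("hr saas germany")
for topic, finding in report.items():
    print(f"{topic}: {finding}")
```

The `max_steps` cap is there because an open-ended "search again" loop needs some stopping rule; the real agent's stopping criteria are not described in this tutorial.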
What makes the Interactions API special?
Unlike traditional APIs where you send all context with every request, the Interactions API manages conversation history server-side. This enables stateful multi-turn workflows, background execution for tasks that exceed standard HTTP timeouts, and the ability to chain different models together while preserving full context. It's specifically designed for building production-ready agentic applications.
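To make the contrast concrete, here is a toy, in-memory model of the two styles. It does not call any Google API: `ToyInteractionStore` and its methods are hypothetical names invented for this sketch, and the only idea borrowed from the Interactions API is that a follow-up request can point at an earlier interaction by ID instead of resending the transcript.

```python
import itertools


class ToyInteractionStore:
    """Hypothetical stand-in for a server that remembers past turns."""

    def __init__(self):
        self._turns = {}                 # interaction id -> (previous id, text)
        self._ids = itertools.count(1)

    def create(self, text, previous_id=None):
        """Store one turn and return its id; the caller sends only `text`."""
        new_id = f"toy-{next(self._ids)}"
        self._turns[new_id] = (previous_id, text)
        return new_id

    def context(self, interaction_id):
        """Rebuild the full history server-side by following previous ids."""
        history = []
        while interaction_id is not None:
            interaction_id, text = self._turns[interaction_id]
            history.append(text)
        return list(reversed(history))


def stateless_payload_sizes(turns):
    """Characters sent per request when the client resends all history."""
    sizes, sent_so_far = [], 0
    for text in turns:
        sent_so_far += len(text)
        sizes.append(sent_so_far)
    return sizes


turns = ["plan the research", "run the research", "write the report"]

store = ToyInteractionStore()
last_id = None
stateful_sizes = []
for text in turns:
    last_id = store.create(text, previous_id=last_id)
    stateful_sizes.append(len(text))     # only the new turn crosses the wire

print(stateless_payload_sizes(turns))    # grows with every turn
print(stateful_sizes)                    # tracks only the size of each new turn
print(store.context(last_id))            # full history, rebuilt from stored ids
```

With three short turns the difference is tiny, but the stateless sizes keep growing with each turn, while the stateful ones depend only on the newest message.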
What We're Building
This Streamlit application implements a sophisticated three-phase research workflow that demonstrates the power of Google's Interactions API. The system combines multiple Gemini models, each optimized for specific tasks, while maintaining stateful context across phases.
Features:
Multi-Phase Research Workflow:
Phase 1: Uses Gemini 3 Flash to generate structured research plans
Phase 2: Leverages Deep Research Agent for autonomous web investigation
Phase 3: Employs Gemini 3 Pro for executive synthesis with auto-generated infographics
Stateful Conversation Management: Demonstrates previous_interaction_id to chain phases together while preserving full context
Background Execution: Async research with progress tracking for tasks that take 2-5 minutes
Auto-Generated Infographics: Creates whiteboard-style TL;DR summaries using Gemini 3 Pro Image (Nano Banana Pro)
Interactive Task Selection: Choose specific research tasks to focus your investigation
Export Capabilities: Download comprehensive reports as markdown files
How It Works
This application orchestrates a sophisticated three-phase research workflow:
Phase 1 - Planning:
The system uses Gemini 3 Flash (optimized for speed) to break down your research goal into 5-8 specific, actionable tasks. The interaction is stored with store=True, and we capture the interaction.id for later reference.
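To see what "breaking the goal into tasks" looks like on the application side, the sketch below runs the tutorial's `parse_tasks` regex over a hand-written plan. The sample plan text is made up for illustration and is not real model output; actual responses may be formatted differently, in which case the regex would need adjusting.

```python
import re


def parse_tasks(text):
    """Extract numbered items ("1.", "2)", "3-") into {"num", "text"} dicts."""
    pattern = r'^(\d+)[\.\)\-]\s*(.+?)(?=\n\d+[\.\)\-]|\n\n|\Z)'
    return [
        {"num": m.group(1), "text": m.group(2).strip().replace("\n", " ")}
        for m in re.finditer(pattern, text, re.MULTILINE | re.DOTALL)
    ]


# Hand-written sample in the "N. [Task] - [Details]" shape the prompt asks for.
sample_plan = (
    "Here is the plan:\n\n"
    "1. Market size - Estimate revenue\n"
    "2. Key players - List top vendors\nand their products\n"
    "3) Regulation - Summarize rules"
)

tasks = parse_tasks(sample_plan)
for task in tasks:
    print(task["num"], "|", task["text"])

# The app later joins the selected tasks into the research prompt like this:
selected = "\n\n".join(f"{t['num']}. {t['text']}" for t in tasks)
```

Note that a task wrapped across two lines (task 2 here) is folded back into a single line, and that both "1." and "3)" numbering styles are accepted.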
Phase 2 - Research:
Users select which tasks to investigate. The app passes these to the Deep Research Agent using agent="deep-research-pro-preview-12-2025" (note: agents use the agent parameter, not model). Critically, we include previous_interaction_id=st.session_state.plan_id to give the agent full context from the planning phase. Since research takes 2-5 minutes, we use background=True for async execution and poll for completion.
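The create-then-poll pattern used for background execution can be tried without any API access. Below is a minimal sketch with an injected `fetch` callable and an injected `sleep`, so it runs instantly against a fake job. `poll_until_done` is a hypothetical helper, not part of the SDK; the `"in_progress"` status string comes from the tutorial code, while `"completed"` is assumed here as the terminal value.

```python
import time
from types import SimpleNamespace


def poll_until_done(fetch, interval=3.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until its status leaves "in_progress" or the timeout elapses.

    Returns (last_result, timed_out).
    """
    elapsed = 0.0
    result = fetch()
    while result.status == "in_progress" and elapsed < timeout:
        sleep(interval)
        elapsed += interval
        result = fetch()
    return result, result.status == "in_progress"


# Fake job standing in for a background interaction:
# two polls report "in_progress", the third reports "completed" (assumed value).
_statuses = iter(["in_progress", "in_progress", "completed"])


def fake_fetch():
    return SimpleNamespace(status=next(_statuses))


naps = []  # record requested sleeps instead of really waiting
result, timed_out = poll_until_done(fake_fetch, interval=3, sleep=naps.append)
print(result.status, timed_out, naps)
```

In the app, `fetch` would wrap `client.interactions.get(iid)` as the walkthrough code does. Returning an explicit `timed_out` flag is a design choice of this sketch: it lets the caller distinguish "finished" from "gave up waiting".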
Phase 3 - Synthesis:
Gemini 3 Pro (optimized for quality) creates an executive report. Again, we use previous_interaction_id to access the complete research findings. The infographic generation uses the standard generate_content API (not Interactions API) because it's a single-turn image generation task.
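The image comes back as one of several response parts, so the app has to find the part that carries inline data. The sketch below isolates that step using stand-in objects rather than a real response; the attribute names (`inline_data`, `data`) mirror the ones the tutorial code reads, and `first_inline_image` is a hypothetical helper name.

```python
from types import SimpleNamespace


def first_inline_image(parts):
    """Return the bytes of the first part carrying inline_data, else None."""
    for part in parts or []:
        inline = getattr(part, "inline_data", None)
        if inline is not None and getattr(inline, "data", None):
            return inline.data
    return None


# Stand-in parts: a text part followed by an image part.
# The bytes are fake placeholders, not a real image.
fake_parts = [
    SimpleNamespace(text="Here is your infographic", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"fake-image-bytes")),
]

image = first_inline_image(fake_parts)
print(image is not None)
print(first_inline_image([]))  # no parts at all: nothing to show
```

Returning `None` when no image part exists lets the caller skip the infographic and still show the text report, which matches how the app treats infographic failures as a warning rather than an error.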
Stateful Context Management:
The key innovation is how context flows between phases. Each phase creates an interaction that can be referenced by the next phase via previous_interaction_id. This server-side state management eliminates the need to manually pass megabytes of conversation history with each request.
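One way to see how the phases link up is to write out the keyword arguments each phase passes to `client.interactions.create`. The sketch below only builds dictionaries and makes no API calls; the parameter names and model/agent identifiers are copied from the tutorial code, while the helper function names and the example IDs (`"plan-123"`, `"research-456"`) are hypothetical.

```python
def plan_request(goal):
    """Phase 1: a model call with no previous interaction."""
    return {
        "model": "gemini-3-flash-preview",
        "input": f"Create a numbered research plan for: {goal}",
        "store": True,
    }


def research_request(selected_tasks, plan_id):
    """Phase 2: an agent call chained to the plan, run in the background."""
    return {
        "agent": "deep-research-pro-preview-12-2025",  # agents use `agent`, not `model`
        "input": "Research these tasks thoroughly with sources:\n\n"
                 + "\n\n".join(selected_tasks),
        "previous_interaction_id": plan_id,
        "background": True,
        "store": True,
    }


def synthesis_request(research_id):
    """Phase 3: a model call chained to the research interaction."""
    return {
        "model": "gemini-3-pro-preview",
        "input": "Create executive report with Summary, Findings, Recommendations, Risks.",
        "previous_interaction_id": research_id,
        "store": True,
    }


plan = plan_request("B2B HR SaaS market in Germany")
research = research_request(["1. Market size", "2. Key players"], plan_id="plan-123")
synthesis = synthesis_request(research_id="research-456")

for name, request in [("plan", plan), ("research", research), ("synthesis", synthesis)]:
    print(name, sorted(request))
```

Each dictionary would be unpacked into the call, for example `client.interactions.create(**research)`. The only thing carried from one phase to the next is an ID string.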
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine (version 3.12 is recommended)
Your Gemini API key for using Gemini models and the Interactions API
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Code Walkthrough
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the research_agent_gemini_interaction_api folder:
cd advanced_ai_agents/single_agent_apps/research_agent_gemini_interaction_api
Install the required dependencies:
pip install -r requirements.txt
Grab your Gemini API key from Google AI Studio.
Creating the App
Here's the code walkthrough in the research_planner_executor_agent.py file:
Import libraries and set up helper functions:
Streamlit for the UI
Google GenAI for Interactions API
Time and regex for progress tracking and task parsing
import streamlit as st, time, re
from google import genai

def get_text(outputs):
    # Join the text of every output item that has non-empty text
    return "\n".join(o.text for o in (outputs or []) if hasattr(o, 'text') and o.text) or ""

def parse_tasks(text):
    # Turn numbered items ("1.", "2)", "3-") into {"num", "text"} dicts
    return [{"num": m.group(1), "text": m.group(2).strip().replace('\n', ' ')}
            for m in re.finditer(r'^(\d+)[\.\)\-]\s*(.+?)(?=\n\d+[\.\)\-]|\n\n|\Z)', text, re.MULTILINE | re.DOTALL)]
Create background execution handler:
Polls for completion of long-running tasks
Shows progress updates every 3 seconds
Handles timeout scenarios gracefully
def wait_for_completion(client, iid, timeout=300):
    progress, status, elapsed = st.progress(0), st.empty(), 0
    while elapsed < timeout:
        interaction = client.interactions.get(iid)
        if interaction.status != "in_progress":
            progress.progress(100)
            return interaction
        elapsed += 3
        progress.progress(min(90, int(elapsed/timeout*100)))
        status.text(f"⏳ {elapsed}s...")
        time.sleep(3)
    return client.interactions.get(iid)
Initialize Streamlit app and session state:
Configure page layout and title
Set up session state variables for each phase
Maintains context across user interactions
st.set_page_config(page_title="Research Planner", page_icon="🔬", layout="wide")
st.title("🔬 AI Research Planner & Executor Agent (Gemini Interactions API) ✨")
for k in ["plan_id", "plan_text", "tasks", "research_id", "research_text", "synthesis_text", "infographic"]:
    if k not in st.session_state:
        st.session_state[k] = [] if k == "tasks" else None
Create sidebar with API key input and instructions:
Secure API key entry
Reset functionality to clear all phases
Helpful workflow explanation
with st.sidebar:
    api_key = st.text_input("Google API Key", type="password")
    if st.button("Reset"):
        for k in ["plan_id", "plan_text", "tasks", "research_id", "research_text", "synthesis_text", "infographic"]:
            st.session_state[k] = [] if k == "tasks" else None
        st.rerun()
    st.markdown("""
    ### How It Works
    1. **Plan** - Gemini 3 Flash creates research tasks
    2. **Select** - Choose which tasks to research
    3. **Research** - Deep Research Agent investigates
    4. **Synthesize** - Gemini 3 Pro writes report + TL;DR infographic

    Each phase chains via `previous_interaction_id` for context.
    """)
Initialize Gemini client:
Creates client with API key
Stops the app until an API key is entered (the key itself is not validated at this point)
client = genai.Client(api_key=api_key) if api_key else None
if not client:
    st.info("Enter API key to start")
    st.stop()
Phase 1: Generate Research Plan with Gemini 3 Flash:
Takes user's research goal as input
Uses Gemini 3 Flash for fast planning
Stores interaction ID for stateful continuation
Parses numbered tasks for selection
research_goal = st.text_area("Research Goal", placeholder="e.g., Research B2B HR SaaS market in Germany")
if st.button("Generate Plan", disabled=not research_goal, type="primary"):
    with st.spinner("Planning..."):
        try:
            i = client.interactions.create(
                model="gemini-3-flash-preview",
                input=f"Create a numbered research plan for: {research_goal}\n\nFormat: 1. [Task] - [Details]\n\nInclude 5-8 specific tasks.",
                tools=[{"type": "google_search"}],
                store=True
            )
            st.session_state.plan_id = i.id
            st.session_state.plan_text = get_text(i.outputs)
            st.session_state.tasks = parse_tasks(get_text(i.outputs))
        except Exception as e:
            st.error(f"Error: {e}")
Phase 2: Interactive Task Selection and Deep Research:
Displays checkboxes for each planned task
Users select which tasks to investigate
Passes selected tasks to Deep Research Agent
Uses previous_interaction_id to maintain context from planning phase
Executes in background with progress tracking
if st.session_state.plan_text:
    st.divider()
    st.subheader("Select Tasks & Research")
    selected = [f"{t['num']}. {t['text']}" for t in st.session_state.tasks
                if st.checkbox(f"**{t['num']}.** {t['text']}", True, key=f"t{t['num']}")]
    st.caption(f"✅ {len(selected)}/{len(st.session_state.tasks)} selected")
    if st.button("Start Deep Research", type="primary", disabled=not selected):
        with st.spinner("Researching (2-5 min)..."):
            try:
                i = client.interactions.create(
                    agent="deep-research-pro-preview-12-2025",
                    input="Research these tasks thoroughly with sources:\n\n" + "\n\n".join(selected),
                    previous_interaction_id=st.session_state.plan_id,
                    background=True,
                    store=True
                )
                i = wait_for_completion(client, i.id)
                st.session_state.research_id = i.id
                st.session_state.research_text = get_text(i.outputs) or f"Status: {i.status}"
                st.rerun()
            except Exception as e:
                st.error(f"Error: {e}")
Display research results:
Shows comprehensive findings with citations
Formatted markdown output
if st.session_state.research_text:
    st.divider()
    st.subheader("Research Results")
    st.markdown(st.session_state.research_text)
Phase 3: Synthesize Executive Report with Gemini 3 Pro:
Creates structured report with key sections
Uses previous_interaction_id to access full research context
Generates whiteboard-style infographic using Gemini 3 Pro Image
Combines text and visual synthesis
if st.session_state.research_id:
    if st.button("Generate Executive Report", type="primary"):
        with st.spinner("Synthesizing report..."):
            try:
                i = client.interactions.create(
                    model="gemini-3-pro-preview",
                    input=f"Create executive report with Summary, Findings, Recommendations, Risks:\n\n{st.session_state.research_text}",
                    previous_interaction_id=st.session_state.research_id,
                    store=True
                )
                st.session_state.synthesis_text = get_text(i.outputs)
            except Exception as e:
                st.error(f"Error: {e}")
                st.stop()
        with st.spinner("Creating TL;DR infographic..."):
            try:
                response = client.models.generate_content(
                    model="gemini-3-pro-image-preview",
                    contents=f"Create a whiteboard summary infographic for the following: {st.session_state.synthesis_text}"
                )
                for part in response.candidates[0].content.parts:
                    if hasattr(part, 'inline_data') and part.inline_data:
                        st.session_state.infographic = part.inline_data.data
                        break
            except Exception as e:
                st.warning(f"Infographic error: {e}")
        st.rerun()
Display final report with infographic and download option:
Shows TL;DR infographic at the top
Full executive report below
Markdown download functionality
if st.session_state.synthesis_text:
    st.divider()
    st.markdown("## Executive Report")
    # TL;DR Infographic at the top
    if st.session_state.infographic:
        st.markdown("### 🎨 TL;DR")
        st.image(st.session_state.infographic, use_container_width=True)
        st.divider()
    st.markdown(st.session_state.synthesis_text)
    st.download_button("📥 Download Report", st.session_state.synthesis_text, "research_report.md", "text/markdown")
st.divider()
st.caption("[Gemini Interactions API](https://ai.google.dev/gemini-api/docs/interactions)")
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run:
streamlit run research_planner_executor_agent.py
Streamlit will provide a local URL (typically http://localhost:8501). Open this in your web browser.
Enter your Google API key in the sidebar.
Try an example research goal:
"Research the B2B HR SaaS market in Germany - key players, regulations, pricing models"
"Analyze market opportunities for AI-powered customer support tools"
"Investigate the competitive landscape for sustainable packaging in e-commerce"
Click "Generate Plan" and watch as Gemini 3 Flash creates a structured research plan.
Select the tasks you want to investigate (or keep them all selected).
Click "Start Deep Research" and wait a couple of minutes as the Deep Research Agent conducts comprehensive web research.
Review the research results, then click "Generate Executive Report" to synthesize findings with an auto-generated infographic.
Download your complete research report as a markdown file!
Conclusion
You've just built a multi-phase AI Research Agent that demonstrates the cutting-edge capabilities of Google's Interactions API. This isn't just a proof-of-concept; it's a production-ready system that combines stateful conversation management, background execution, model mixing, and autonomous research capabilities.
What makes this powerful is the seamless orchestration: Gemini 3 Flash for fast planning, Deep Research Agent for thorough investigation, and Gemini 3 Pro for synthesis - all connected through stateful interactions that maintain full context without manual history management.
For further enhancements, consider:
Custom Data Sources: Add the File Search tool to let Deep Research analyze your private documents alongside public web data.
Multi-Report Comparison: Store multiple research reports and create comparative analyses across different topics or time periods.
Collaborative Research: Enable team members to review and refine research plans before execution, with version tracking.
Automated Scheduling: Set up periodic research tasks that automatically investigate evolving topics and alert you to significant changes.
Custom Formatting: Provide explicit output formatting instructions to structure reports for different audiences (technical, executive, investor).
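As a starting point for the custom formatting idea, the synthesis prompt could be assembled from a small table of audience-specific instructions. This is a minimal sketch under stated assumptions: `AUDIENCE_FORMATS` and `build_report_prompt` are hypothetical names, the "executive" section list is taken from the tutorial's synthesis prompt, and the other section lists are illustrative suggestions rather than anything the API requires.

```python
# Hypothetical formatting instructions per audience; adjust the wording to taste.
AUDIENCE_FORMATS = {
    "technical": "Use sections: Overview, Methods, Detailed Findings, Open Questions.",
    "executive": "Use sections: Summary, Findings, Recommendations, Risks.",
    "investor": "Use sections: Market Size, Competition, Opportunities, Risks.",
}


def build_report_prompt(audience, research_text):
    """Combine audience-specific formatting rules with the research findings."""
    try:
        formatting = AUDIENCE_FORMATS[audience]
    except KeyError:
        known = ", ".join(sorted(AUDIENCE_FORMATS))
        raise ValueError(f"Unknown audience {audience!r}; expected one of: {known}") from None
    return (
        f"Create a report for a {audience} audience.\n"
        f"{formatting}\n\n"
        f"{research_text}"
    )


prompt = build_report_prompt("executive", "Placeholder research findings.")
print(prompt)
```

The returned string would replace the hard-coded `input` in the Phase 3 call; a Streamlit select box over the dictionary keys is one way to let users choose the audience.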
Keep experimenting with different configurations and features to build more sophisticated AI applications.