- unwind ai
Build a Multimodal AI Agent Design Team
Fully functional multi-agent app using Gemini 2.0 Flash (step-by-step instructions)
Multi-agent AI systems are a powerful paradigm in which specialized agents collaborate to solve complex problems. Because each agent has its own capabilities and objectives, we can combine them into systems that are robust and genuinely useful. Adding multimodal inputs such as images, text, video, and structured data makes these systems even more capable.
In this tutorial, we’re building a Multi-Agent Design Team powered by Google's Gemini 2.0, where three specialized agents work in concert to provide comprehensive design insights.
Each agent uses Gemini's multimodal capabilities to understand design assets in a different way: analyzing visual hierarchy, evaluating interaction patterns, or placing the design in its market context. Their findings are then brought together into a single report of actionable insights.
We're using Agno, a framework specifically designed for orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
Gemini 2.0 Flash, for its part, gives our agents native multimodal understanding, strong performance, and fast inference.
What We’re Building
This application uses multiple specialized AI agents to analyze the UI/UX designs of your product and your competitors' products, combining visual understanding, user experience evaluation, and market research insights.
Our Design Team:
Vision Agent - A visual analysis expert that identifies design elements, patterns, visual hierarchy, and evaluates composition fundamentals like color schemes and typography. It focuses on the technical aspects of visual design, analyzing everything from component relationships to overall brand consistency.
UX Agent - A user experience specialist that evaluates user flows, interaction patterns, and identifies usability issues and opportunities for improvement. It applies best practices in UX design and accessibility to provide actionable recommendations for enhancing user interaction.
Market Agent - A market research expert equipped with DuckDuckGo integration that analyzes market trends and competitor patterns while providing strategic positioning insights. This agent combines design analysis with market research to deliver context-aware recommendations and industry-specific guidance.
Features:
Integrated analysis across all three agent perspectives
Comparative analysis with competitor designs
Customizable focus areas for detailed insights
Context-aware analysis for better relevance
Real-time processing with progress indicators
Structured, actionable output
How the App Works
The application orchestrates the three agents through a structured analysis workflow:
Analysis Types and Agent Assignment:
Visual Design Analysis - Handled by the Vision Agent
Processes uploaded images
Analyzes specific elements like color schemes, typography, layout based on user-selected focus areas
Provides technical analysis of visual components
User Experience Analysis - Managed by the UX Agent
Evaluates the same images from a UX perspective
Focuses on user flows, interactions, and accessibility
Provides practical improvement suggestions
Market Analysis - Conducted by the Market Agent
Combines visual analysis with web research using DuckDuckGo
Provides market context and competitive insights
Suggests positioning strategies
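The assignment of analysis types to agents is essentially a lookup table. The app's code expresses it as a chain of `if` statements; the sketch below shows the same mapping as data, which makes it easy to see which agents a given selection triggers. `AnalysisSpec`, `ANALYSIS_SPECS`, and `agents_for` are hypothetical names, not part of Agno or the tutorial's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisSpec:
    """Hypothetical descriptor for one analysis type (not part of the app's code)."""
    agent_name: str        # which agent handles this analysis type
    heading: str           # section heading shown in the report
    uses_web_search: bool  # only the Market Agent has the DuckDuckGo tool


# Insertion order doubles as the order of sections in the report.
ANALYSIS_SPECS = {
    "Visual Design": AnalysisSpec("vision_agent", "🎨 Visual Design Analysis", False),
    "User Experience": AnalysisSpec("ux_agent", "🔄 UX Analysis", False),
    "Market Analysis": AnalysisSpec("market_agent", "📊 Market Analysis", True),
}


def agents_for(selected: list[str]) -> list[str]:
    """Return the agents to run for the user's selection, in report order."""
    return [spec.agent_name for name, spec in ANALYSIS_SPECS.items() if name in selected]


print(agents_for(["Market Analysis", "Visual Design"]))  # ['vision_agent', 'market_agent']
```

Keeping the report order in the table, rather than in the user's selection order, means the sections always appear in the same sequence.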
Workflow Process:
Users upload design files and optional competitor designs
They select which types of analysis to run (can choose any combination of the three)
They can specify focus areas like Color Scheme, Typography, Layout, Navigation, Interactions, Accessibility, Branding, or Market Fit
Each selected analysis type triggers its respective agent
All agents have access to the same images but analyze them through their specialized lens
Results are compiled into a comprehensive report, with each agent's insights clearly separated
If multiple analysis types are selected, a combined "Key Takeaways" section shows how the different perspectives interconnect
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine (version 3.10 or higher is recommended)
Your Gemini API Key
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
```bash
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
```
Go to the multimodal_design_agent_team folder:
```bash
cd awesome-llm-apps/advanced_ai_agents/multi_agent_apps/agent_teams/multimodal_design_agent_team
```
Install the required dependencies:
```bash
pip install -r requirements.txt
```
API Key: Visit Google AI Studio > Create or select a project > Generate an API key
Creating the Streamlit App
Let’s create our app. Create a new file, design_agent_team.py, and add the following code:
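Import the required libraries and set up logging:
```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple

import streamlit as st
from agno.agent import Agent
from agno.media import Image as AgnoImage
from agno.models.google import Gemini
from agno.tools.duckduckgo import DuckDuckGoTools

# process_images() and the analysis step below log through this module-level logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```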
Initialize the specialized AI agents:
```python
def initialize_agents(api_key: str) -> Tuple[Optional[Agent], Optional[Agent], Optional[Agent]]:
    """Create the Vision, UX, and Market agents; return Nones if setup fails."""
    try:
        # One Gemini model instance is shared by all three agents
        model = Gemini(id="gemini-2.0-flash-exp", api_key=api_key)

        # Vision Agent: visual hierarchy, color, typography, layout
        vision_agent = Agent(
            model=model,
            instructions=[
                "You are a visual analysis expert that:",
                "1. Identifies design elements, patterns, and visual hierarchy",
                "2. Analyzes color schemes, typography, and layouts",
                "3. Detects UI components and their relationships",
                "4. Evaluates visual consistency and branding",
                "Be specific and technical in your analysis"
            ],
            markdown=True
        )

        # UX Agent: user flows, usability, accessibility
        ux_agent = Agent(
            model=model,
            instructions=[
                "You are a UX analysis expert that:",
                "1. Evaluates user flows and interaction patterns",
                "2. Identifies usability issues and opportunities",
                "3. Suggests UX improvements based on best practices",
                "4. Analyzes accessibility and inclusive design",
                "Focus on user-centric insights and practical improvements"
            ],
            markdown=True
        )

        # Market Agent: the only agent with a tool (DuckDuckGo web search)
        market_agent = Agent(
            model=model,
            tools=[DuckDuckGoTools()],
            instructions=[
                "You are a market research expert that:",
                "1. Identifies market trends and competitor patterns",
                "2. Analyzes similar products and features",
                "3. Suggests market positioning and opportunities",
                "4. Provides industry-specific insights",
                "Focus on actionable market intelligence"
            ],
            markdown=True
        )

        return vision_agent, ux_agent, market_agent
    except Exception as e:
        st.error(f"Error initializing agents: {str(e)}")
        return None, None, None
```
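Streamlit reruns the entire script whenever a widget changes, so if `initialize_agents` is called on every run, all three agents are rebuilt each time. In the app itself, Streamlit's `st.cache_resource` decorator is the usual tool for this. The stdlib sketch below shows the same idea with `functools.lru_cache`, keyed on the API key, using a hypothetical stand-in factory (`make_agents`) so it runs without Agno installed.

```python
from functools import lru_cache

build_count = 0  # how many times the expensive factory really ran


def make_agents(api_key: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for initialize_agents(); returns placeholder strings."""
    global build_count
    build_count += 1
    return (f"vision:{api_key}", f"ux:{api_key}", f"market:{api_key}")


@lru_cache(maxsize=4)
def get_agents(api_key: str) -> tuple[str, str, str]:
    # The first call for each distinct key builds the agents; repeats hit the cache.
    return make_agents(api_key)


get_agents("key-1")
get_agents("key-1")  # simulated Streamlit rerun with the same key
get_agents("key-2")
print(build_count)  # 2
```

One design caveat: `initialize_agents` returns `(None, None, None)` on failure, and a cache would remember that result for the key. Raising an exception instead avoids this, because `lru_cache` does not cache calls that raise.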
Set up the Streamlit app and API configuration:
```python
st.set_page_config(page_title="Multimodal AI Design Agent Team", layout="wide")

# Sidebar for API key input
with st.sidebar:
    st.header("🔑 API Configuration")

    # Remember the key across Streamlit reruns
    if "api_key_input" not in st.session_state:
        st.session_state.api_key_input = ""

    api_key = st.text_input(
        "Enter your Gemini API Key",
        value=st.session_state.api_key_input,
        type="password",
        help="Get your API key from Google AI Studio",
        key="api_key_widget"
    )

    if api_key != st.session_state.api_key_input:
        st.session_state.api_key_input = api_key

    if api_key:
        st.success("API Key provided! ✅")
    else:
        st.warning("Please enter your API key to proceed")
```
Create the file upload interface:
```python
st.header("📤 Upload Content")
# Two upload columns separated by a narrow spacer column
col1, space, col2 = st.columns([1, 0.1, 1])

with col1:
    design_files = st.file_uploader(
        "Upload UI/UX Designs",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True,
        key="designs"
    )
    if design_files:
        for file in design_files:
            st.image(file, caption=file.name, use_container_width=True)

with col2:
    competitor_files = st.file_uploader(
        "Upload Competitor Designs (Optional)",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True,
        key="competitors"
    )
    if competitor_files:
        for file in competitor_files:
            st.image(file, caption=f"Competitor: {file.name}", use_container_width=True)
```
Configure analysis options:
```python
st.header("🎯 Analysis Configuration")

analysis_types = st.multiselect(
    "Select Analysis Types",
    ["Visual Design", "User Experience", "Market Analysis"],
    default=["Visual Design"]
)

specific_elements = st.multiselect(
    "Focus Areas",
    ["Color Scheme", "Typography", "Layout", "Navigation",
     "Interactions", "Accessibility", "Branding", "Market Fit"]
)

context = st.text_area(
    "Additional Context",
    placeholder="Describe your product, target audience, or specific concerns..."
)
```
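Focus Areas has no default, so if the user selects none, `', '.join(specific_elements)` in the prompts evaluates to an empty string and the prompt reads "focusing on: " with nothing after it. A small guard keeps the prompt meaningful. The helper names and the default wording below are our own, hypothetical choices; the prompts could call `describe_focus(specific_elements)` and `describe_context(context)` in place of the raw values.

```python
def describe_focus(focus_areas: list[str], default: str = "overall design quality") -> str:
    """Join the selected focus areas, falling back to a default when none are chosen."""
    cleaned = [area.strip() for area in focus_areas if area.strip()]
    return ", ".join(cleaned) if cleaned else default


def describe_context(context: str) -> str:
    """Give the model an explicit marker instead of an empty 'Additional context:' line."""
    return context.strip() or "None provided"


print(f"Analyze these designs focusing on: {describe_focus([])}")
# Analyze these designs focusing on: overall design quality
```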
Implement image processing functionality:
```python
def process_images(files):
    """Save each upload to a temp file and wrap it as an Agno Image."""
    processed_images = []
    for file in files:
        try:
            temp_dir = tempfile.gettempdir()
            temp_path = os.path.join(temp_dir, f"temp_{file.name}")
            with open(temp_path, "wb") as f:
                f.write(file.getvalue())
            agno_image = AgnoImage(filepath=Path(temp_path))
            processed_images.append(agno_image)
        except Exception as e:
            # Skip unreadable files instead of aborting the whole analysis
            logger.error(f"Error processing image {file.name}: {str(e)}")
            continue
    return processed_images
```
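`process_images` writes every upload to `temp_<name>` in the shared system temp directory and never removes it, so a design and a competitor file with the same name could overwrite each other, and files accumulate between runs. A minimal alternative, sketched with the standard library only, writes uploads into a private directory that is deleted when the block exits. `saved_uploads` and the `Upload` protocol are hypothetical names; the protocol only assumes the `name` attribute and `getvalue()` method that the original function already uses.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Protocol


class Upload(Protocol):
    """The two members of Streamlit's UploadedFile that this sketch relies on."""
    name: str

    def getvalue(self) -> bytes: ...


@contextmanager
def saved_uploads(files: list[Upload]) -> Iterator[list[Path]]:
    """Write uploads into a private temp directory that is deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="design_review_") as tmp:
        paths = []
        for index, file in enumerate(files):
            # The index prefix keeps two uploads named "home.png" from colliding;
            # Path(...).name discards any directory parts in the client-supplied name.
            path = Path(tmp) / f"{index}_{Path(file.name).name}"
            path.write_bytes(file.getvalue())
            paths.append(path)
        yield paths
```

In the app, the agent calls would go inside the `with` block, for example `with saved_uploads(design_files) as paths:` followed by building `AgnoImage(filepath=p)` objects and calling `run()`. We have not checked whether Agno reads the file when the `Image` is created or only during `run()`, so keeping both inside the block is the safe choice.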
Execute analysis workflow:
```python
if st.button("🚀 Run Analysis", type="primary"):
    if not api_key:
        st.warning("Please enter your Gemini API key in the sidebar first.")
    elif design_files:
        try:
            # Create the three agents with the key entered in the sidebar
            vision_agent, ux_agent, market_agent = initialize_agents(api_key)
            if vision_agent is None:
                raise RuntimeError("Agent initialization failed")

            st.header("📊 Analysis Results")

            # Every agent sees the same images: your designs plus any competitor designs
            design_images = process_images(design_files)
            competitor_images = process_images(competitor_files) if competitor_files else []
            all_images = design_images + competitor_images

            # Visual Design Analysis
            if "Visual Design" in analysis_types:
                with st.spinner("🎨 Analyzing visual design..."):
                    if all_images:
                        vision_prompt = f"""
                        Analyze these designs focusing on: {', '.join(specific_elements)}
                        Additional context: {context}
                        Provide specific insights about visual design elements.

                        Please format your response with clear headers and bullet points.
                        Focus on concrete observations and actionable insights.
                        """

                        response = vision_agent.run(
                            message=vision_prompt,
                            images=all_images
                        )

                        st.subheader("🎨 Visual Design Analysis")
                        st.markdown(response.content)
```
Add UX and Market Analysis:
```python
            # UX Analysis
            if "User Experience" in analysis_types:
                with st.spinner("🔄 Analyzing user experience..."):
                    if all_images:
                        ux_prompt = f"""
                        Evaluate the user experience considering: {', '.join(specific_elements)}
                        Additional context: {context}
                        Focus on user flows, interactions, and accessibility.

                        Please format your response with clear headers and bullet points.
                        Focus on concrete observations and actionable improvements.
                        """

                        response = ux_agent.run(
                            message=ux_prompt,
                            images=all_images
                        )

                        st.subheader("🔄 UX Analysis")
                        st.markdown(response.content)

            # Market Analysis (the only agent that can also search the web)
            if "Market Analysis" in analysis_types:
                with st.spinner("📊 Conducting market analysis..."):
                    market_prompt = f"""
                    Analyze market positioning and trends based on these designs.
                    Context: {context}
                    Compare with competitor designs if provided.
                    Suggest market opportunities and positioning.

                    Please format your response with clear headers and bullet points.
                    Focus on concrete market insights and actionable recommendations.
                    """

                    response = market_agent.run(
                        message=market_prompt,
                        images=all_images
                    )

                    st.subheader("📊 Market Analysis")
                    st.markdown(response.content)

        except Exception:
            # Record the full traceback so "check the logs" has something to show
            logger.exception("Analysis failed")
            st.error("An error occurred during analysis. Please check the logs for details.")
    else:
        st.warning("Please upload at least one design to analyze.")
```
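In the listing above, a single `try`/`except` wraps all three agent calls, so if one agent raises (the Market Agent's web search is the likeliest candidate), any agents after it never run. A minimal alternative is to run each selected analysis in its own `try`/`except` and log failures individually. `run_tasks` and the task callables below are hypothetical stand-ins; in the app, each task would be something like `lambda: vision_agent.run(message=vision_prompt, images=all_images).content`.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_tasks(tasks: dict[str, Callable[[], str]]) -> tuple[dict[str, str], dict[str, str]]:
    """Run every task; collect outputs and error messages separately."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    for title, task in tasks.items():
        try:
            results[title] = task()
        except Exception as exc:
            # Log the full traceback, then keep going with the remaining analyses.
            logger.exception("Analysis %r failed", title)
            errors[title] = str(exc)
    return results, errors


def failing_search() -> str:
    raise RuntimeError("simulated DuckDuckGo failure")


results, errors = run_tasks({
    "Visual Design": lambda: "vision markdown",
    "Market Analysis": failing_search,
})
print(results)  # {'Visual Design': 'vision markdown'}
print(errors)   # {'Market Analysis': 'simulated DuckDuckGo failure'}
```

The app would then render each entry in `results` with `st.markdown` and show each entry in `errors` with `st.error`, so one failing agent no longer hides the others.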
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
```bash
streamlit run design_agent_team.py
```
Streamlit will provide a local URL (typically http://localhost:8501).
Conclusion
And with that, you've built a multi-agent design analysis team powered by Gemini 2.0. The tool can speed up design reviews by giving you visual, UX, and market feedback in one pass, along with concrete suggestions for improvement.
As you continue developing your AI agent team, consider these enhancements:
Adding support for video analysis using Gemini's video capabilities
Creating custom analysis templates for different design types
Adding export capabilities for reports
Keep experimenting and refining to build smarter AI solutions!