Build a Multimodal AI Chatbot using Gemini Flash
A fully functional LLM app in just 30 lines of Python code (step-by-step instructions)
In this tutorial, you’ll learn to create a multimodal chatbot using Google’s Gemini Flash model with Streamlit. With just 30 lines of Python code, this chatbot can process both text and images as inputs.
Google’s Gemini Flash is tailored for fast, efficient multimodal interactions, balancing speed and response quality. Pairing it with Streamlit gives you a user-friendly interface for building responsive, low-latency interactive applications.
What We’re Building
This tutorial will demonstrate a multimodal chatbot using Google's Gemini Flash model. The chatbot allows users to interact with the model using both image and text inputs, providing lightning-fast results.
Features
Multimodal input: Users can upload images and enter text queries to interact with the chatbot.
Gemini Flash model: The chatbot leverages Google's powerful Gemini Flash model for generating responses.
Chat history: The application maintains a chat history, displaying the conversation between the user and the chatbot.
Prerequisites
Before we begin, make sure you have:
Python installed on your machine (version 3.9 or higher; recent versions of the google-generativeai library require at least 3.9)
A Google AI Studio account for your API key
Basic familiarity with Python programming
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the gemini_multimodal_chatbot folder:
cd advanced_tools_frameworks/gemini_multimodal_chatbot
Install the required dependencies:
pip install -r requirements.txt
Get your API key: Sign up for a Google AI Studio account and create an API key.
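For reference, the requirements.txt in the repo is what actually gets installed. Based on the imports this app uses, it likely amounts to something close to the following (treat this package list as an assumption; the repo file is authoritative):

```
streamlit
google-generativeai
pillow
```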
Creating the Streamlit App
Let’s create our app. Create a new file named gemini_multimodal_chatbot.py and add the following code:
Import necessary libraries:
• Streamlit for building the web app
• google.generativeai for accessing the Gemini Flash model
• PIL for image processing
import os
import streamlit as st
import google.generativeai as genai
from PIL import Image
Set up the Streamlit App:
• Set the page title and layout using 'st.set_page_config()'
• Add a title to the app using 'st.title()'
• Add a description for the app using 'st.caption()'
st.set_page_config(page_title="Multimodal Chatbot with Gemini Flash", layout="wide")
st.title("Multimodal Chatbot with Gemini Flash ⚡️")
st.caption("Chat with Google's Gemini Flash model using image and text input to get lightning fast results. 🌟")
Set up the Gemini Flash Model
• Create a text input for the user to enter their Google API key using 'st.text_input()'
• Configure the genai library with the API key
• Create an instance of the Gemini Flash model
# Get Google API key from user
api_key = st.text_input("Enter Google API Key", type="password")
# Set up the Gemini model
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name="gemini-1.5-flash-latest")
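Note that os is imported at the top but not used in the snippet above. One common pattern it enables is an environment-variable fallback for the key, so you don’t have to paste it into the UI on every run. A minimal sketch, assuming a GOOGLE_API_KEY variable name (hypothetical; any name works as long as it matches your shell export):

```python
import os

# Hypothetical pattern: prefer the key typed into the UI, but fall back
# to an environment variable when the text input is left empty.
os.environ.setdefault("GOOGLE_API_KEY", "demo-key-for-illustration")

typed_key = ""  # stand-in for st.text_input's return value
api_key = typed_key or os.environ.get("GOOGLE_API_KEY", "")
print(api_key != "")
```

With this in place, genai.configure(api_key=api_key) works the same either way.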
Initialize Chat History and Sidebar for Image Upload
• Initialize the chat history using Streamlit's session state
• Create sidebar for image upload using 'st.sidebar'
• Allow users to upload image using 'st.file_uploader()'
• Display the uploaded image using 'st.image()'
if api_key:
    # Initialize the chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Sidebar for image upload
    with st.sidebar:
        st.title("Chat with Images")
        uploaded_file = st.file_uploader("Upload an image...", type=["jpg", "jpeg", "png"])

        if uploaded_file:
            image = Image.open(uploaded_file)
            st.image(image, caption='Uploaded Image', use_column_width=True)
Display Chat History and User Input Area
• Create a container for the chat history using 'st.container()'
• Display the chat history using 'st.chat_message()' and 'st.markdown()'
• Create a user input area using 'st.chat_input()'
    chat_placeholder = st.container()

    with chat_placeholder:
        # Display the chat history
        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    # User input area at the bottom
    prompt = st.chat_input("What do you want to know?")
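It helps to see what st.session_state.messages actually holds: it is just a list of dicts with "role" and "content" keys, which the loop above renders one chat bubble at a time. A minimal sketch of that structure, no Streamlit required (the add_message helper is illustrative, not part of the app):

```python
# Each chat turn is a dict mirroring what the app stores in
# st.session_state.messages.
messages = []

def add_message(history, role, content):
    """Append one chat turn; roles match those passed to st.chat_message()."""
    history.append({"role": role, "content": content})

add_message(messages, "user", "What is in this image?")
add_message(messages, "assistant", "It shows a red square on a white background.")

# The display loop walks this list in insertion order.
for m in messages:
    print(f"{m['role']}: {m['content']}")
```

Because session state survives Streamlit reruns, this list is what keeps the conversation on screen after every interaction.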
Generate Response and Display
• If a prompt is entered, add the user message to the chat history and display it
• If an image is uploaded, add it to the inputs list
• Generate a response using the Gemini Flash
• Display the assistant response in the chat message container
    if prompt:
        inputs = [prompt]

        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message in chat message container
        with chat_placeholder:
            with st.chat_message("user"):
                st.markdown(prompt)

        if uploaded_file:
            inputs.append(image)

        with st.spinner('Generating response...'):
            # Generate response
            response = model.generate_content(inputs)

        # Display assistant response in chat message container
        with chat_placeholder:
            with st.chat_message("assistant"):
                st.markdown(response.text)
        # Save the assistant response so it persists in the chat history
        st.session_state.messages.append({"role": "assistant", "content": response.text})

    if uploaded_file and not prompt:
        st.warning("Please enter a text query to accompany the image.")
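A key detail above is that model.generate_content() accepts a heterogeneous list: plain strings and PIL.Image.Image objects can be mixed freely. A small sketch of building that inputs list with an in-memory stand-in image (no API call is made here):

```python
from PIL import Image

# Stand-in for an uploaded file: a tiny in-memory image.
image = Image.new("RGB", (64, 64), color="red")
prompt = "Describe this image in one sentence."

# Same shape as the app's inputs list: text first, image appended if present.
inputs = [prompt]
if image is not None:
    inputs.append(image)

print(len(inputs))  # text plus image
```

In the app, this two-element list is exactly what gets handed to Gemini Flash, which is what makes the chatbot multimodal.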
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder, and run the following command
streamlit run gemini_multimodal_chatbot.py
Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter your Google API key, upload an image in the sidebar, type a question about it, and watch Gemini Flash respond in real time!
Working Application Demo
Conclusion
You’ve built a multimodal chatbot that combines image and text input capabilities using Google’s Gemini Flash model. This chatbot can interact in real-time, providing rich responses based on multimodal inputs.
For further enhancements, consider:
Implementing User Profiles: Save user preferences or past interactions to personalize responses and create a tailored experience.
Adaptive Response Speed Control: Implement options for users to choose between quick responses for general queries or more in-depth answers for complex questions.
Adding Support for File Uploads: Expand input types beyond images by allowing users to upload documents or PDFs.
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.