Build a Multimodal AI Chatbot using Gemini Flash

Fully functional LLM app in just 30 lines of Python code (step-by-step instructions)

In this tutorial, you’ll learn to create a multimodal chatbot using Google’s Gemini Flash model with Streamlit. With just 30 lines of Python code, this chatbot can process both text and images as inputs.

Gemini Flash is the lightweight model in Google’s Gemini family, designed to balance response speed and quality for multimodal input. Streamlit supplies the web interface, so the whole app stays in a single Python script.

What We’re Building

This tutorial will demonstrate a multimodal chatbot using Google's Gemini Flash model. The chatbot allows users to interact with the model using both image and text inputs, providing lightning-fast results.

Features

  • Multimodal input: Users can upload images and enter text queries to interact with the chatbot.

  • Gemini Flash model: The chatbot leverages Google's powerful Gemini Flash model for generating responses.

  • Chat history: The application maintains a chat history, displaying the conversation between the user and the chatbot.

Prerequisites

Before we begin, make sure you have:

  1. Python installed on your machine (version 3.9 or higher is recommended, since recent releases of the google-generativeai package require it)

  2. A Google AI Studio account for your API key

  3. Basic familiarity with Python programming

  4. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the gemini_multimodal_chatbot folder:

cd awesome-llm-apps/advanced_tools_frameworks/gemini_multimodal_chatbot
  3. Install the required dependencies:

pip install -r requirements.txt
  4. Get your API key: sign up for a Google AI Studio account and create an API key.

Creating the Streamlit App

Let’s create our app. Create a new file gemini_multimodal_chatbot.py and add the following code:

  1. Import necessary libraries:

    • Streamlit for building the web app
    • google.generativeai for accessing the Gemini Flash model
    • PIL for image processing

import os
import streamlit as st
import google.generativeai as genai
from PIL import Image
  2. Set up the Streamlit App:
    • Set the page title and layout using 'st.set_page_config()'
    • Add a title to the app using 'st.title()'
    • Add a description for the app using 'st.caption()'

st.set_page_config(page_title="Multimodal Chatbot with Gemini Flash", layout="wide")
st.title("Multimodal Chatbot with Gemini Flash ⚡️")
st.caption("Chat with Google's Gemini Flash model using image and text input to get lightning fast results. 🌟")
  3. Set up the Gemini Flash Model
    • Create a text input for the user to enter their Google API key using 'st.text_input()'
    • Configure the genai library with the API key
    • Create an instance of the Gemini Flash model

# Get Google API key from user
api_key = st.text_input("Enter Google API Key", type="password")

# Set up the Gemini model
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name="gemini-1.5-flash-latest")
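The snippet above asks for the key on every run. As a minimal sketch, assuming you have exported the key in an environment variable, you could fall back to the environment when the text box is left empty. The variable name `GOOGLE_API_KEY` and the helper `resolve_api_key` are illustrative choices for this sketch, not part of the original app:

```python
import os


def resolve_api_key(typed_key, env=None):
    """Return the key typed in the UI, else the GOOGLE_API_KEY variable, else None.

    Both the helper name and the variable name GOOGLE_API_KEY are assumptions
    made for this sketch; the tutorial itself only uses st.text_input().
    """
    env = os.environ if env is None else env
    typed_key = (typed_key or "").strip()
    if typed_key:
        return typed_key
    env_key = (env.get("GOOGLE_API_KEY") or "").strip()
    return env_key or None


# Example with an explicit mapping instead of the real environment
print(resolve_api_key("", {"GOOGLE_API_KEY": "key-from-env"}))
```

In the app, the value returned by `st.text_input()` could be passed through this helper before calling `genai.configure()`, leaving the later `if api_key:` check unchanged.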
  4. Initialize Chat History and Sidebar for Image Upload
    • Initialize the chat history using Streamlit's session state
    • Create sidebar for image upload using 'st.sidebar'
    • Allow users to upload image using 'st.file_uploader()'
    • Display the uploaded image using 'st.image()'

if api_key:
    # Initialize the chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Sidebar for image upload
    with st.sidebar:
        st.title("Chat with Images")
        uploaded_file = st.file_uploader("Upload an image...", type=["jpg", "jpeg", "png"])
    
    if uploaded_file:
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Image', use_column_width=True)
  5. Display Chat History and User Input Area
    • Create a container for the chat history using 'st.container()'
    • Display the chat history using 'st.chat_message()' and 'st.markdown()'
    • Create a user input area using 'st.chat_input()'

    chat_placeholder = st.container()

    with chat_placeholder:
        # Display the chat history
        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    # User input area at the bottom
    prompt = st.chat_input("What do you want to know?")
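The history loop above relies on every entry being a dictionary with a `role` and a `content` key. The following sketch shows that shape outside Streamlit, with a plain list standing in for `st.session_state.messages`. The helper `add_message` and the `VALID_ROLES` set are hypothetical names introduced only for illustration:

```python
VALID_ROLES = {"user", "assistant"}


def add_message(history, role, content):
    """Append one chat turn in the {"role", "content"} shape the app stores."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    history.append({"role": role, "content": content})
    return history


# A plain list stands in for st.session_state.messages here.
history = []
add_message(history, "user", "What is in this picture?")
add_message(history, "assistant", "A cat sitting on a sofa.")

# Same iteration pattern as the display loop in the app
for message in history:
    print(f'{message["role"]}: {message["content"]}')
```

Because Streamlit reruns the script on every interaction, only what is stored in this list survives between turns.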
  6. Generate Response and Display
    • If a prompt is entered, add the user message to the chat history and display it
    • If an image is uploaded, add it to the inputs list
    • Generate a response using the Gemini Flash
    • Display the assistant response in the chat message container

    if prompt:
        inputs = [prompt]
        
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message in chat message container
        with chat_placeholder:
            with st.chat_message("user"):
                st.markdown(prompt)
        
        if uploaded_file:
            inputs.append(image)

        with st.spinner('Generating response...'):
            # Generate response
            response = model.generate_content(inputs)
    
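        # Display assistant response in chat message container
        with chat_placeholder:
            with st.chat_message("assistant"):
                st.markdown(response.text)
        # Store the assistant reply so it reappears when the history is redrawn
        st.session_state.messages.append({"role": "assistant", "content": response.text})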

    if uploaded_file and not prompt:
        st.warning("Please enter a text query to accompany the image.")
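The request step can also be pulled into small functions, which makes it possible to exercise the logic without calling the API. This is a sketch under stated assumptions: `build_inputs`, `generate_reply`, and `EchoModel` are hypothetical names, and the stub only mimics the `generate_content(...)` call and `.text` attribute used in the tutorial code.

```python
class EchoModel:
    """Offline stand-in for genai.GenerativeModel (hypothetical, for this sketch)."""

    class _Response:
        def __init__(self, text):
            self.text = text

    def generate_content(self, inputs):
        return self._Response(f"received {len(inputs)} input(s)")


def build_inputs(prompt, image=None):
    """Text prompt first, then the image if one was uploaded."""
    inputs = [prompt]
    if image is not None:
        inputs.append(image)
    return inputs


def generate_reply(model, prompt, image=None):
    """Call the model and return its text, or a readable error message."""
    try:
        return model.generate_content(build_inputs(prompt, image)).text
    except Exception as exc:  # e.g. network failures or a response without text
        return f"Sorry, the request failed: {exc}"


# object() stands in for the PIL image opened from the upload
print(generate_reply(EchoModel(), "Describe this image", image=object()))
```

Returning a message instead of letting the exception propagate keeps the chat usable when a single request fails; in the app the returned string could be shown with `st.markdown()` in place of `response.text`.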

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run gemini_multimodal_chatbot.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter your Google API key, optionally upload an image in the sidebar, and type a question in the chat box to get a response from Gemini Flash.

Conclusion

You’ve built a multimodal chatbot that combines image and text input capabilities using Google’s Gemini Flash model. This chatbot can interact in real-time, providing rich responses based on multimodal inputs.

For further enhancements, consider:

  • Implementing User Profiles: Save user preferences or past interactions to personalize responses and create a tailored experience.

  • Adaptive Response Speed Control: Implement options for users to choose between quick responses for general queries or more in-depth answers for complex questions.

  • Adding Support for File Uploads: Expand input types beyond images by allowing users to upload documents or PDFs.
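As one possible starting point for the response speed idea, a user-selected mode could be mapped to generation settings. The mode names and numeric values below are illustrative assumptions, not tuned recommendations; `max_output_tokens` and `temperature` are generation settings that the google.generativeai library accepts through the `generation_config` argument of `generate_content()`.

```python
# Mode names and numbers below are illustrative assumptions, not tuned values.
RESPONSE_MODES = {
    "quick": {"max_output_tokens": 256, "temperature": 0.2},
    "in-depth": {"max_output_tokens": 2048, "temperature": 0.7},
}


def generation_config_for(mode):
    """Return a copy of the settings for a mode, defaulting to "quick"."""
    return dict(RESPONSE_MODES.get(mode, RESPONSE_MODES["quick"]))


print(generation_config_for("in-depth"))
```

In the app, a widget such as `st.radio()` could supply the mode, and the resulting dictionary could be passed as `generation_config=generation_config_for(mode)` when calling `model.generate_content()`.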

Keep experimenting and refining to build even smarter AI solutions!
