
Meta AI Releases Open-Source Multimodal AI Models

PLUS: AI development platform for RAG and AI Agents, Meta AI voice mode for Free

Today’s top AI Highlights:

  1. Meta releases Llama 3.2 with two new multimodal models

  2. Build, evaluate and deploy LLM apps with RAG and AI Agents to production

  3. OpenAI Advanced Voice Mode is here but Meta’s voice AI will be FREE

  4. Meta’s new AI video dubbing tool will sync your lips with the dubbed language

  5. Navigate and refactor your code with an IDE built on a canvas

& so much more!

Read time: 3 mins

AI Tutorials

The tech world is evolving so fast that staying up to speed is overwhelming. How about multiple AI agents doing that research for you from the most dynamic yet cluttered source of top tech stories, Hacker News?

In this tutorial, we’ll show you how to build an AI-powered multi-agent researcher that digs into the top stories on Hacker News and generates blog posts, reports, and social media content, all autonomously and in just 15 lines of Python code.
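
To give you a feel for the moving parts before the full tutorial, here’s a minimal sketch (not the tutorial’s exact code): it pulls the top stories from the public Hacker News API and asks an LLM to draft a short digest. The model name and prompt are just placeholders.

```python
# Minimal sketch: fetch top Hacker News stories, then ask an LLM for a digest.
# Not the tutorial's code; model name and prompt are illustrative placeholders.
import requests
from openai import OpenAI

HN = "https://hacker-news.firebaseio.com/v0"
top_ids = requests.get(f"{HN}/topstories.json", timeout=10).json()[:10]
stories = [requests.get(f"{HN}/item/{i}.json", timeout=10).json() for i in top_ids]
digest_input = "\n".join(f"- {s['title']} ({s.get('url', 'no link')})" for s in stories)

client = OpenAI()  # expects OPENAI_API_KEY in your environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Write a short blog-style digest of today's top Hacker News stories:\n{digest_input}",
    }],
)
print(reply.choices[0].message.content)
```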

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

Vellum AI is a comprehensive AI development platform for building, testing, and deploying LLM applications with RAG and AI Agents. It stands out with a super intuitive workflow builder that lets you create complex AI-driven processes by chaining together logic, data inputs, APIs, and dynamic prompts. Use Vellum to evaluate prompts and models, integrate them with agents using RAG and APIs, then deploy and continuously improve them in production.

Key Highlights:

  1. Multi-Agent System - Vellum supports a multi-agent architecture that allows orchestration of various AI tasks, like content generation or SEO research. This system can access the internet, utilize memory, and evaluate outputs, enhancing the overall functionality of AI apps.

  2. Document Retrieval and Semantic Search - The platform offers sophisticated tools for uploading, filtering, and searching proprietary data to incorporate domain-specific knowledge into your AI apps and enhance the RAG process.

  3. Advanced Prompt Engineering - The UI offers side-by-side comparisons of multiple prompts, parameters, and models across various test cases so you can fine-tune your AI responses with precision (a bare-bones sketch of this idea, in plain Python, follows this list).

  4. Robust Evaluation Framework - The platform includes built-in tools for quantitative assessment of prompt and model performance for making data-driven decisions.

  5. User-Friendly Interface - Designed to empower users without deep technical expertise, it simplifies the experimentation process with intuitive tools for easy configuration and deployment of AI systems.

  6. Free Webinar - Vellum AI is hosting a free webinar to help you seamlessly transition your AI applications from prototype to production. You’ll learn how to maintain quality in production, use version control for smoother iterations, run post-production evaluations, and capture user feedback. Grab your seat now to learn from the best!
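
For a sense of what side-by-side prompt evaluation (highlight 3) boils down to, here’s a bare-bones sketch in plain Python rather than Vellum’s SDK; the model name, prompts, and test cases are placeholders.

```python
# Concept sketch of side-by-side prompt comparison (plain Python, not Vellum's SDK).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment
prompts = {
    "v1": "Summarize this support ticket in one sentence: {ticket}",
    "v2": "You are a support lead. Give a one-line TL;DR of this ticket: {ticket}",
}
test_cases = [
    "My invoice was charged twice this month and I can't reach billing.",
    "The export button does nothing when I click it on Safari.",
]

# Run every prompt variant against every test case and print the outputs
# next to each other so you can compare them.
for ticket in test_cases:
    print(f"\nTicket: {ticket}")
    for name, template in prompts.items():
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": template.format(ticket=ticket)}],
        )
        print(f"  [{name}] {reply.choices[0].message.content}")
```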

Every time Meta releases a new iteration of the Llama models, it’s an open-source party. Just 2 months after Llama 3.1, Meta yesterday released Llama 3.2, which includes small and medium-sized multimodal LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) for running on mobile and edge devices. Meta has open-sourced both base and instruction-tuned versions. The Llama 3.2 11B and 90B vision models are drop-in replacements for their corresponding text models. Competing head-on with proprietary models, Llama 3.2 11B beats Claude 3 Haiku across all multimodal benchmarks, and Llama 3.2 90B competes strongly with GPT-4o mini. The models are available to download on llama.com and Hugging Face.

Key Highlights:

  1. Vision Use Cases - The multimodal Llama 3.2 models support image reasoning use cases such as document-level understanding (including charts and graphs), image captioning, and more.

  2. SLM Capabilities - With a 128K-token context window, the lightweight 1B and 3B models are highly capable at multilingual text generation and tool calling. They’re great for building personalized, on-device agentic applications where data never leaves the device.

  3. Performance - The 3B model outperforms the Gemma 2 2.6B and Phi 3.5-mini models on tasks such as following instructions, summarization, and tool-use, while the 1B is competitive with Gemma.

  4. Llama Stack - Meta has also released Llama Stack, their new API for building agentic, RAG and conversational AI apps using Llama models. It offers a standardized interface for customizing Llama models and deploying them across diverse environments. You can mix and match API providers, including cloud services and local code, for a flexible and adaptable development experience.

  5. Run Locally - The lightweight models are perfect for using AI locally; you can download and run them privately with Ollama and LM Studio (see the quick sketch after this list).

  6. Fast Inference - High-performance API providers like Groq, Together AI, and Fireworks AI have made the Llama 3.2 models available. You can try them out today in a free preview at blazing-fast speeds.
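
Here’s the quick local-run sketch mentioned in highlight 5, assuming you’ve installed Ollama and the ollama Python package and pulled the model first (`ollama pull llama3.2` grabs the 3B instruct model by default).

```python
# Sketch: chat with Llama 3.2 locally through Ollama's Python client.
# Assumes `pip install ollama` and `ollama pull llama3.2` have already been run.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Summarize what's new in Llama 3.2 in two sentences."}],
)
print(response["message"]["content"])
```

Groq, Together AI, and Fireworks AI also expose OpenAI-compatible endpoints, so a similar chat-style call works against their hosted Llama 3.2 models once you point the client at their base URL and swap in their model names.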

Quick Bites

Yesterday’s Meta Connect 2024 keynote was more like an AI celebration. Integrating AI deeply into all its products, Meta released a slew of innovative AI features and hardware that you won’t want to miss:

  1. Meta unveiled a fully functioning prototype of Orion, its first augmented reality glasses, blending regular eyewear design with immersive AR capabilities. Tiny projectors display holograms so you can use the physical world as your canvas, placing 2D and 3D content and experiences anywhere you want. The glasses are currently available to Meta employees and select partners for testing.


  2. The Ray-Ban Meta glasses are getting some cool new AI features, including real-time speech translation, memory aids, and hands-free messaging on WhatsApp and Messenger via the glasses. Next time you fly somewhere, you don’t have to sweat forgetting where you parked at the airport — your glasses can remember your spot.

  3. You can now use your voice to talk to Meta AI on Messenger, Facebook, WhatsApp, and Instagram DMs, and it’ll respond out loud. Choose from the AI voices of Awkwafina, Dame Judi Dench, John Cena, Keegan-Michael Key, and Kristen Bell.

  4. You can easily edit images within Meta AI. Just give it the image and tell it what you want added, removed or changed in the photo. It’ll understand what it’s looking at and do it for you.

  5. This was one of our favorites! Meta is testing a translation tool that will automatically translate the audio of Reels. Not only that, it’ll also sync your lips with the dubbed language to make it relatable for your viewers.

  6. This is probably the most affordable mixed-reality headset ever. Meta unveiled the Meta Quest 3S, a headset with the same mixed-reality capabilities and fast performance as the Meta Quest 3, but at a price point of $300. Meta has also dropped the price of the 512GB Quest 3 from $650 to just $500.

OpenAI’s leadership exodus continues as CTO Mira Murati has announced that she’s leaving OpenAI “to create the time and space for my own exploration.” In her role as OpenAI’s CTO, she spearheaded the release of major AI products like ChatGPT, the GPT-4o series, DALL·E, Advanced Voice Mode, and many more.

At first glance, this follows resignations from other key people at OpenAI, like Ilya Sutskever and Jan Leike. And it comes right before a massive VC funding round of reportedly $6.5 billion. We aren’t really sure what to make of it… what do you think? Tell us in the comments!

Tools of the Trade

  1. Haystack: A canvas-based code editor that automates code navigation and refactoring. It visualizes your code as a 2D graph and includes an AI assistant to predict and streamline edits across files.

  2. Napkins.dev: A wireframe-to-app tool powered by Llama 3.2 multimodal models. Upload a screenshot of a simple site or design and get the code. 100% free and open source.

  3. Verbi: A modular voice assistant that lets you easily switch between models for transcription, response generation, and text-to-speech for testing and comparing SOTA models. It supports various APIs and local models for flexible testing and setup.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. Chances are very high that the age of AI will not be ruled by any of today's companies, OpenAI included. ~
    Pedro Domingos

  2. my father-in-law is a deepmind researcher. he’s extraordinarily talented. we were fireside one day, playing around with gpt-4o voice. i asked him how much it would cost for google to build it today. i’ll never forget his answer:

    we can’t. we don’t know how. ~
    Aidan McLau

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - Twitter | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
