
First-ever Interactive Multimodal AI Model

PLUS: Open-source alternative to ChatGPT's Code Interpreter, AI writes 90,000 lines of code in under 2 hrs

In partnership with

For Those Who Seek Unbiased News.

Be informed with 1440! Join 3.5 million readers who enjoy our daily, factual news updates. We compile insights from over 100 sources, offering a comprehensive look at politics, global events, business, and culture in just 5 minutes. Free from bias and political spin, get your news straight.

Today’s top AI Highlights:

  1. Open-source interactive multimodal AI model comes close to GPT-4o

  2. AI writes a 90,000-line codebase autonomously with 92% accuracy

  3. Google’s NotebookLM adds audio and YouTube support

  4. Sam Altman might get 7% equity in the new for-profit OpenAI, valued at $150 billion

  5. Local open-source alternative to ChatGPT's Code Interpreter

& so much more!

Read time: 3 mins

AI Tutorials

Meta’s new open-source models, Llama 3.2, are all the rage. But have you started building with the models yet? If not, now’s the perfect time to dive in.

In this tutorial, we’ll show you how to build a simple yet powerful PDF Chat Assistant using Llama 3.2 and RAG. By the end, you’ll be able to upload PDFs, ask questions, and get highly accurate answers, with the app running entirely locally, absolutely free and with no internet required. A minimal sketch of the core idea follows below.
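
To whet your appetite before the full tutorial, here’s a minimal sketch of the core loop. It assumes a local Ollama server with the llama3.2 and nomic-embed-text models already pulled, uses pypdf for text extraction, and keeps chunking and retrieval deliberately naive — treat it as a sketch, not the finished app.

```python
# Minimal local PDF-chat sketch: Ollama + naive RAG.
# Assumes: `pip install ollama pypdf numpy` and `ollama pull llama3.2 nomic-embed-text`.
import numpy as np
import ollama
from pypdf import PdfReader

def chunk_pdf(path: str, size: int = 1000) -> list[str]:
    """Extract all text from the PDF and split it into fixed-size chunks."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> np.ndarray:
    """Embed one string with a local embedding model served by Ollama."""
    return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

def answer(question: str, chunks: list[str], vecs: list[np.ndarray], k: int = 3) -> str:
    """Retrieve the k most similar chunks and ask Llama 3.2 to answer from them."""
    q = embed(question)
    scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)) for v in vecs]
    context = "\n---\n".join(chunks[i] for i in np.argsort(scores)[-k:])
    reply = ollama.chat(model="llama3.2", messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return reply["message"]["content"]

chunks = chunk_pdf("report.pdf")   # hypothetical file name
vecs = [embed(c) for c in chunks]
print(answer("What is the main conclusion?", chunks, vecs))
```

The full tutorial builds this out with a proper UI and vector store; the snippet only shows the retrieve-then-generate core.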

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

Allen Institute for AI has released Molmo, a family of open-source, state-of-the-art multimodal AI models to power interactive AI apps. Molmo models go beyond standard image captioning and visual question answering. They have a "pointing" capability to interact directly with elements within images and on-screen interfaces.

This capability allows Molmo to interact with specific visual elements, making it ideal for building next-gen AI agents that can perform interactive actions in digital and real-world environments. Four models are available now, ranging from MolmoE-1B to the powerful Molmo-72B.

Key Highlights:

  1. Model Performance - MolmoE-1B (1.5B parameters) is an MoE model optimized for resource-constrained environments. The two 7B models (Molmo-7B-O and Molmo-7B-D) strike a balance between size and performance. Molmo-72B leads the family, competing with, and on some benchmarks even outperforming, GPT-4V, Llama 3.2 90B, and Claude 3.5 Sonnet.

  2. Interactive Pointing - Molmo's pointing capability enables a wide range of interactive applications like object identification, data extraction from tables and documents, visual counting and analysis, and even interacting with user interfaces.

  3. Open Source - All models are open-source, with the PixMo dataset, training code, evaluations, and checkpoints to follow. You can download the models from Hugging Face and try Molmo-7B-D in the free demo at molmo.allenai.org

  4. Easy Integration - Integrating Molmo into your projects is straightforward with the transformers library. Ready-to-use code examples are provided in the model card to get you started quickly; see the sketch right after this list.
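
As a taste of what that looks like, here’s a sketch adapted from the pattern on the Molmo model card. Note that generate_from_batch comes from Molmo’s remote code (hence trust_remote_code=True) rather than stock transformers, so defer to the model card for the exact, current API.

```python
# Sketch: image Q&A with Molmo via Hugging Face transformers.
# Adapted from the model-card pattern; verify against allenai/Molmo-7B-D-0924.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True,
                                          torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True,
                                             torch_dtype="auto", device_map="auto")

# Any publicly reachable image URL works; this one is a placeholder.
image = Image.open(requests.get("https://picsum.photos/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# generate_from_batch is Molmo-specific remote code, not a stock transformers method.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
answer = processor.tokenizer.decode(output[0, inputs["input_ids"].size(1):],
                                    skip_special_tokens=True)
print(answer)
```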

CodeMaker AI, a platform for AI-assisted software development, has successfully recreated a 90,000-line Spring Boot codebase with 91-92% accuracy. This involved processing 3,251 files and generating the corresponding code in just 1 hour and 42 minutes, for $265.73. The core of this success lies in CodeMaker AI's custom fine-tuning pipeline, which trains their AI model on the entire target codebase. Check out the generated artifact. This demonstrates moving beyond simple code generation towards AI-driven generation of large-scale code structures.

Key Highlights:

  1. Replicable Fine-Tuning - You can replicate CodeMaker AI's approach of training the model on an entire codebase for your own projects using their platform. This enables highly accurate, project-specific code generation. Their documentation indicates that fine-tuning costs $15 per million tokens processed (see the back-of-envelope sketch after this list).

  2. Autonomous Code Generation - Generating 90,000 lines of code autonomously represents a significant step towards automating complex coding tasks, including generating entire codebases or migrating existing code.

  3. Cost and Time Efficiency - The speed (under two hours) and cost ($265.73) showcase potential savings for tasks like repetitive coding, boilerplate generation, or initial project setup.

  4. Workflow Integration - You can explore integrating this into your workflows for tasks like code completion, bug fixing, prototyping, and generating initial code structures. CodeMaker AI offers a platform and API to perform similar fine-tuning and code generation on your own projects. Refer to their documentation for details on repository size limits (1GB) and source code size limits (100MB) for fine-tuning.
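
For a rough sense of what that $15-per-million-tokens rate means for your own repository, here’s a back-of-envelope sketch; the codebase size and tokens-per-line ratio are illustrative assumptions, not CodeMaker AI’s figures.

```python
# Back-of-envelope fine-tuning cost at $15 per million tokens processed.
# The codebase size and tokens-per-line ratio below are assumptions.
PRICE_PER_MILLION_TOKENS = 15.00   # from CodeMaker AI's documentation
LINES_OF_CODE = 90_000             # roughly the size of the recreated codebase
TOKENS_PER_LINE = 10               # illustrative guess; measure your own code

tokens = LINES_OF_CODE * TOKENS_PER_LINE
cost = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"~{tokens:,} tokens -> ~${cost:.2f} to fine-tune")  # ~900,000 tokens -> ~$13.50
```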

Quick Bites

Google has unveiled AlphaChip, an AI model for designing advanced chip layouts that optimizes for performance and speed. It is already being used for Google’s custom AI accelerators, the TPUs, and has been adopted by external organizations. Here’s a pre-trained checkpoint.

Google’s notetaking and research platform NotebookLM has become all the rage lately, especially after it released its text-to-podcast feature. You can now add public YouTube URLs and audio files directly into NotebookLM for Q&A and enhanced learning.

OpenAI is reportedly restructuring into a for-profit benefit corporation, ending its non-profit board's control and making the company more appealing to investors. CEO Sam Altman will reportedly receive 7% equity in the new company, which is valued at around $150 billion, while the non-profit will retain a minority stake.

OpenAI has released a new multimodal moderation model based on GPT-4o, which improves detection of harmful text and images in various languages. Available via the Moderation API, the model offers enhanced accuracy and control, supporting categories like violence, self-harm, and illicit content.
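
Calling it from the official Python SDK looks like this; omni-moderation-latest is the alias OpenAI uses for the new GPT-4o-based model, and the image URL below is a placeholder.

```python
# Sketch: multimodal moderation with the OpenAI Moderation API.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()
response = client.moderations.create(
    model="omni-moderation-latest",  # the GPT-4o-based moderation model
    input=[
        {"type": "text", "text": "...user-generated text to screen..."},
        {"type": "image_url", "image_url": {"url": "https://example.com/upload.png"}},
    ],
)

result = response.results[0]
# Per-category booleans cover violence, self-harm, illicit content, and more.
flagged_categories = {k: v for k, v in result.categories.model_dump().items() if v}
print("flagged:", result.flagged, flagged_categories)
```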

Tools of the Trade

  1. gptme: Interact with an LLM assistant directly in your terminal in a chat-style interface, with tools that let the assistant run shell commands, execute code, read/write files, and more. It supports various LLMs for development, data analysis, and prototyping.

  2. Octoparse AI: A no-code platform for building custom workflows and automating tasks. It offers ready-made apps for lead prospecting, data scraping, and more.

  3. Flowvoice AI: A voice-to-text tool for Mac that turns spoken input into clear, structured writing, making typing 3x faster and more efficient. It helps with emails, documents, and AI prompts using voice commands.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. vcs are going to be talking to mira like hitting on a widow at a funeral

    “i’m so so sorry, this must be so hard for you. really sorry you’re going through this. any plans to start something of your own? probably too soon but lmk, would love to lead the round. really sorry again” ~
    sophie

  2. Every tech company *cough* Meta *cough* needs to accept that YouTube won and stream every event there. Put it on your site if you want, but put it on YouTube too. ~
    Ben Thompson

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - Twitter | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
