
Llama 3.2 Locally in Your Browser

PLUS: Opensource o1 with visual reasoning, Liquid LLMs beat Llama 3.2 models

Vellum, the ultimate AI development platform, is holding a free webinar series to help you take your AI idea from a functional prototype to final production.

Learn the strategies and framework needed to seamlessly transition your LLM applications from prototype to production.

Geek out on:

  • Maintaining quality in prod for consistent AI performance

  • The version control system you need for smooth iterations

  • Tips for capturing user feedback in prod

  • Post-prod evaluation strategies

The final session is on October 2. Register now before spots fill up!

Today’s top AI Highlights:

  1. Opensource o1 with visual tracking of AI reasoning in real-time

  2. Run high-performance AI locally in your browser with WebLLM

  3. MIT spin-off Liquid AI releases its first series of Liquid Foundation Models

  4. Build serverless autonomous AI agents with agentic tools and memory

  5. A fork of Pear AI, which is a fork of Continue, which is a fork of VSCode

& so much more!

Read time: 3 mins

AI Tutorials

Meta’s new Llama 3.2 models are here, offering incredible advancements in speed and accuracy for their size. Do you want to fine-tune the models but are worried about the complexity and cost? Look no further!

In this blog post, we’ll walk you through finetuning Llama 3.2 models (1B and 3B) using Unsloth AI and Low-Rank Adaptation (LoRA) for efficient tuning in just 30 lines of Python code. You can use your own dataset.

With Unsloth, the process is faster than ever: roughly 2x faster than standard Hugging Face finetuning. And the best part? You can finetune Llama 3.2 for free on Google Colab.
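
To give you a taste before you open the tutorial, here's a minimal sketch of the approach: Unsloth's FastLanguageModel plus a trl SFTTrainer. The model name, dataset, and hyperparameters below are illustrative placeholders rather than the tutorial's exact values, and we assume the unsloth, trl, transformers, and datasets packages are installed.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load Llama 3.2 1B in 4-bit so it fits on a free Colab GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # or the 3B variant
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices get trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Any dataset works as long as you map it to a single "text" column.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,          # bump this up for a real run
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```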

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

OpenAI’s o1 models are great at solving complex problems, but how do they get there? Wouldn’t it be better if you could watch the thought process in real-time?

Show-Me is an opensource alternative to closed reasoning models like o1. The tool is designed to be fully transparent so you can visually track how the model solves problems in real-time. Its dynamic graph interface breaks down each reasoning step, helping you follow the logic clearly. Show-Me also corrects its mistakes automatically using a self-healing process, making it both accurate and reliable for complex reasoning tasks.

Key Highlights:

  1. Visual Reasoning Graph - Every step in the reasoning process is mapped out visually, helping you track the thought process in real time. The dynamic graph evolves with each new piece of information, making it easier to debug and analyze complex problems.

  2. Self-Healing Mechanism - Show-Me runs automated checks at every stage. When it detects errors, the system corrects them on its own and refines answers continuously. This enhances trust in the accuracy of the results without constant manual oversight.

  3. Python Code Execution - The tool includes a Python interpreter, so it can generate and execute code mid-reasoning. This is particularly useful for tasks that involve data manipulation or logical problem-solving that needs coding input.

  4. Infinite Task Decomposition - Show-Me recursively breaks down complex problems into smaller, more manageable sub-tasks, so even intricate logic puzzles are addressed thoroughly (see the toy sketch after this list).
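
Show-Me's actual implementation is in the repo linked below; purely to make the decompose-and-self-heal pattern concrete, here's a toy sketch of the idea. This is not Show-Me's code or API: the prompts are ad-hoc, gpt-4o-mini is a placeholder model, and `client` can point at any OpenAI-compatible endpoint.

```python
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint works

def llm(prompt: str) -> str:
    # Single-turn chat-completion call used for every reasoning step.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder, not what Show-Me uses
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
    """Toy decompose-and-self-heal loop, not Show-Me's actual code."""
    # Base case: answer directly when the task is simple (or recursion is deep enough).
    if depth >= max_depth or llm(f"Is this answerable in one step? yes/no:\n{task}").lower().startswith("yes"):
        answer = llm(f"Solve:\n{task}")
        # "Self-healing": check the answer, then retry once with the critique.
        critique = llm(f"Task: {task}\nAnswer: {answer}\nReply OK if correct, else describe the mistake.")
        if not critique.strip().upper().startswith("OK"):
            answer = llm(f"Solve:\n{task}\nAvoid this mistake: {critique}")
        return answer
    # Recursive case: split into sub-tasks, solve each, then combine.
    subtasks = [s for s in llm(f"List sub-tasks, one per line:\n{task}").splitlines() if s.strip()]
    partials = [solve(s, depth + 1, max_depth) for s in subtasks]
    return llm(f"Combine these partial results into an answer for '{task}':\n" + "\n".join(partials))
```

Show-Me layers the visual graph and continuous checks on top of this basic loop.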

Here’s the code. Check it out!

Why rely on the cloud when you can run AI models locally in your browser? WebLLM is a new in-browser LLM inference engine that leverages WebGPU for efficient hardware acceleration, so models run entirely client-side. No server-side support is needed, ensuring both privacy and cost savings. WebLLM is fully compatible with the OpenAI API, which means you can integrate it seamlessly into your existing AI apps. With support for a wide range of models and easy installation through package managers, WebLLM is a versatile solution for running language models in-browser.

Key Highlights:

  1. In-Browser Inference - Perform high-performance AI operations directly in web browsers using WebGPU acceleration, eliminating the need for external servers.

  2. OpenAI API Compatibility - Full compatibility with the OpenAI API, supporting streaming, JSON-mode, and function-calling (WIP), enabling developers to switch between local and cloud-based models effortlessly.

  3. Model Flexibility - Native support for popular models such as the latest Llama 3.1 and 3.2, Mistral, and Qwen 2.5, along with the option to deploy custom models using MLC format for personalized needs.

  4. Seamless Integration - Quick integration with web projects using NPM, Yarn, or CDN, with built-in support for workers and Chrome extensions for enhanced performance and extended functionality. Try the demo here.

Quick Bites

The next wave of AI innovation is all about compact models. Liquid AI, an MIT spin-off building foundation AI models, has just released its first series of Liquid Foundation Models (LFMs). The 1B, 3B, and 40B models achieve state-of-the-art performance for their size while maintaining a smaller memory footprint and more efficient inference.

  • These LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals. 

  • LFMs can automatically optimize architectures for a specific platform (e.g., Apple, Qualcomm, Cerebras, and AMD) or match given parameter requirements and inference cache size.

  • All models have a 32k context window. The 1B and 3B models deliver impressive performance, outperforming Llama 3.2, Gemma 2, and Phi 1.5 models in their respective size classes across benchmarks like MMLU-Pro and Hellaswag.

  • The 40B model is an MoE model with 12B active parameters. It outperforms Llama 3.1 70B, Mixtral, and Jamba 1.5 across various benchmarks.

  • The models aren’t opensourced. You can try them today on Liquid Playground, Lambda (Chat UI and API), Perplexity Labs, and soon on Cerebras Inference.

And here’s another small language model, this time from AMD. Its first SLM, AMD-135M with Speculative Decoding, was trained from scratch on AMD Instinct MI250 accelerators using 690B tokens, and comes in two variants: AMD-Llama-135M and AMD-Llama-135M-code. The training code, dataset, and weights are opensourced.

Perplexity is soon launching its AI search app for Mac, offering features like voice and text search, cited sources, and a personal library. Just ⌘ + ⇧ + P to ask Perplexity anything. You can pre-order now.

Tools of the Trade

  1. BaseAI: Build AI agents with memory (RAG) locally, then deploy them as a highly scalable API. It is the first AI framework for the Web, composable by design, and offers a simple API to build and deploy any AI agents (AI features).

  2. BlueberryAI: Opensource AI code editor forked from PearAI (itself a fork of Continue, which is a fork of VSCode) to help with code understanding and reduce coding effort. It understands your codebase locally, so you can ask questions and get help without sending your code anywhere.

  3. RAGApp v0.1: A no-code tool to build multi-agent applications. It’s as simple as OpenAI’s custom GPTs: you just configure roles and system prompts for each agent. It can be deployed using Docker, supports local and hosted AI models, and provides an admin interface for easy setup.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. the more you work on frontier ai, the more you realize that grady, marcus, and yann value current tool correctly
    ai cels valiantly gaslight, but the world still turns and gdp hasn’t skyrocketed. of course, it’s gonna be so over soon tho
    it’s still January 2020 ~
    Aidan McLau

  2. openai are working so hard on a podcast feature rn. ~
    Strawberry man 🍓🍓🍓

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - Twitter | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
