unwind ai
Posts
GPT-5 is Releasing this December

GPT-5 is Releasing this December

PLUS: Claude with Code Interpreter, Quantized Llama 3.2 models

Shubham Saboo & Gargi Gupta
October 25, 2024

Today’s top AI Highlights:

Meta's Llama gets lean and fast - Quantized Llama 3.2 1B and 3B models
Run LLMs faster with less memory on Any device
Claude now has a built-in Code Interpreter
OpenAI plans to release Orion (GPT-5) in December
Self-hosted alternative to AI app builders

& so much more!

Read time: 3 mins

AI Tutorials

RAG is becoming a game-changer for applications that need accurate information from large datasets. As developers, we know the value of building tools that can search documents and provide relevant answers quickly. Today, we’ll take that one step further.

In this tutorial, we’ll walk you through building a production-ready RAG service using Claude 3.5 Sonnet and Ragie.ai, integrated into a clean, user-friendly Streamlit interface. With less than 50 lines of Python code, you’ll create a system that retrieves and queries documents—ready for real-world use.

What is Ragie.ai?

Ragie.ai is a fully managed RAG-as-a-Service for developers. It offers connectors for services like Google Drive, Notion, and Confluence, along with APIs for document upload and retrieval. It handles the entire pipeline—from chunking to hybrid keyword and semantic searches—so you can start with minimal setup.

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build and Deploy RAG-as-a-service

RAG-as-a-service with Claude 3.5 Sonnet in less than 50 lines of Python Code (step-by-step instructions)

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

Build Apps with On-device AI using Quantized Llama 3.2 🦙📲

Meta has released quantized versions of Llama 3.2 1B and 3B models, designed for fast on-device AI processing, especially on mobile. These models offer a smaller footprint and faster inference without sacrificing quality or safety, addressing the growing demand for mobile AI deployment. You can now leverage these models for various applications, including those with limited resources. This release provides both quantization-aware training and post-training quantization options.

Key Highlights:

Faster Performance - Experience up to 4x faster inference speeds, with decoding 2.5x faster and prefill 4.2x faster, allowing for snappier mobile AI experiences.
Reduced Resource Requirements - Benefit from a 56% reduction in model size and a 41% reduction in memory usage, enabling deployment on devices with limited storage and memory.
Choice of Quantization Methods - Choose between QLoRA (Quantization-Aware Training with LoRA) for optimal accuracy or SpinQuant (post-training quantization) for portability and ease of use with existing fine-tuned models.
Open Source - Download the quantized Llama 3.2 models from llama.com and Hugging Face, integrate them with PyTorch's ExecuTorch framework, and leverage optimized Arm CPU support via the Kleidi AI library for seamless deployment.

Blazing Fast Inference with On-the-Fly Quantization 🚀

Mistral.rs is a powerful opensource library for blazing fast LLM inference across various devices and model architectures. Mistral.rs offers extensive quantization options, supporting 2, 3, 4, 5, 6, and 8-bit quantization. This helps to significantly reduce the model size and accelerate inference speed while maintaining reasonable accuracy.

It supports a wide range of architectures, including Llama 3, Phi 3 Vision, and Gemma models, with both text and vision-based models. You can run these models using Python, Rust, or a lightweight OpenAI-compatible HTTP server.

Key Highlights:

In-situ Quantization (ISQ) - A standout feature of Mistral.rs is its ISQ capability, which allows you to download models directly from Hugging Face and automatically quantize them, eliminating the need for manual conversion and making it easy to handle large models with reduced memory usage.
Extensive Model Support - Mistral.rs supports a wide range of models, including those with LoRA and X-LoRA adapters, as well as vision models like Phi-3 Vision and Llama 3.2 Vision. You can define custom topologies to manage quantization and device placement.
Optimized for Diverse Hardware - You can leverage multiple accelerators, including NVIDIA CUDA, Apple Metal, and Intel MKL for fast and efficient inference on a wide range of systems.
Flexible Integration Options - Mistral.rs offers Python bindings, a Rust async API, and an OpenAI-compatible HTTP server, giving developers multiple ways to integrate it into existing workflows. Features like dynamic LoRA adapter activation further enhance its adaptability.
Get Started Quickly - You can install Mistral.rs using Python pip or build it from source with Rust. Pre-built Docker containers and binaries are available for faster setup.

Quick Bites

Claude.ai now has a built-in analysis tool that enables it to write and run JavaScript code. Claude can now process data, conduct analysis, and produce real-time insights.

It can clean, explore, and analyze data from CSV files, acting like a real data analyst.
The built-in sandbox enables accurate, reproducible answers through code execution.
This upgrade builds on Claude 3.5 Sonnet’s advanced coding capabilities, improving both reasoning and precision.

OpenAI is reportedly planning to release GPT-4’s successor Orion in December this year. Unlike GPT-4o and o1, Orion won’t initially be released widely through ChatGPT. OpenAI is planning to grant access first to companies it works closely with to build their own products and features. Orion or GPT-5 is potentially up to 100x more powerful than GPT-4.

If you thought that Anthropic is the first to release a model that can interact autonomously with GUIs, Microsoft beat them to it. Released last month, OmniParser is a screen-parsing tool enabling vision-language models like GPT-4V to recognize clickable elements, understand their functions, and act on interfaces. OmniParser works across PC and mobile platforms without needing HTML or view hierarchy data. The code is available here.

Google has open-sourced its SynthID watermarking tool that identifies and embeds imperceptible digital watermarks into AI-generated text, audio, images, and videos. It is a part of the Responsible Generative AI Toolkit released in beta.

Cohere For AI has released Aya Expanse family of multilingual models available in 8B and 32B parameters, outperforming other open-weight models across 23 languages. These models are now accessible on Cohere’s platform, Kaggle and Hugging Face.

Tools of the Trade

Backprop: Cloud platform offering on-demand GPU instances for machine learning and data science. It simplifies model deployment with pre-configured environments for frameworks like PyTorch and TensorFlow
Srcbook: TypeScript-centric app development platform. It allows you to create and iterate on web apps incredibly fast using AI as a pair-programmer. It can create or edit web apps, and also write and execute backend code through an interactive notebook interface.
Augment Code: AI coding assistant designed for software teams, deeply integrating with your codebase to provide relevant suggestions, completions, and automated edits.
MobileBoost: Automates mobile app testing across iOS, Android, and web platforms. With a no-code editor and AI-driven execution, it handles unexpected UI changes, reduces maintenance, and seamlessly integrates into CI/CD pipelines.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

Anthropic seems to be losing the game. Opus was much awaited particularly after o1 but they ended up releasing a marginal update of the current model and then releasing a marketing gimmick called computer use as a distraction. ~
harambe_musk
Imagine you’ve created the most imaginative LLM human could ever train. Where can people find your model? At the bottom of the LLM leaderboard because it hallucinates too much. ~
cocktail peanut

Meme of the Day

> new agent framework trending on GitHub
> looks inside
> it’s a wrapper over another agent framework

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉

Reply

or to participate.