unwind ai

Phi-4 Beats GPT-4o in Math

PLUS: Isolated sandboxes for AI-generated code, DeepSeek MoE vision model

Shubham Saboo & Gargi Gupta
December 16, 2024

Today’s top AI Highlights:

Microsoft releases Phi-4 model with excellent math capabilities
Create isolated sandboxes to run AI-generated code within 3 seconds
OpenAI brings Claude-like Projects to ChatGPT
DeepSeek releases new family of open-source MoE vision language models
Blazing fast AI Gateway - Route to 200+ LLMs, 50+ AI Guardrails with 1 API

& so much more!

Read time: 3 mins

AI Tutorials

Multi-agent AI systems are a powerful paradigm in which specialized agents collaborate to solve complex problems. Each agent has distinct capabilities and objectives, which lets us build systems that are robust and genuinely useful. Adding multimodal capabilities across images, text, video, and structured data makes these systems even more powerful.

In this tutorial, we’re building a Multi-Agent Design Team powered by Google's new Gemini 2.0, where three specialized agents work in concert to provide comprehensive design insights.

Each agent uses Gemini's multimodal capabilities to understand design assets in different ways: analyzing visual hierarchies, evaluating interaction patterns, and contextualizing market positioning. The agents communicate and coordinate their findings to deliver unified, actionable insights.

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build a Multimodal AI Agent Design Team

Fully functional multi-agent app using Gemini 2.0 Flash (step-by-step instructions)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Phi-4 Beats GPT-4o on Math with Just 14B Parameters 🧮

Microsoft Research has just released Phi-4, a 14-billion parameter small model that excels in complex reasoning and math tasks. What sets it apart is its training approach, which heavily utilizes high-quality synthetic data and a specific method for selecting pivotal tokens during post-training.

The model outperforms much larger ones like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro on math problems. With Phi-4, you can get excellent performance with a relatively small size, which is a compelling mix for anyone wanting efficient on-device AI capabilities.

Key Highlights:

Reasoning Focused - Phi-4 is particularly strong in mathematical and logical reasoning, outperforming much larger models including Qwen 2.5 72B, Llama 3.3 70B, and GPT-4o.
Synthetic Data - Phi-4 moves beyond relying solely on organic web data. It uses synthetic data created using methods like multi-agent prompting, self-revision, and instruction reversal. This helped the model develop superior reasoning and problem-solving abilities.
Pivotal Token Search - A unique aspect of Phi-4's training process is the use of Pivotal Token Search (PTS). This technique identifies the critical tokens in a model's response that drive correctness, which allows the Direct Preference Optimization (DPO) process to focus preference learning on the most influential parts of the reasoning process.
Availability - Phi-4 is currently available on Azure AI Foundry under a Microsoft Research License Agreement (MSRLA) and will be available on Hugging Face next week.
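
To make the Pivotal Token Search idea more concrete, here is a toy sketch in Python. It is not Microsoft's implementation. It assumes a hypothetical `success_prob(prefix)` function that estimates how likely a response is to end up correct given the tokens so far (in practice this would presumably be estimated by sampling completions from the model), and it uses a simple left-to-right scan with a made-up threshold and made-up probabilities.

```python
from typing import Callable, List, Sequence, Tuple


def find_pivotal_tokens(
    tokens: Sequence[str],
    success_prob: Callable[[Sequence[str]], float],
    threshold: float = 0.2,
) -> List[Tuple[int, str, float]]:
    """Return (index, token, delta) for tokens that shift the success estimate.

    A token counts as "pivotal" here when adding it to the prefix changes the
    estimated probability of a correct final answer by at least `threshold`.
    """
    pivotal = []
    prev = success_prob(tokens[:0])  # estimate before any response token
    for i, token in enumerate(tokens):
        cur = success_prob(tokens[: i + 1])
        delta = cur - prev
        if abs(delta) >= threshold:
            pivotal.append((i, token, round(delta, 2)))
        prev = cur
    return pivotal


# Toy stand-in for model sampling: one hard-coded estimate per prefix length.
# These numbers are invented purely for illustration.
TOY_ESTIMATES = [0.50, 0.52, 0.85, 0.84, 0.30, 0.31]


def toy_success_prob(prefix: Sequence[str]) -> float:
    return TOY_ESTIMATES[len(prefix)]


response = ["First", "factor", "the", "negate", "result"]
pivots = find_pivotal_tokens(response, toy_success_prob)
print(pivots)  # [(1, 'factor', 0.33), (3, 'negate', -0.54)]
```

In this toy run, "factor" sharply raises the estimated chance of a correct answer and "negate" sharply lowers it, so those are the positions where preference learning would be concentrated.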

Build Smarter, Faster: AI Voice Agents for Every Industry

Imagine having a calling assistant that operates 24/7, managing tasks like lead qualification and real-time booking. With Synthflow’s pre-built AI Agent templates, you don’t have to start from scratch. Explore our library of proven solutions tailored for industries like real estate, healthcare, and beyond. Want to go a step further? Create and publish your own templates to earn commissions while sharing your expertise!

Every AI Agent Can Now Have Its Own Secure Sandbox 🗃️

CodeSandbox, known for its in-browser coding environments, has launched the CodeSandbox SDK, a way to programmatically manage sandboxed environments. This means you can now spin up isolated virtual machines (VMs) on demand through an API using the same technology that powers CodeSandbox itself.

The SDK provides fast cloning, checkpointing, and persistent file systems, extending the familiar CodeSandbox experience to a broader range of development workflows and AI integrations. This release gives you the tools to manage development environments, run code in isolated environments, or interpret AI-generated code, all with version-controlled file systems.

Key Highlights:

Programmatic MicroVM Control - The CodeSandbox SDK provides an API to create and manage isolated microVMs. This includes full Docker support, allowing you to customize the environment with specific dependencies and tools.
Rapid Cloning - The SDK can clone a live VM in approximately 3 seconds, which allows quick iteration and makes it practical to create multiple isolated instances for testing or parallel processing.
Data Persistence - It also features memory snapshot/restore functionality, which enables sandboxes to be paused and resumed later, along with persistent file storage backed by Git version control for a reliable development environment.
Versatile Applications - The SDK's use extends well beyond a typical sandbox. It's designed for building code interpreters, testing multiple AI agents in complete isolation, implementing CI/CD pipelines, or creating custom development platforms.
Simple Onboarding - You can start immediately with a free tier and a pay-as-you-go credit system. It is available via npm, includes comprehensive documentation, and comes with ready-to-use code examples. The SDK provides an easy-to-use API for file system management, running shells, setting up tasks, and cloning or starting a sandbox.

Quick Bites

OpenAI has released Projects in ChatGPT, similar to the Projects feature in Claude.ai. Projects lets you keep related conversations, uploaded files, and custom instructions together in customizable folders, and it works with existing integrations like search and canvas. Projects is available to Plus, Pro, and Teams users now; it will be expanded to free users soon and to Enterprise/EDU users in early 2025.

Pika’s latest AI video generation model, Pika 2.0, is here. The new model brings improved prompt alignment, motion rendering, and customizability for AI-generated videos. It also comes with a powerful new "Scene Ingredients" feature that lets you upload and customize your own characters, objects, and settings in the videos. It is available now on their platform and via API.

DeepSeek has released the DeepSeek-VL2 family of open-source vision-language Mixture-of-Experts models featuring three variants (Tiny/Small/Base with 1.0B/2.8B/4.5B activated parameters). The base model achieves impressive benchmark scores outperforming several closed models like Claude 3.5 Sonnet and GPT-4V while using fewer parameters. The complete model series is now available on Hugging Face and GitHub, with technical documentation and implementation guides in the works.
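
For readers new to the term, "activated parameters" in a Mixture-of-Experts model refers to the fact that only a few experts run for each token, so the compute per token is much smaller than the total parameter count suggests. Below is a minimal Python sketch of generic top-k routing. It is not DeepSeek-VL2's actual architecture, and the expert counts, parameter sizes, and gate scores are invented for illustration.

```python
import math
from typing import List, Tuple


def top_k_route(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def activated_params(shared: int, per_expert: int, k: int) -> int:
    """Parameters actually used for one token: shared layers plus k experts."""
    return shared + k * per_expert


# Hypothetical toy model: 8 experts of 100 parameters each, 200 shared parameters.
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 3.0]
routes = top_k_route(gate_logits, k=2)
total_params = 200 + 8 * 100
active_params = activated_params(shared=200, per_expert=100, k=2)

print([i for i, _ in routes])       # experts chosen for this token: [7, 1]
print(active_params, total_params)  # 400 1000
```

Here the router sends the token to 2 of 8 experts, so 400 of the toy model's 1,000 parameters are used for that token.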

Deep learning pioneer Ilya Sutskever reflected on a decade of progress since his landmark 2014 NeurIPS paper on sequence-to-sequence learning, offering surprising insights into what’s been achieved and what’s still to come. He also weighed in on questions that many people in AI have been debating lately. Here’s what he said:

“Pre-training will end because data is not growing as fast as compute and algorithms” and "The data is the 'fossil fuel' of AI."
We’re headed towards superintelligence. Sooner or later, the following will be achieved. Those systems are going to be:
- Agentic: It will be "agentic" in a real way, unlike current systems which are only very slightly agentic.
- Reasoners: These systems will be able to reason in a way that is not predictable, in contrast with how predictable the current models are.
- Understanding: These systems will understand things from limited data.
- Self-aware: These systems will become self-aware.
AI systems are going to be very complicated, have a lot of parameters, and are going to need much more complex incentive mechanisms to steer them.

Tools of the Trade

MarkItDown: Python library by Microsoft that converts various file formats (PDF, Office documents, images, and audio) into markdown. It offers a simple API for basic conversion as well as advanced features like LLM-powered image description with models like GPT-4.
aiide: Python framework to build LLM copilots with built-in support for chat memory (Pandas DataFrames), structured outputs, and tool integration. It provides a streamlined interface for handling OpenAI/LiteLLM models while managing conversation state and offering flexible JSON schema for tools and outputs.
AI Gateway: Open-source API router that connects applications to 250+ language, vision, audio, and image models through a unified interface, with built-in features for retries, fallbacks, and load balancing. It provides a lightweight middleware layer that lets you integrate any LLM in under 2 minutes using a consistent OpenAI-compatible API signature.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
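
The retry and fallback behavior that gateways like the one above advertise can be sketched in a few lines of Python. This is a generic illustration, not the AI Gateway's API: the provider names, the `route_with_fallback` function, and the simulated failure are all hypothetical, and load balancing is not shown.

```python
import time
from typing import Callable, List, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def route_with_fallback(
    prompt: str,
    providers: List[Provider],
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    """Try each provider in order, retrying each one before falling back."""
    errors = []
    for name, call in providers:
        for attempt in range(1, retries + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                errors.append(f"{name} attempt {attempt}: {exc}")
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical provider that always times out."""
    raise TimeoutError("simulated timeout")


def stable_backup(prompt: str) -> str:
    """Hypothetical provider that always succeeds."""
    return f"echo: {prompt}"


used, reply = route_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", stable_backup)]
)
print(used, reply)  # backup echo: hello
```

The primary provider is retried twice, then the request falls through to the backup, which is the basic pattern behind a single API fronting many models.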

Hot Takes

"Data is oil" analogy is actually very apt, but not in the way it was intended. Just like the "peak oil" boogieman has been brandished for decades in order to promote a particular ideological agenda, so has the fear of reaching the end of available data been promulgated now. And in both cases it turns out that we are far from exhausting the existing reserves, and new insights and technologies are enabling us to go even farther than it has ever been suspected. ~
Bojan Tunguz
Ilya requiring 3-4 bodyguards is concerning for the future of AI! ~
Swaroop Mishra

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
