Inference Tokens in Wholesale
PLUS: Distributed AI training over the internet, Gradio AI Agent
Synthflow: Build AI voice assistants to manage inbound and outbound calls
Keep your business running 24/7 with genAI. Synthflow’s simple no-code builder lets you set up human-sounding AI voice assistants that can handle call center tasks: real-time appointment booking, lead qualification, handling FAQs, transferring between agents, and more. White label included. Pay as little as $0.08 per minute of conversation. CRM integrations with Hubspot, Gohighlevel, Zoho, etc. Start for free or let us build your AI receptionist.
Today’s top AI Highlights:
Llama 3.1 inference 50-90% cheaper with LLM Inference wholesaler
You don’t need a Supercomputer to train AI models
Apple Intelligence is out in public beta
Build multi-agent AI applications with no-code
Gradio AI Agent to build, deploy and optimize Gradio apps without writing a single line of code
& so much more!
Read time: 3 mins
AI Tutorials
Gemini with Gmail looks great! But is it worth $20 a month?
In just 30 lines of Python, you can build an AI assistant that connects with your Gmail inbox, retrieves email content, and answers questions about your emails using RAG.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
A new service called inference.net provides a cost-effective way to access LLM inference. Calling themselves “a wholesaler for LLM inference tokens”, they offer access to models like Llama 3.1 (diffusion models coming soon) via both batch and streaming APIs, at prices they claim are 50-90% lower than those of established providers like Together.ai and Groq.
Key Highlights:
How they do it - Data centers have underutilized capacity that most orchestration software isn’t capable of using. inference.net uses custom scheduling software to capture small, unused time slots across data centers, turning these previously unsellable compute fragments into valuable AI inference time.
Cost efficient - Provides up to 100 billion tokens per day for LLM inference at a 50-90% discount compared to other providers.
Fast inference - Offers <300ms time to first token (TTFT) and >100 tokens per second throughput, ensuring fast and scalable performance.
Uptime - Maintains 99.9% uptime across multiple data centers, primarily located in North America and Europe.
Apply for API - Fill out the form here to receive an API key. They also have a grant program for researchers working on projects that require large amounts of batch inference, where jobs can be queued and executed asynchronously in the background.
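Once you have a key, calling the service should feel familiar. Here’s a minimal sketch assuming an OpenAI-compatible chat-completions endpoint; the base URL and model identifier below are placeholders we made up, so check inference.net’s docs for the actual values and parameters.

```python
# Minimal sketch of streaming a Llama 3.1 completion through an
# OpenAI-compatible client. The base_url and model name are assumptions,
# not confirmed values from inference.net's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                   # issued after you apply via the form
)

stream = client.chat.completions.create(
    model="llama-3.1-70b-instruct",           # hypothetical model identifier
    messages=[{"role": "user", "content": "Explain batch vs streaming inference in one paragraph."}],
    stream=True,                              # streaming API; batch jobs run asynchronously instead
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```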
If training AI models feels out of reach because of bandwidth constraints and expensive hardware, here’s the solution! Nous Research has released a report on their DisTrO family of distributed optimizers, which could dramatically reduce the bandwidth needed to train LLMs and diffusion models across multiple GPUs, even over slower internet connections.
DisTrO cuts the bandwidth requirements of multi-GPU training by hundreds of times, meaning you can pre-train and fine-tune large models over standard internet connections on consumer-grade hardware.
Key Highlights:
Bandwidth efficiency - DisTrO reduces inter-GPU communication requirements by up to 857x during pre-training, without compromising training efficiency.
Training flexibility - Enables large-scale model training over consumer-grade internet connections, bypassing the need for high-speed interconnects between GPUs.
Compatibility - DisTrO is network- and architecture-agnostic, allowing it to function seamlessly across various neural network setups without additional infrastructure costs.
Getting started - DisTrO’s code isn’t available yet, but Nous Research plans to release it soon, so keep an eye out for when you can start testing it in your own workflows.
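Since DisTrO itself isn’t downloadable yet, here’s a generic sketch of the underlying idea: shrinking what each GPU has to send per optimization step. To be clear, this is plain top-k gradient sparsification using PyTorch’s collective ops, not DisTrO’s actual optimizer; it only shows where this kind of bandwidth saving comes from, and the report claims far larger reductions (up to 857x) than this naive approach achieves.

```python
# Generic illustration of cutting inter-GPU communication by exchanging only
# the largest gradient entries each step. This is NOT DisTrO (its code is
# unreleased); it's standard top-k sparsification, shown for intuition only.
import torch
import torch.distributed as dist

def topk_compressed_allreduce(grad: torch.Tensor, keep_ratio: float = 0.01) -> torch.Tensor:
    """Average gradients across workers while exchanging only the top-k entries.

    Each worker sends roughly keep_ratio of its gradient (values plus indices),
    so per-step traffic drops by about 1/keep_ratio versus a dense all-reduce.
    Real methods add error feedback to keep training stable.
    """
    world = dist.get_world_size()
    flat = grad.flatten()
    k = max(1, int(flat.numel() * keep_ratio))

    # Pick the k largest-magnitude gradient entries on this worker.
    _, indices = torch.topk(flat.abs(), k)
    values = flat[indices]

    # Exchange only the compressed (indices, values) pairs with other workers.
    gathered_vals = [torch.empty_like(values) for _ in range(world)]
    gathered_idx = [torch.empty_like(indices) for _ in range(world)]
    dist.all_gather(gathered_vals, values)
    dist.all_gather(gathered_idx, indices)

    # Rebuild a dense, averaged gradient from everyone's sparse contributions.
    averaged = torch.zeros_like(flat)
    for vals, idx in zip(gathered_vals, gathered_idx):
        averaged.index_add_(0, idx, vals)
    return (averaged / world).view_as(grad)
```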
Quick Bites
GitHub has made Copilot Autofix free for all public repositories. It generates suggested fixes for vulnerabilities found by CodeQL, GitHub’s code analysis tool, both on pull requests and for historical alerts that already exist in a codebase. You can review each suggestion and choose whether to accept it.
Former Apple designer Jony Ive is collaborating with OpenAI’s CEO Sam Altman on a new AI hardware project. While details are scarce, the startup is reportedly fundraising up to $1 billion and is exploring how generative AI can power a new computing device.
Apple has released public betas for iOS 18.1, iPadOS 18.1, and macOS Sequoia 15.1, featuring new Apple Intelligence tools like text rewriting, a redesigned Siri, and photo object removal. These betas are available to users with iPhone 15 Pro, iPhone 16, M1 iPads, and newer Macs via the beta software program.
Tools of the Trade
RAGApp: Build multi-agent AI applications without a single line of code. You can create and customize multiple AI agents with specific roles, prompts, and tools, then deploy them in a chat interface with streaming responses and source attribution.
Gradio AI Agent: An AI agent assistant that can create, deploy, and optimize entire Gradio applications in Python from a simple text prompt, using Claude 3.5 Sonnet under the hood (see the sketch after this list for what a basic Gradio app looks like).
Founder Mode Analyzer: This fun AI tool analyzes your X (Twitter) profile in seconds and shows if you're in Founder Mode or Manager Mode. It assesses your tweets and interactions and helps identify your current focus and approach to business.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
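For context on the Gradio AI Agent above, here’s a minimal hand-written Gradio app, the kind of code such an agent would generate and deploy. This is our own illustrative example, not output from the tool.

```python
# A minimal hand-written Gradio app, shown only to illustrate the kind of
# code an agent like this would generate -- not actual output from the tool.
import gradio as gr

def word_count(text: str) -> int:
    """Count the words in the submitted text."""
    return len(text.split())

demo = gr.Interface(
    fn=word_count,
    inputs=gr.Textbox(label="Paste some text"),
    outputs=gr.Number(label="Word count"),
    title="Word Counter",
)

demo.launch()  # serves the app locally (http://127.0.0.1:7860 by default)
```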
Hot Takes
I really hate the argument about whether LLMs can reason or not. Can anyone mathematically differentiate between inference and reasoning? :) People treat reasoning like it’s something magical, but I bet many who argue about this issue can’t define it, relying more on gut feelings than facts. ~ Chanwoo Park
An underutilized perspective on AI for non-technical people is that we now have the world's most advanced compression system for knowledge. Anyone can download, for free, a 235GB file that can answer questions based on a vast swath of all human writing (even if it makes some errors). ~ Ethan Mollick
Meme of the Day
That’s all for today! See you tomorrow with more such AI-filled content.
🎁 Bonus worth $50 💵
Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉