Build, Evaluate, and Monitor AI Agents

PLUS: Auto-optimize LLM Prompts, ChatGPT web search

Shubham Saboo & Gargi Gupta
November 04, 2024

Today’s top AI Highlights:

The library to build & auto-optimize LLM applications
AI Agent debugging becomes as simple as adding two lines
ChatGPT now lets you do web search just like Google
Perplexity will help you understand key issues, vote intelligently, and track election results in real time
OpenAI Swarm with durable execution to build reliable multi-agent apps

& so much more!

Read time: 3 mins

AI Tutorials

AI tools are redefining creative fields, and movie production is no exception. Imagine a tool that brings your movie ideas to life by generating script outlines, casting suggestions, and complete concept summaries. That’s what we’ll build today using Claude 3.5 Sonnet, Phidata, and SerpAPI.

This tutorial will guide you through creating an AI-powered movie agent that:

Generates script outlines based on your movie idea, genre, and target audience
Suggests suitable actors for main roles, considering their past performances and current availability
Provides a concise movie concept overview

We share hands-on tutorials like this 2-3 times a week. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build an AI Movie Production Agent with Claude 3.5 Sonnet

Fully-functional LLM app in just 30 lines of Python Code (step-by-step instructions)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Your LLM Pipeline can now Optimize its own Prompts 🌟

AdalFlow lets you build and automatically optimize applications powered by LLMs. It uses a design similar to PyTorch, making it easy to create custom pipelines for tasks like question answering, text classification, and more. AdalFlow includes built-in tools to automatically improve the accuracy of your LLM prompts, often using fewer tokens than other methods. This means faster development, lower costs, and better results. It's designed to work seamlessly for both researchers testing new ideas and engineering teams deploying LLM applications.

Key Highlights:

Automatic Prompt Optimization - AdalFlow automatically tunes your LLM prompts for better accuracy, supporting both zero-shot and few-shot learning. It builds on existing techniques like Text-Grad while incorporating improvements such as Text-Grad 2.0 and Learn-to-Reason Few-shot In Context Learning.
Easy Pipeline Creation - Build custom LLM applications quickly using AdalFlow's modular components (Component and DataClass). You control every aspect: prompts, models, and how the LLM's output is processed.
Token Efficiency & High Accuracy - Get high-accuracy results while saving money and time. AdalFlow's optimization features help you minimize token usage, directly impacting costs and performance.
Production Integration - Complete source code visibility and PyTorch-like architecture make it straightforward to adopt in production environments. The library supports iterative testing on production data and allows you to extend methods based on your specific needs.
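
The idea behind the highlights above can be sketched without the library. The snippet below is a toy illustration, not AdalFlow's API: the Prompt and Classifier classes, fake_model, and optimize_prompt are all hypothetical names, and the "optimizer" is a plain search over candidate prompts rather than Text-Grad. It only shows the shape of the approach: a PyTorch-like component that owns its prompt, plus a loop that keeps whichever prompt scores best on a small labeled set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prompt:
    """A prompt treated like a tunable parameter (hypothetical, not AdalFlow's class)."""
    text: str


class Classifier:
    """Pipeline step in a PyTorch-like style: state in __init__, logic in call()."""

    def __init__(self, prompt: Prompt, model: Callable[[str], str]):
        self.prompt = prompt
        self.model = model

    def call(self, question: str) -> str:
        # Build the full prompt, run the model, then post-process the raw output.
        raw = self.model(f"{self.prompt.text}\n{question}")
        return raw.strip().lower()


def fake_model(full_prompt: str) -> str:
    """Stand-in for an LLM call: it only answers tersely when told to."""
    question = full_prompt.splitlines()[-1]
    answer = "yes" if "python" in question.lower() else "no"
    if "one word" in full_prompt.lower():
        return answer
    return f"I think the answer is {answer}."


def accuracy(pipeline: Classifier, dataset: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the pipeline output equals the label."""
    hits = sum(pipeline.call(question) == label for question, label in dataset)
    return hits / len(dataset)


def optimize_prompt(
    candidates: List[str],
    model: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> Tuple[Prompt, float]:
    """Score every candidate prompt and return the best one with its accuracy."""
    scored = [(accuracy(Classifier(Prompt(text), model), dataset), text) for text in candidates]
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    return Prompt(best_text), best_score


DATASET = [
    ("Is Python a programming language?", "yes"),
    ("Is a rock a programming language?", "no"),
]
CANDIDATES = ["Answer the question.", "Answer with one word: yes or no."]

best_prompt, best_score = optimize_prompt(CANDIDATES, fake_model, DATASET)
```

In this toy setup the terse prompt wins because the output parser expects a bare yes or no. A real optimizer would propose new prompt text itself instead of choosing from a fixed list.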

See AI Agent Frank In Action!

Lower costs by letting Agent Frank, Salesforge’s AI SDR, take care of prospecting, crafting messages, and booking meetings so your team can focus on closing deals! You can now get a personalized email from Agent Frank to test his capabilities in real time:

Show me the magic

Python SDK to Build, Evaluate, and Monitor AI Agents 👀

AgentOps is a Python SDK that monitors AI agents across multiple frameworks, including CrewAI, LangChain, and AutoGen, providing comprehensive performance tracking and debugging. The platform automatically captures LLM prompts, completions, timestamps, and execution flows, and integrating it takes only two lines of code.
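
A minimal sketch of that two-line setup, assuming the agentops package is installed. Only the import and the agentops.init() call come from the description here. The start_monitoring helper is hypothetical, and reading the key from an AGENTOPS_API_KEY environment variable is an assumption for illustration. The guard code simply keeps the script from failing when the package or key is missing.

```python
import os
from typing import Optional


def start_monitoring(api_key: Optional[str] = None) -> bool:
    """Start AgentOps tracking if possible; return True when init() was called."""
    # Assumption: the key is passed in or stored in AGENTOPS_API_KEY.
    key = api_key or os.environ.get("AGENTOPS_API_KEY")
    if not key:
        return False  # nothing to do without a key
    try:
        import agentops  # line 1 of the two-line integration
    except ImportError:
        return False  # package not installed (pip install agentops)
    agentops.init(key)  # line 2: starts capturing agent activity
    return True
```

Calling start_monitoring() once at program start, before any LLM or agent calls, should be enough for the auto-instrumentation to pick them up.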

Session replays and waterfall visualizations help developers trace exact agent behaviors, API calls, and failure points, making it easier to understand what happened during agent executions.

Key Highlights:

Quick Implementation - Two-line implementation that auto-instruments popular LLM providers and frameworks - just import agentops and call agentops.init() with your API key to start capturing all agent activity, including prompts, completions, tool usage, and errors.
Visual Debugging - Built-in session waterfall visualization shows the exact sequence and timing of events, making it easy to identify bottlenecks, trace errors to their source, and understand agent decision flows across multiple LLM calls and tool uses.
Multi-Agent Support - Support for concurrent multi-agent debugging with session inheritance and agent tracking decorators (@track_agent) that let you monitor specific agents across processes while maintaining session context.
Cost Control - Detailed cost analysis and usage metrics for each session, including per-model token counts, API call frequency, and execution times, helping you optimize agent performance and control spending before production deployment.
Getting Started - Install with pip install agentops and set up with your API key from app.agentops.ai. The platform provides first-class support for the CrewAI, LangChain, and AutoGen frameworks, with built-in decorators and callback handlers for each.
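
The arithmetic behind the per-model cost breakdown mentioned under Cost Control can be sketched in a few lines. This is not AgentOps code: the model names, the prices, and the session_cost helper are made up for illustration, so replace them with your provider's real rates.

```python
from typing import Dict, List, Tuple

# Hypothetical USD prices per 1M tokens as (input, output); not real rates.
PRICES: Dict[str, Tuple[float, float]] = {
    "model-a": (1.00, 2.00),
    "model-b": (0.10, 0.40),
}


def session_cost(
    calls: List[dict],
    prices: Dict[str, Tuple[float, float]] = PRICES,
) -> Dict[str, float]:
    """Sum the cost of every LLM call in a session, grouped by model."""
    totals: Dict[str, float] = {}
    for call in calls:
        in_price, out_price = prices[call["model"]]
        cost = (
            call["prompt_tokens"] * in_price
            + call["completion_tokens"] * out_price
        ) / 1_000_000
        totals[call["model"]] = totals.get(call["model"], 0.0) + cost
    return totals


# Example session: two calls to one model, one call to another.
SESSION = [
    {"model": "model-a", "prompt_tokens": 1000, "completion_tokens": 500},
    {"model": "model-a", "prompt_tokens": 2000, "completion_tokens": 0},
    {"model": "model-b", "prompt_tokens": 10000, "completion_tokens": 5000},
]
costs = session_cost(SESSION)
```

Grouping by model makes it easy to see whether moving some calls to a cheaper model would lower the bill.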

Quick Bites

ChatGPT now has a web search feature that works like a full search engine. It gives fast, up-to-date answers with links to relevant web sources, so you can get information directly within the chat. It is currently available to ChatGPT Plus and Team users and will roll out to Free users in the coming months.

Perplexity has launched an Election Information Hub, providing live updates and answers to election-related questions using data from The Associated Press and Democracy Works. This hub aims to help users track U.S. general election results and understand key voting issues directly on Perplexity’s platform. You can try it here.

Scale AI has launched Expert Match, a new platform where AI developers can team up with experts like doctors, lawyers, and PhDs on their AI projects. The platform makes it simple to find and connect with the right experts through features like detailed profile searches and qualification checks, helping both sides work together to advance AI technology.

Anthropic has upgraded Claude’s PDF support. Claude can now see the actual layout and visuals of PDFs rather than just the extracted text, which improves its understanding of complex documents containing charts and diagrams. This update is now live on claude.ai and available via the Anthropic API.

Tools of the Trade

DurableSwarm: Adds durable execution to OpenAI's Swarm, allowing multi-agent workflows to resume automatically after interruptions by storing progress in a Postgres database. This makes it reliable for long-running, interactive tasks.
Zerox OCR: A dead simple way of OCR-ing a document for AI ingestion. It converts documents into Markdown by processing each page as an image with vision models like GPT-4o-mini. It supports multiple file types and is available as Node and Python packages.
PocketPal AI: Mobile AI assistant that runs small language models directly on your phone, 100% offline and free, on both iOS and Android. It supports multiple models, offers customization settings, and automatically manages memory to optimize performance.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
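
The durable-execution idea behind DurableSwarm can be sketched with the standard library alone. This is not DurableSwarm's API: run_durable and the checkpoints table are hypothetical, and SQLite stands in for Postgres so the example runs anywhere. Each step's result is saved as soon as it finishes, so a rerun after an interruption skips completed steps instead of repeating them.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_durable(
    conn: sqlite3.Connection,
    workflow_id: str,
    steps: List[Tuple[str, Callable[[], str]]],
) -> List[str]:
    """Run steps in order, checkpointing each result so reruns can resume."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "workflow_id TEXT, step TEXT, result TEXT, "
        "PRIMARY KEY (workflow_id, step))"
    )
    results = []
    for name, func in steps:
        row = conn.execute(
            "SELECT result FROM checkpoints WHERE workflow_id = ? AND step = ?",
            (workflow_id, name),
        ).fetchone()
        if row is not None:
            results.append(row[0])  # finished in an earlier run; do not redo it
            continue
        result = func()  # may raise; earlier checkpoints stay saved
        conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (workflow_id, name, result),
        )
        conn.commit()  # persist progress before moving to the next step
        results.append(result)
    return results
```

If the second step of a workflow crashes, calling run_durable again with the same workflow_id returns the stored result for the first step and only re-executes the second.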

Hot Takes

Imagine being Sundar Pichai now:
- you had the largest continually updated data set of any company to train AI on (the Google Index)
- you invented the underlying technology of LLMs like ChatGPT in 2017 called Transformers
- you had complete search dominance: all you had to add was AI and you'd own the market
And yet:
- you managed to completely fumble your massive head start and were late to everything
- you made your APIs so hard to use that nobody seriously integrated them into their apps, and people went to Anthropic and OpenAI instead
- you now see your search dominance quickly slipping away to Perplexity and yesterday's newly launched ChatGPT Search
This will be a business case studied in universities for decades ~ levelsio

can't think of anything more useless today than learning regex ~ Santiago

Meme of the Day

Founder bet his Seed round on US election
— Jason (@mytechceoo)
9:46 PM • Oct 29, 2024

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
