AI Agents Search & Scrape the Web with One API Call
PLUS: Opensource AI voice inpainting model, OpenAI Codex for ChatGPT Plus users
Today’s top AI Highlights:
Opensource node-based visual flow engine by ByteDance
One API call for AI agents to search the web and scrape all results
Opensource AI voice inpainting model to edit specific parts of audio
OpenAI Codex now available for ChatGPT Plus subscribers
MCP server for AI agents to browse, scrape, and interact with websites
& so much more!
Read time: 3 mins
AI Tutorial
Building intelligence tools that can automatically gather, analyze, and synthesize competitive data is both challenging and incredibly valuable. But it's one of those projects that sounds straightforward until you realize you're juggling multiple APIs, parsing different data formats, and somehow making sense of scattered information across dozens of websites.
In this tutorial, we'll build a multi-agent Product Intelligence System using GPT-4o, Agno framework, and Firecrawl's new /search endpoint. This system deploys three specialized AI agents that work together to provide comprehensive competitive analysis, market sentiment tracking, and launch performance metrics - all through a clean Streamlit interface.
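To give you a feel for the shape of one of these agents before you open the full tutorial, here's a rough Agno sketch. The agent name and instructions below are ours, not the tutorial's, and it assumes OPENAI_API_KEY and FIRECRAWL_API_KEY are set in your environment.

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.firecrawl import FirecrawlTools

# Hypothetical sketch of one of the three agents: a competitor-analysis agent
# that can search and scrape the web through Firecrawl.
competitor_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[FirecrawlTools()],
    instructions=[
        "Research competing products and summarize positioning, pricing, and reviews.",
    ],
    markdown=True,
)

competitor_agent.print_response("Analyze the market reception of <product name>")
```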
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
ByteDance has opensourced FlowGram.ai, a node-based flow-building engine to quickly create visual workflows. The framework offers both fixed layout modes for structured workflows and free-form connection layouts for more creative implementations, complete with drag-and-drop functionality and real-time visual feedback.
What makes FlowGram compelling is AI-powered workflow automation: think automated task suggestions, intelligent node connections, and smart process optimization that adapts to your development patterns. Having already powered 30+ internal ByteDance projects, including Coze and Feishu's automation platforms, this MIT-licensed tool brings enterprise-grade workflow capabilities to the broader developer community.
Key Highlights:
Dual Layout - Choose between fixed layout for structured workflows with predefined node positions, or free layout for completely flexible node placement and custom connections. Both modes support complex composite nodes like conditional branches, loops, and error handling blocks.
AI-Enhanced Workflows - Built-in AI capabilities help automate repetitive tasks, suggest optimal node connections, and adapt workflows based on usage patterns. The system learns from your workflow designs to provide smarter recommendations over time.
Enterprise-Ready Plugin Ecosystem - Modular architecture with extensible canvas engine, node engine, variable management, and component library that supports custom business logic. Get started instantly and choose from pre-built templates.
Production-Tested - Battle-tested across 30+ ByteDance projects with features that rival paid solutions like ReactFlow, including batch operations, motion animations, layout switching, and comprehensive debugging tools. The layered interaction system uses IoC patterns for clean extensibility.
Firecrawl just dropped their most requested feature with the /search endpoint, and it's exactly what developers have been waiting for. Instead of juggling separate search and scraping operations, you can now discover web content and extract it in LLM-ready formats with a single API call.
The endpoint combines web search with automatic content scraping, returning results in markdown, HTML, links, or even screenshots - all customizable by language, location, and time range. They've also built Firesearch, an opensource research app that showcases the endpoint's capabilities and serves as a solid foundation for your own projects.
Key Highlights:
One API Call Does It All - Search the web and scrape all results simultaneously instead of making separate requests. Get search results with full content extraction in your preferred format (markdown, HTML, links, screenshots) without the typical two-step workflow of search-then-scrape that slows down agent development.
Smart Search Customization - Filter results by time range using simple parameters like tbs="qdr:w" for the past week, target specific languages and countries with lang="de" and country="de", and control exactly how many results you want. This level of control means your agents get precisely the data they need without irrelevant noise.
Ready-to-Use Integrations - Available immediately across Zapier, n8n, MCP (perfect for Claude and OpenAI agents), and direct API access. No waiting for third-party support or building custom integrations - you can start using /search in your existing workflows today.
Flexible Output - Choose from multiple content formats, including clean markdown for LLMs, full HTML for detailed analysis, extracted links for relationship mapping, or screenshots for visual verification. Built-in timeout controls and error handling ensure reliable performance in production environments.
We have created a multi-agent product launch intelligence application that uses this new /search endpoint. Check it out to understand how this works for real-world tasks.
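For a sense of what a single call looks like, here's a minimal sketch against the HTTP API. The v1 path and field names like scrapeOptions follow Firecrawl's docs, but treat the exact payload as an assumption and verify it against the reference before shipping.

```python
import requests

# Minimal sketch of one /search call: search the web and scrape every hit
# in one request, with the time/language/country filters described above.
resp = requests.post(
    "https://api.firecrawl.dev/v1/search",
    headers={"Authorization": "Bearer fc-YOUR_API_KEY"},
    json={
        "query": "open-source voice inpainting models",
        "limit": 5,                  # how many results to return
        "tbs": "qdr:w",              # restrict to the past week
        "lang": "de",                # German-language results
        "country": "de",             # results localized to Germany
        "scrapeOptions": {"formats": ["markdown", "links"]},  # scrape each result too
    },
    timeout=60,
)

for result in resp.json().get("data", []):
    print(result["url"])
    print(result.get("markdown", "")[:200])  # LLM-ready content per result
```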
Quick Bites
OpenAI has rolled out Codex to ChatGPT Plus users with generous usage limits, though rate limiting may kick in during high-demand periods. The biggest addition is internet connectivity during task execution, allowing Codex to install dependencies, upgrade packages, and run tests requiring external resources, though it's disabled by default. Plus, Pro, and Team users can now enable internet access for specific environments. The update also brings voice dictation for task assignments, the ability to update existing pull requests, and various other fixes.
Ollama has introduced a new thinking feature that lets you toggle whether models use their reasoning process or jump straight to answers. The feature works with DeepSeek R1 and Qwen 3 models, allowing you to either display the model's step-by-step thinking for transparency or disable it for faster responses.
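A rough sketch of what toggling this looks like against Ollama's local chat API. The think flag and the thinking field on the reply follow Ollama's announcement; treat the exact field names as assumptions.

```python
import requests

# Toggle the reasoning phase via the "think" flag on Ollama's local chat API.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
        "think": True,    # set to False to skip reasoning for faster answers
        "stream": False,
    },
)

message = resp.json()["message"]
print(message.get("thinking"))  # the step-by-step reasoning (only when enabled)
print(message["content"])       # the final answer
```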
Manus AI can now generate videos, properly structured and sequenced, directly from your prompts. It handles everything from planning scenes to animating your vision, turning ideas into complete videos - from storyboards to marketing assets - in minutes. Early access is available for Basic, Plus, and Pro members.
If you've used AI speech models and wanted to edit just one word without regenerating the entire audio, you know how frustrating that is. Addressing this head-on, PlayAI has opensourced Play Diffusion, a voice inpainting model for editing specific parts of existing speech. The diffusion model first transcribes your uploaded audio, lets you edit the text, then regenerates only the modified portions while keeping the original voice and rhythm intact. As a bonus, it's also 50x faster for regular text-to-speech generation.
Google has released a new Vertex AI Ranking API that addresses a major pain point in search and RAG systems – the fact that up to 70% of retrieved passages typically lack relevant answers. The API acts as a precision filter that reorders search results based on semantic understanding, helping AI agents surface more accurate information. Two new models are available: semantic-ranker-default-004 for maximum accuracy and semantic-ranker-fast-004 for speed-critical applications. Available via multiple options: direct API access, RAG Engine, AlloyDB's SQL functions, and popular frameworks like LangChain.
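Here's a hedged sketch of calling the Ranking API through the Discovery Engine client. The model names come from the announcement above; the project, query, and passages are placeholders, and the exact field names should be checked against Google's docs.

```python
from google.cloud import discoveryengine_v1 as discoveryengine

# Rerank a handful of retrieved passages by semantic relevance to the query.
client = discoveryengine.RankServiceClient()
ranking_config = client.ranking_config_path(
    project="your-project-id",
    location="global",
    ranking_config="default_ranking_config",
)

response = client.rank(
    request=discoveryengine.RankRequest(
        ranking_config=ranking_config,
        model="semantic-ranker-default-004",  # or semantic-ranker-fast-004 for latency-critical paths
        query="How do I rotate service account keys?",
        records=[
            discoveryengine.RankingRecord(id="1", title="Key rotation", content="..."),
            discoveryengine.RankingRecord(id="2", title="IAM overview", content="..."),
        ],
        top_n=2,
    )
)

for record in response.records:
    print(record.id, record.score)  # passages reordered by semantic relevance
```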
Tools of the Trade
Bright Data MCP server: Opensource MCP server that provides AI agents with 30+ web interaction capabilities like browsing, scraping, and API access. It dynamically selects the optimal method based on target site structure. Works with MCP-compatible AI assistants like Claude Desktop and Cursor.
Blaxel: Cloud computing platform designed specifically for building, deploying, and scaling AI agents, with both high-level deployment tools and low-level infrastructure services like fast-booting sandboxes and model gateways. It provides agent-optimized services such as serverless agent hosting, MCP server hosting, batch processing, sandboxed environments, and more.
Instructor: Python library that extracts structured data from LLMs like GPT, Claude, and Gemini using Pydantic models for type safety and validation. It supports 15+ LLM providers and comes with built-in error handling and streaming support. See the short sketch after this list.
OpenMCP: Turns any OpenAPI spec into an MCP server and lets you bundle multiple servers into one with only the tools you want. It supports both OpenAPI and stdio setups, works with all major chat clients, and runs with a simple install command.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
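Since Instructor is easier to show than to describe, here's a minimal sketch of the pattern mentioned above (the model name and schema are just examples).

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define the structure you want back from the LLM.
class Contact(BaseModel):
    name: str
    email: str

client = instructor.from_openai(OpenAI())  # wraps the client to accept response_model

contact = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Contact,  # Instructor validates (and retries) until this schema is satisfied
    messages=[{"role": "user", "content": "You can reach Jane Doe at jane@example.com"}],
)

print(contact.name, contact.email)  # typed fields instead of raw text
```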
Hot Takes
I keep seeing Veo 3 demo videos that are super realistic and could have been actually filmed using people and props and cameras and locations
That's so boring and unimaginative! ~
Simon Willison
After the initial sugar rush, expensive models end up getting little or no usage.
GPT 4.5 - was barely used before being retired
o1 - was too slow and expensive to be used IRL
Veo-2 - abandoned, gave people sticker shock
o3 - only used as a fallback or for planning
Veo-3 and Opus will likely meet the same fate.
AI labs should abandon these big models. It's not worth the R&D expenses ~
Bindu Reddy
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉