Hugging Face MCP Server for Models, Datasets and Papers
PLUS: Free open-source alternative to Gumloop, Cursor's remote AI coding agents
Today’s top AI Highlights:
Stanford’s new LLM engine just smoked vLLM by up to 3x
Free open-source no-code AI agent builder
HF MCP server for 1M+ models, datasets, papers, and Spaces
Cursor now reviews code, remembers context, and runs background agents in parallel remote environments
Generate MCP servers from any OpenAPI spec in seconds
& so much more!
Read time: 3 mins
AI Tutorial
Traditional RAG has served us well, but it's becoming outdated for complex use cases. While vanilla RAG can retrieve and generate responses, agentic RAG adds a layer of intelligence and adaptability that transforms how we build AI applications. Also, most RAG implementations are still black boxes - you ask a question, get an answer, but have no idea how the system arrived at that conclusion.
In this tutorial, we'll build a multi-agent RAG system with transparent reasoning using Claude 4 Sonnet and OpenAI. You'll create a system where you can literally watch the AI agent think through problems, search for information, analyze results, and formulate answers - all in real-time.
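To give a feel for what "watching the agent think" looks like, here is a minimal standalone sketch of the core retrieve-then-reason loop with each step printed as it happens. It is not the tutorial's actual code: the model ids, toy corpus, and prompts are placeholders, and it assumes the official openai and anthropic Python SDKs with API keys set in the environment.

```python
# Minimal agentic-RAG loop with visible intermediate steps.
# Assumes the official `openai` and `anthropic` SDKs; corpus,
# model ids, and prompts are illustrative placeholders.
import numpy as np
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()   # reads OPENAI_API_KEY
claude = Anthropic()       # reads ANTHROPIC_API_KEY

corpus = [
    "Agentic RAG adds routing and tool use on top of vanilla RAG.",
    "Vanilla RAG retrieves chunks and generates an answer in one shot.",
]

def embed(texts):
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(corpus)

def retrieve(query, k=1):
    q = embed([query])[0]
    scores = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def answer(query):
    docs = retrieve(query)
    print(f"[agent] retrieved: {docs}")   # the transparent step
    msg = claude.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=300,
        messages=[{"role": "user",
                   "content": f"Context: {docs}\n\nQuestion: {query}"}],
    )
    return msg.content[0].text

print(answer("What does agentic RAG add over vanilla RAG?"))
```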
We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
This guy vibe-coded a $50M no-code platform on a weekend and open-sourced it for free.
Most no-code agent builders are just workflow automation with LLM calls sprinkled in, but Agent Flow changes that. Built using Composio and LangGraph, Agent Flow lets you create true AI agents with dynamic routing, complex tool orchestration, and support for any model through a visual drag-and-drop interface. It is designed from the ground up for agents that need to make decisions, route dynamically, and handle sophisticated tool interactions.
Key Highlights:
Four Core Node Design - Everything builds from just Input, LLM, Tool, and Output nodes, where an "agent" is simply LLM + Tool nodes with feedback loops, making complex workflows intuitive to construct (see the sketch after this list).
Built-in Agent Patterns - Implements proven patterns from Anthropic's "Building Effective Agents" guide including prompt chaining, parallelization, routing, and evaluator-optimizer loops right out of the box.
Universal Model Support - Works with any LLM provider (OpenAI, Anthropic, local endpoints) through LangGraph's native graph execution model, giving developers complete flexibility in model choice.
Seamless Tool Integration - Powered by Composio's 100+ pre-built tools with automatic authentication handling, eliminating the nightmare of managing different OAuth flows and API keys across multiple services.
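Since Agent Flow is built on LangGraph, the "agent = LLM + Tool nodes with a feedback loop" idea maps directly onto LangGraph's public graph API. Here is a minimal sketch of that loop; the node bodies are stubs standing in for real LLM and Composio tool calls, not Agent Flow's actual implementations.

```python
# Sketch: an "agent" as an LLM node and a Tool node joined by a
# feedback loop, using LangGraph. Node bodies are stubs.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str
    needs_tool: bool

def llm_node(state: State) -> State:
    # Call your LLM here and decide whether a tool call is still needed.
    return {**state, "needs_tool": state["answer"] == ""}

def tool_node(state: State) -> State:
    # Call a Composio tool here; the result flows back to the LLM node.
    return {**state, "answer": f"tool result for {state['question']}"}

graph = StateGraph(State)
graph.add_node("llm", llm_node)
graph.add_node("tool", tool_node)
graph.set_entry_point("llm")
graph.add_conditional_edges(
    "llm", lambda s: "tool" if s["needs_tool"] else END)
graph.add_edge("tool", "llm")   # the feedback loop

app = graph.compile()
print(app.invoke({"question": "ping", "answer": "", "needs_tool": True}))
```

The routing lives entirely in the conditional edge: the LLM node decides each turn whether to loop through the tool again or finish, which is the dynamic-routing behavior the announcement describes.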
When you need to generate 10,000 solutions to math problems or process every file in a massive codebase, this LLM engine is what you need.
Stanford just dropped an open-source LLM inference engine, Tokasaurus, built specifically for scenarios where you need maximum throughput rather than low latency. Think processing entire codebases, sampling thousands of model outputs for experiments, or generating massive synthetic training data - workloads where completing your entire batch quickly matters more than getting individual responses fast. Tokasaurus delivers up to 3x higher throughput than existing engines like vLLM and SGLang by focusing specifically on these batch-heavy use cases.
Key Highlights:
Prevents performance bottlenecks - Monitors your system in real-time and automatically adjusts to prevent the CPU from becoming a bottleneck that slows down your GPU processing.
Exploits repetitive patterns - When your batch contains similar prompts or shared content, Tokasaurus identifies these patterns and processes them more efficiently, saving significant computation time (a toy illustration follows this list).
Works with any GPU setup - Whether you have basic GPUs or high-end hardware with fast connections, the engine automatically picks the best approach to maximize your throughput.
Easy to customize - Built in pure Python so you can easily modify it for your specific needs, with simple installation and support for popular models like Llama-3 and Qwen-2.
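To make the shared-prefix point concrete, here is a toy illustration of why batches with common content are cheaper. This is not Tokasaurus's actual API; it only shows the idea that prompts sharing a prefix can reuse one prefill pass (one KV cache) for that prefix.

```python
# Toy illustration of shared-prefix batching, NOT Tokasaurus's API:
# prompts with a common prefix can share one prefill pass.
from collections import defaultdict

prompts = [
    "Solve step by step: 12 * 7 = ?",
    "Solve step by step: 99 - 45 = ?",
    "Summarize this file: main.py",
]

def shared_prefix_groups(prompts, prefix_len=20):
    groups = defaultdict(list)
    for p in prompts:
        groups[p[:prefix_len]].append(p)
    return groups

for prefix, batch in shared_prefix_groups(prompts).items():
    # In a real engine, the KV cache for `prefix` is computed once
    # and reused by every prompt in `batch`.
    print(f"prefill once: {prefix!r} -> serves {len(batch)} prompt(s)")
```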
Imagine your MCP agent browsing through 1 million+ models and datasets like it's scrolling through a social feed.
Hugging Face released their first official open-source MCP server, giving you direct programmatic access to the entire Hub ecosystem from any MCP client. Connect seamlessly through VSCode, Cursor, Claude Desktop, or any other MCP-compatible tool to search models, datasets, and papers, plus tap into thousands of Gradio apps hosted on Spaces. The server comes with both built-in tools and dynamic access to MCP-compatible applications.
Key Highlights:
Built-in Hub Tools - Search models and datasets with advanced filters, find research papers through semantic search, and explore Spaces applications using natural language queries.
Gradio Integration - Automatically exposes all MCP-compatible Gradio apps hosted on Spaces, expanding your available tools without manual configuration.
Transport Options - Supports STDIO, SSE, StreamableHTTP, and StreamableHTTPJSON protocols, with Docker deployment and a web dashboard for easy management (connection sketch below).
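If you want to poke at the server from code rather than an editor, here is a sketch using the official MCP Python SDK's Streamable HTTP client to list the available tools. The endpoint URL is an assumption - check Hugging Face's docs for the actual address and any auth headers required.

```python
# Connect to the HF MCP server over Streamable HTTP and list its tools.
# Sketch using the official `mcp` Python SDK; the endpoint URL is an
# assumption -- verify it against Hugging Face's documentation.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

HF_MCP_URL = "https://huggingface.co/mcp"  # assumed endpoint

async def main():
    async with streamablehttp_client(HF_MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```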
Quick Bites
Google released an updated preview of Gemini 2.5 Pro with significant gains in programming, math, reasoning, and creative writing.
The model jumped 24 Elo points to maintain its lead on LMArena and climbed 35 points to top WebDevArena. This preview will become the stable, generally available release in a couple of weeks.
Alibaba's Qwen team has released the Qwen3-Embedding and Qwen3-Reranker model series, specifically designed for text embedding, retrieval, and reranking tasks. Built on the Qwen3 foundation, the models leverage its robust multilingual text understanding and achieve SOTA performance. All models are open-sourced under the Apache 2.0 license and available through Hugging Face, GitHub, and the Alibaba API.
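The embedding models are usable through sentence-transformers; here is a minimal sketch of query/document scoring, assuming the Qwen/Qwen3-Embedding-0.6B checkpoint id and its built-in query prompt (verify both on the model card).

```python
# Minimal sketch: query/document similarity with Qwen3-Embedding.
# Assumes the Qwen/Qwen3-Embedding-0.6B checkpoint id and its
# "query" prompt; verify both on the Hugging Face model card.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["What is the capital of China?"]
docs = ["Beijing is the capital of China.",
        "Gravity makes apples fall."]

q_emb = model.encode(queries, prompt_name="query")  # query-side prompt
d_emb = model.encode(docs)
print(model.similarity(q_emb, d_emb))  # higher score = better match
```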
Cursor just launched version 1.0 with several major updates for developers.
Background Agent is now available to everyone, letting you kick off multiple coding tasks that run in parallel in remote environments.
Memories feature allows Cursor to retain project-specific context and facts across conversations.
BugBot automatically scans PRs for potential bugs and issues, leaving GitHub comments with a "Fix in Cursor" button that pre-fills prompts in the editor for quick resolution.
Install MCP servers in one click. Developers can add "Add to Cursor" buttons to their documentation for easier distribution.
Projects in Claude now support 10x more content. When you add files beyond the existing threshold, Claude switches to a new retrieval mode to expand the functional context. It’s rolling out to all Claude paid users.
Tools of the Trade
app.build: Open-source agent that builds and deploys full-stack applications, with end-to-end tests and automated deployments. Each task is decomposed into smaller subtasks that can be solved independently. Generated apps get their own repository and are deployed to the internet with a real backend and a real database.
Taskade OpenAPI Codegen: Automatically generates MCP servers from OpenAPI specs compatible with MCP clients like Claude and Cursor. It parses your OpenAPI 3.x spec and outputs ready-to-use MCP tools with support for custom headers, fetch overrides, and response normalization.
Figma Dev Mode MCP Server: Integrates Figma design context directly into AI tools like Cursor and Claude Code. It provides LLMs with design metadata, component mappings, variable definitions, and visual context to generate code that matches both your codebase patterns and design intent.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, MCP, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
I feel there are two parallel worlds of AI use developing, with a growing divide between them. Engineers use APIs while everyone else uses a chatbot. I think coders don’t take what experts can do with chatbots seriously & non-tech people don’t understand building scalable AI tools. ~ Ethan Mollick
Starting a chat with 4o and escalating to o3 has “may I speak with your manager” energy ~ Nathan Baschez
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉