Unified Client
Send prompts to all major LLM providers (OpenAI, Anthropic, Google, and more) with a single client interface.
Massive Concurrency
Set your rate limits, then let it fly. Automatic throttling and retry logic.
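For instance, a client can be constructed with explicit limits and then handed a large batch in one call. This is a minimal sketch; the `max_requests_per_minute`, `max_tokens_per_minute`, and `max_concurrent_requests` keyword names are assumptions rather than confirmed API.

```python
from lm_deluge import LLMClient

# Hedged sketch: the rate-limit keyword arguments below are assumptions.
client = LLMClient(
    "gpt-4o-mini",
    max_requests_per_minute=5_000,    # assumed request-rate cap
    max_tokens_per_minute=1_000_000,  # assumed token-rate cap
    max_concurrent_requests=500,      # assumed in-flight request cap
)

# The client throttles itself to stay under the limits and retries transient
# failures, so a large batch can simply be handed over at once.
prompts = [f"Summarize item {i}" for i in range(10_000)]
resps = client.process_prompts_sync(prompts)
```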
Spray Across Models
Configure multiple models from any provider(s) with sampling weights. The client samples a model for each request.
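As a sketch, a multi-model client might be configured like this; passing a list of model names follows the description above, but the weights keyword (`model_weights` here) and the model identifiers are assumptions.

```python
from lm_deluge import LLMClient

# Hedged sketch: the weights keyword name is an assumption.
client = LLMClient(
    ["gpt-4o-mini", "claude-3-5-haiku"],  # model names are illustrative
    model_weights=[0.7, 0.3],             # ~70% of requests sampled to gpt-4o-mini
)

# Each request is routed to one of the configured models according to the weights.
resps = client.process_prompts_sync(["Classify this support ticket: ..."] * 100)
```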
Tool Use & MCP
Unified API for tools across all providers. Instantiate tools from MCP servers with built-in call interfaces.
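Purely as an illustration, tool use might look like the sketch below; the `Tool` import, the `from_function` constructor, and the `tools=` keyword are hypothetical placeholders for the library's actual interfaces.

```python
from lm_deluge import LLMClient, Tool  # Tool import is assumed


def get_weather(city: str) -> str:
    """Return a short weather report for a city."""
    return f"It is sunny in {city}."


# Hypothetical constructor: wrap a Python function as a provider-agnostic tool.
weather_tool = Tool.from_function(get_weather)

client = LLMClient("gpt-4o-mini")
resps = client.process_prompts_sync(
    ["What's the weather in Paris right now?"],
    tools=[weather_tool],  # assumed keyword for supplying tools
)
```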
Convenient Messages
No more Googling how to build messages lists. The Conversation and Message classes work seamlessly across all providers.
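A sketch of the message helpers: the `Conversation` and `Message` classes come from the description above, while the specific constructors (`Conversation.system`, `Message.user`, `.add`) are assumptions.

```python
from lm_deluge import LLMClient, Conversation, Message

# Hedged sketch: the helper constructors below are assumed, not confirmed API.
conv = Conversation.system("You are a terse assistant.")
conv.add(Message.user("Give me three names for a cat."))

client = LLMClient("gpt-4o-mini")
# Conversations are passed the same way as plain string prompts.
resps = client.process_prompts_sync([conv])
print(resps[0].completion)
```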
Caching
Save completions to a local or distributed cache to avoid repeated LLM calls on the same input.
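A sketch of enabling a local cache; the `SqliteCache` class, its module path, and the `cache=` keyword are assumptions about the library's caching interface.

```python
from lm_deluge import LLMClient
from lm_deluge.cache import SqliteCache  # assumed local cache backend

# Hedged sketch: cache class and keyword argument are assumptions.
client = LLMClient("gpt-4o-mini", cache=SqliteCache("completions.db"))

# The first run calls the API; repeating the same prompt should be answered
# from the cache without issuing a new request.
resps = client.process_prompts_sync(["Translate 'hello' to French."])
```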
First, install with pip:
```bash
pip install lm-deluge
```

Then, make your first API request.

```python
from lm_deluge import LLMClient

# you'll need OPENAI_API_KEY set in the environment!
client = LLMClient("gpt-4o-mini")
resps = client.process_prompts_sync(["Hello, world!"])
print(resps[0].completion)
```

LM Deluge is designed for batch inference at scale. If you need to process thousands of prompts as fast as possible while respecting rate limits, this is for you. The library handles all the complexity of rate limiting, retries, and provider-specific quirks so you can focus on your application logic.
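If your application is already async, there is presumably an async entry point as well; this sketch assumes a `process_prompts_async` coroutine that mirrors the sync method.

```python
import asyncio

from lm_deluge import LLMClient


async def main():
    client = LLMClient("gpt-4o-mini")
    # Assumed async counterpart of process_prompts_sync.
    resps = await client.process_prompts_async(
        [f"Write a haiku about {topic}" for topic in ("rain", "code", "coffee")]
    )
    for r in resps:
        print(r.completion)


asyncio.run(main())
```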