lm-deluge

Universal LLM client library designed to help you max out your rate limits with massive concurrency and intelligent throttling.

Unified Client

Send prompts to all major LLM providers (OpenAI, Anthropic, Google, and more) with a single client interface.

Massive Concurrency

Set your rate limits, then let it fly. Automatic throttling and retry logic.

Spray Across Models

Configure multiple models from one or more providers with sampling weights. The client samples a model for each request.

Tool Use & MCP

Unified API for tools across all providers. Instantiate tools from MCP servers with built-in call interfaces.

Convenient Messages

No more Googling how to build message lists. Conversation and Message classes work seamlessly across all providers.

Caching

Save completions in a local or distributed cache to avoid repeated LLM calls on the same input.

First, install with pip:

pip install lm-deluge

Then, make your first API request.

from lm_deluge import LLMClient
# you'll need OPENAI_API_KEY set in the environment!
client = LLMClient("gpt-4o-mini")
resps = client.process_prompts_sync(["Hello, world!"])
print(resps[0].completion)
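
To scale past a single model, you can combine the concurrency and multi-model features described above. The snippet below is a minimal sketch rather than verified API: the list-of-models constructor argument and the sampling_weights, max_requests_per_minute, and max_tokens_per_minute keyword names are assumptions inferred from the feature descriptions (and the model names are illustrative), so check the client reference for the exact parameters.

from lm_deluge import LLMClient

# Sketch only: the model list and keyword names below are assumptions
# inferred from the feature list above, not confirmed API.
client = LLMClient(
    ["gpt-4o-mini", "claude-3-5-haiku"],  # assumed: models to spray requests across
    sampling_weights=[0.7, 0.3],          # assumed: per-model sampling weights
    max_requests_per_minute=5_000,        # assumed: request-rate ceiling
    max_tokens_per_minute=2_000_000,      # assumed: token-rate ceiling
)

prompts = [f"Summarize item {i}" for i in range(1_000)]
resps = client.process_prompts_sync(prompts)
print(resps[0].completion)

Each prompt would then be dispatched to one of the configured models according to the weights, while the client throttles requests to stay under the configured limits.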

LM Deluge is designed for batch inference at scale. If you need to process thousands of prompts as fast as possible while respecting rate limits, this is for you. The library handles all the complexity of rate limiting, retries, and provider-specific quirks so you can focus on your application logic.
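
If your application already runs inside an event loop, an async entry point is the natural fit for large batches. The sketch below assumes an async counterpart to process_prompts_sync named process_prompts_async and that failed requests surface as responses with a missing completion; verify both against the API reference.

import asyncio
from lm_deluge import LLMClient

async def main():
    client = LLMClient("gpt-4o-mini")  # requires OPENAI_API_KEY in the environment
    prompts = [f"Translate to French: sentence {i}" for i in range(10_000)]
    # Assumed async counterpart to process_prompts_sync; the client fans the
    # prompts out concurrently, throttles to your limits, and retries transient failures.
    resps = await client.process_prompts_async(prompts)
    ok = [r for r in resps if r.completion is not None]
    print(f"{len(ok)}/{len(resps)} prompts completed")

asyncio.run(main())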