Compresso API
Introduction
The Compresso API provides advanced prompt compression capabilities designed to reduce token count while preserving essential meaning and context. Whether you're building with OpenAI, Anthropic, Mistral, or any other LLM provider, Compresso helps you cut down on prompt length, reduce latency, and save on token-related costs without compromising quality.
This API is ideal for developers integrating LLMs into production environments, offering fine-grained control over compression behavior, support for multi-part prompts, and analytics to help you monitor performance.
Authentication
All API requests must include your API key in the X-API-Key header.
You can obtain your API key from the Compresso console under the API Keys section. Keep your API key secure and do not share it publicly. If your API key is compromised, regenerate it immediately through the console.
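For example, with the Python requests library the key travels alongside the usual JSON content type; the value below is a placeholder for your own key:
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__",  # replace with the key from the Compresso console
}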
Rate Limiting
The API implements a sliding window rate limit of 300 requests per 5 minutes by default. Custom rate limits can be requested. Rate limit headers are included in all responses:
X-RateLimit-Limit: Total requests allowed per window
X-RateLimit-Remaining: Remaining requests in the current window
X-RateLimit-Reset: Timestamp when the rate limit window resets
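A minimal sketch of honoring these headers with the Python requests library is shown below; treating HTTP 429 as the rate-limited response and X-RateLimit-Reset as a Unix timestamp are assumptions, since the exact formats are not specified above:
import time
import requests

def post_with_backoff(url, payload, headers, max_retries=3):
    """POST to the API, sleeping until the window resets if the limit is hit."""
    for _ in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code != 429:  # assume 429 signals an exceeded limit
            return response
        # X-RateLimit-Reset is assumed to be a Unix timestamp for the window reset
        reset_at = float(response.headers.get("X-RateLimit-Reset", time.time() + 60))
        time.sleep(max(reset_at - time.time(), 1))
    return response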
Getting Started
Your first request
Getting started is very simple. Obtain your API key from the Compresso console or during signup. Then make your first request as shown below:
import requests
url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "prompt": "What does the fox say?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__"
}
response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"]
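The compressed prompt can then be sent to your LLM provider as usual. Here is a sketch with the OpenAI Python SDK; the model name is only illustrative, and the SDK reads OPENAI_API_KEY from your environment:
from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": compressed_prompt}],
)
print(completion.choices[0].message.content)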
Detailed request configuration
We also offer a more fine-grained way to separate your request into context(s), a question, and an instruction, each with its own compression aggressiveness:
import requests
url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "context": ["...", "...", "..."],
    "question": "What was the population growth of foxes between 1984 and 2000?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__"
}
response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"] # includes the question
See the documentation below for all available options.
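For illustration, a request that compresses the context more aggressively than the question or instruction might look like the sketch below; the per-part field names (instruction and the *_ratio parameters) are assumptions rather than confirmed names, so consult the options reference for the real ones:
payload = {
    "context": ["...", "...", "..."],
    "question": "What was the population growth of foxes between 1984 and 2000?",
    "instruction": "Answer using only the provided context.",
    # Hypothetical per-part ratios; the actual parameter names may differ.
    "context_ratio": 0.35,
    "question_ratio": 0.9,
    "instruction_ratio": 0.9,
}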
Client Libraries
We are currently working on Python and Node.js client libraries that wrap our API calls. These will allow you to integrate Compresso into your codebase more easily.