v1.0.0-beta
OAS 3.0.3

Compresso API

Introduction

The Compresso API provides advanced prompt compression capabilities designed to reduce token count while preserving essential meaning and context. Whether you're building with OpenAI, Anthropic, Mistral, or any other LLM provider, Compresso helps you cut down on prompt length, reduce latency, and save on token-related costs—without compromising quality.

This API is ideal for developers integrating LLMs into production environments, offering fine-grained control over compression behavior, support for multi-part prompts, and analytics to help you monitor performance.

Authentication

All API requests require an API key to be included in the X-API-Key header. You can obtain your API key from the Compresso console under the API Keys section. Keep your API key secure and do not share it publicly. If your API key is compromised, regenerate it immediately through the console.
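Since the key must accompany every request, it helps to build the headers in one place and keep the key out of source code. A minimal sketch, assuming the key is stored in an environment variable (the name COMPRESSO_API_KEY is illustrative, not mandated by the API):

```python
import os

def auth_headers(api_key=None):
    """Build the headers Compresso expects on every request.

    Falls back to the COMPRESSO_API_KEY environment variable
    (an illustrative name, not part of the API) when no key is
    passed explicitly.
    """
    key = api_key or os.environ["COMPRESSO_API_KEY"]
    return {
        "Content-Type": "application/json",
        "X-API-Key": key,
    }
```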

Rate Limiting

The API implements a sliding window rate limit of 300 requests per 5 minutes by default. Custom rate limits can be requested. Rate limit headers are included in all responses:

  • X-RateLimit-Limit: Total requests allowed per window
  • X-RateLimit-Remaining: Remaining requests in current window
  • X-RateLimit-Reset: Timestamp when the rate limit window resets
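A client can use these headers to back off before exhausting the window. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds; verify the exact format against live responses:

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, given the
    rate-limit headers of the last response.

    Assumes X-RateLimit-Reset holds a Unix timestamp in seconds
    (an assumption; confirm against actual API responses).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # budget left in the current window
    reset = float(headers.get("X-RateLimit-Reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```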

Getting Started

Your first request

Getting started is very simple. Obtain your API key from the Compresso console or during signup. Then make your first request as shown below:

import requests

url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "prompt": "What does the fox say?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__",
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"]

Detailed request configuration

We also offer a more fine-grained way to separate your request into context(s), question, and instruction, each with its own compression aggressiveness:

import requests

url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "context": ["...", "...", "..."],
    "question": "What was the population growth of foxes between 1984 and 2000?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__",
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"] # includes the question

See the documentation below for all available options.
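Because prompt is an alternative to context/instruction/question, a small helper that drops unset fields keeps request bodies clean. This build_payload function is a hypothetical convenience, not part of the API:

```python
def build_payload(prompt=None, context=None, instruction=None,
                  question=None, target_ratio=0.5):
    """Assemble a /v1/prompt-compress request body, omitting unset fields.

    `prompt` is an alternative to context/instruction/question, so
    passing both is likely an error; this sketch simply forwards
    whatever is provided.
    """
    fields = {
        "prompt": prompt,
        "context": context,
        "instruction": instruction,
        "question": question,
        "target_ratio": target_ratio,
    }
    return {k: v for k, v in fields.items() if v is not None}
```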

Client Libraries

We are currently working on Python and Node.js client libraries that wrap our API calls. These will let you integrate Compresso into your codebase more easily.

There is no ETA on these libraries yet, but we are aiming to release them by June at the latest.


Prompt Compression

Operations for prompt compression

Compress a prompt

Reduces the size of a prompt while attempting to preserve essential information

Body
application/json

  • prompt
    Type: string

    Complete prompt to compress (alternative to using context/instruction/question)

  • context
    Type: string or array of strings

    Contexts, documents, or demonstrations exhibiting low sensitivity to compression

  • instruction
    Type: string

    General instruction within the prompt, highly sensitive to compression

  • question
    Type: string

    General question within the prompt, highly sensitive to compression

  • target_ratio
    Type: number (float), min: 0.01, max: 1, default: 0.5

    Compression ratio (0.0-1.0) - lower means more compression

  • target_tokens
    Type: integer (int32), min: -1, default: -1

    Target token count for the compressed result

  • compression_settings
    Type: object
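Putting the schema together, here is an example body that combines target_tokens with compression_settings. The inline comments describe plausible semantics inferred from the field names; confirm them against the schema and live responses:

```python
# Illustrative /v1/prompt-compress request body; values are examples,
# and the commented semantics are assumptions, not documented behavior.
payload = {
    "context": [
        "Foxes are small-to-medium omnivorous mammals.",
        "Fox populations are surveyed annually in many regions.",
    ],
    "question": "How are fox populations surveyed?",
    "target_tokens": 200,  # assumed to override target_ratio when not -1
    "compression_settings": {
        "preserve_k_first": 2,     # assumed: keep the first 2 segments intact
        "preserve_k_last": 2,      # assumed: keep the last 2 segments intact
        "include_question": True,  # assumed: append the question to the output
        "force_keep_tokens": ["1984", "2000"],  # assumed: tokens that must survive
    },
}
```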
Responses
  • application/json
Request Example for POST /v1/prompt-compress
curl https://api-beta.compresso.dev/v1/prompt-compress \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: __Your-API-Key__' \
  --data '{
  "prompt": "",
  "context": "",
  "instruction": "",
  "question": "",
  "target_ratio": 0.5,
  "target_tokens": -1,
  "compression_settings": {
    "preserve_k_first": 2,
    "preserve_k_last": 2,
    "include_question": true,
    "force_keep_tokens": []
  }
}'
Response Example
{
  "compressed_prompt": "…",
  "statistics": {
    "original_tokens": 1,
    "compressed_tokens": 1,
    "original_words": 1,
    "compressed_words": 1,
    "token_ratio": 1,
    "word_ratio": 1
  },
  "analytics": {
    "compression_duration": 1,
    "request_id": "…",
    "timestamp": "2025-04-28T20:05:09.812Z"
  }
}
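The statistics object makes it easy to report savings. The token_savings helper below is illustrative and relies only on the field names shown in the example response:

```python
def token_savings(result):
    """Summarise compression statistics from a /v1/prompt-compress response."""
    stats = result["statistics"]
    saved = stats["original_tokens"] - stats["compressed_tokens"]
    return {
        "tokens_saved": saved,
        "percent_saved": 100.0 * saved / stats["original_tokens"],
    }

# Example with made-up numbers:
example = {"statistics": {"original_tokens": 1000, "compressed_tokens": 450}}
summary = token_savings(example)  # 550 tokens saved, 55% of the original
```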