v1.0.0-beta
OAS 3.0.3

Compresso API

Introduction

The Compresso API provides advanced prompt compression capabilities designed to reduce token count while preserving essential meaning and context. Whether you're building with OpenAI, Anthropic, Mistral, or any other LLM provider, Compresso helps you cut down on prompt length, reduce latency, and save on token-related costs—without compromising quality.

This API is ideal for developers integrating LLMs into production environments, offering fine-grained control over compression behavior, support for multi-part prompts, and analytics to help you monitor performance.

Authentication

All API requests require an API key to be included in the X-API-Key header. You can obtain your API key from the Compresso console under the API Keys section. Keep your API key secure and do not share it publicly. If your API key is compromised, regenerate it immediately through the console.
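Since the key must accompany every request, it helps to build the headers in one place and keep the key out of source code. A minimal sketch, assuming the key is stored in an environment variable (the name COMPRESSO_API_KEY is illustrative, not mandated by the API):

```python
import os

def auth_headers(api_key=None):
    """Build the headers Compresso expects on every request.

    Falls back to the COMPRESSO_API_KEY environment variable
    (an illustrative name, not part of the API) when no key is
    passed explicitly.
    """
    key = api_key or os.environ["COMPRESSO_API_KEY"]
    return {
        "Content-Type": "application/json",
        "X-API-Key": key,
    }
```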

Rate Limiting

The API implements a sliding window rate limit of 300 requests per 5 minutes by default. Custom rate limits can be requested. Rate limit headers are included in all responses:

  • X-RateLimit-Limit: Total requests allowed per window
  • X-RateLimit-Remaining: Remaining requests in current window
  • X-RateLimit-Reset: Timestamp when the rate limit window resets
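A client can use these headers to back off before exhausting the window. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds; verify the exact format against live responses:

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, given the
    rate-limit headers of the last response.

    Assumes X-RateLimit-Reset holds a Unix timestamp in seconds
    (an assumption; confirm against actual API responses).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # budget left in the current window
    reset = float(headers.get("X-RateLimit-Reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```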

Getting Started

Your first request

Getting started is very simple. Obtain your API key from the Compresso console or during signup. Then make your first request as shown below:

import requests

url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "prompt": "What does the fox say?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__",
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"]

Detailed request configuration

We also offer a more fine-grained way to separate your request into context(s), question, and instruction, each with its own compression aggressiveness:

import requests

url = "https://api-beta.compresso.dev/v1/prompt-compress"
payload = {
    "context": ["...", "...", "..."],
    "question": "What was the population growth of foxes between 1984 and 2000?",
    "target_ratio": 0.45,
}
headers = {
    "Content-Type": "application/json",
    "X-API-Key": "__Your-API-Key__",
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
compressed_prompt = result["compressed_prompt"] # includes the question

See the documentation below for all available options.
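Because prompt is an alternative to context/instruction/question, a small helper that drops unset fields keeps request bodies clean. This build_payload function is a hypothetical convenience, not part of the API:

```python
def build_payload(prompt=None, context=None, instruction=None,
                  question=None, target_ratio=0.5):
    """Assemble a /v1/prompt-compress request body, omitting unset fields.

    `prompt` is an alternative to context/instruction/question, so
    passing both is likely an error; this sketch simply forwards
    whatever is provided.
    """
    fields = {
        "prompt": prompt,
        "context": context,
        "instruction": instruction,
        "question": question,
        "target_ratio": target_ratio,
    }
    return {k: v for k, v in fields.items() if v is not None}
```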

Client Libraries

We are currently working on Python and Node.js client libraries that wrap our API calls. These will let you integrate Compresso into your codebase more easily.

There is no ETA on these libraries yet, but we are aiming to release them by June at the latest.


Prompt Compression

Operations for prompt compression

Compress a prompt

Reduces the size of a prompt while attempting to preserve essential information

Body
application/json

  • prompt
    Type: string

    Complete prompt to compress (alternative to using context/instruction/question)

  • context
    Type: string or array of strings

    Contexts, documents, or demonstrations exhibiting low sensitivity to compression

  • instruction
    Type: string

    General instruction within the prompt, highly sensitive to compression

  • question
    Type: string

    General question within the prompt, highly sensitive to compression

  • target_ratio
    Type: number (float), min: 0.01, max: 1, default: 0.5

    Compression ratio (0.0-1.0) - lower means more compression

  • target_tokens
    Type: integer (int32), min: -1, default: -1

    Target token count for the compressed result

  • compression_settings
    Type: object
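Putting the schema together, here is an example body that combines target_tokens with compression_settings. The inline comments describe plausible semantics inferred from the field names; confirm them against the schema and live responses:

```python
# Illustrative /v1/prompt-compress request body; values are examples,
# and the commented semantics are assumptions, not documented behavior.
payload = {
    "context": [
        "Foxes are small-to-medium omnivorous mammals.",
        "Fox populations are surveyed annually in many regions.",
    ],
    "question": "How are fox populations surveyed?",
    "target_tokens": 200,  # assumed to override target_ratio when not -1
    "compression_settings": {
        "preserve_k_first": 2,     # assumed: keep the first 2 segments intact
        "preserve_k_last": 2,      # assumed: keep the last 2 segments intact
        "include_question": True,  # assumed: append the question to the output
        "force_keep_tokens": ["1984", "2000"],  # assumed: tokens that must survive
    },
}
```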
Responses
  • application/json
Request Example for POST /v1/prompt-compress
curl https://api-beta.compresso.dev/v1/prompt-compress \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: __Your-API-Key__' \
  --data '{
  "prompt": "",
  "context": "",
  "instruction": "",
  "question": "",
  "target_ratio": 0.5,
  "target_tokens": -1,
  "compression_settings": {
    "preserve_k_first": 2,
    "preserve_k_last": 2,
    "include_question": true,
    "force_keep_tokens": []
  }
}'
Response Example
{
  "compressed_prompt": "…",
  "statistics": {
    "original_tokens": 1,
    "compressed_tokens": 1,
    "original_words": 1,
    "compressed_words": 1,
    "token_ratio": 1,
    "word_ratio": 1
  },
  "analytics": {
    "compression_duration": 1,
    "request_id": "…",
    "timestamp": "2025-04-28T20:05:09.812Z"
  }
}
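The statistics object makes it easy to report savings. The token_savings helper below is illustrative and relies only on the field names shown in the example response:

```python
def token_savings(result):
    """Summarise compression statistics from a /v1/prompt-compress response."""
    stats = result["statistics"]
    saved = stats["original_tokens"] - stats["compressed_tokens"]
    return {
        "tokens_saved": saved,
        "percent_saved": 100.0 * saved / stats["original_tokens"],
    }

# Example with made-up numbers:
example = {"statistics": {"original_tokens": 1000, "compressed_tokens": 450}}
summary = token_savings(example)  # 550 tokens saved, 55% of the original
```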