Set Up Model Pricing
This guide shows you how to configure per-token pricing for your models using tariffs.
Prerequisites
- A deployed model (see Add Endpoints)
- Platform Manager role
Understanding Tariffs
A tariff defines per-token pricing for a model. Each model can have multiple tariffs for different purposes:
| Purpose | Description | Limit |
|---|---|---|
| Realtime | Standard API requests | One per model |
| Batch | Asynchronous batch processing | One per SLA (e.g., 24h, 1h) |
| Playground | Dashboard testing | One per model |
Models without tariffs are free to use.
Set Pricing via the Dashboard
- Go to Models in the sidebar
- Click on the model you want to price
- Click Manage Pricing Tariffs
- Click + Add Tariff
- Fill in the pricing details:
- Name: Descriptive label (e.g., "Standard Pricing")
- Purpose: Select realtime, batch, or playground
- Input price: Cost per 1M input tokens
- Output price: Cost per 1M output tokens
- For batch tariffs, select the SLA (completion window like "24h")
- Click Save Changes
Pricing takes effect immediately.
Set Pricing via the API
Include a tariffs array when creating or updating a model:
curl -X PATCH "https://your-instance/admin/api/v1/models/{model-id}" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"tariffs": [
{
"name": "Realtime Pricing",
"input_price_per_token": "0.003",
"output_price_per_token": "0.015",
"api_key_purpose": "realtime"
}
]
}'Tariff Fields
| Field | Required | Description |
|---|---|---|
name | Yes | Descriptive name for the tariff |
input_price_per_token | Yes | Price per input token (decimal) |
output_price_per_token | Yes | Price per output token (decimal) |
api_key_purpose | No | "realtime", "batch", or "playground" |
completion_window | Batch only | SLA like "24h" or "1h" |
Prices are stored per-token with 8 decimal places. The dashboard displays prices per 1M tokens for readability. To convert: \$3.00 per 1M tokens = 0.000003 per token.
Example: Pricing with Multiple Tiers
Configure different prices for realtime, batch, and playground:
{
"tariffs": [
{
"name": "Realtime",
"input_price_per_token": "0.00003",
"output_price_per_token": "0.00006",
"api_key_purpose": "realtime"
},
{
"name": "Batch 24h",
"input_price_per_token": "0.000015",
"output_price_per_token": "0.00003",
"api_key_purpose": "batch",
"completion_window": "24h"
},
{
"name": "Batch 1h (Express)",
"input_price_per_token": "0.000025",
"output_price_per_token": "0.00005",
"api_key_purpose": "batch",
"completion_window": "1h"
},
{
"name": "Playground (Free)",
"input_price_per_token": "0",
"output_price_per_token": "0",
"api_key_purpose": "playground"
}
]
}This configuration:
- Charges full price for realtime API usage
- Offers 50% discount for 24-hour batch jobs
- Charges near-realtime rates for express 1-hour batch jobs
- Makes playground testing free
Updating Prices
When you update tariffs, the system:
- Closes old tariffs by setting their end date to now
- Creates new tariffs effective immediately
- Preserves historical pricing for accurate transaction records
Old transactions are charged at the rate that was active when they occurred. New transactions use the updated rates.
Viewing Current Tariffs
To see a model's current pricing via API, include pricing in the query:
curl "https://your-instance/admin/api/v1/models/{model-id}?include=pricing" \
-H "Authorization: Bearer $API_KEY"The response includes a tariffs array with all active tariffs.
Removing Pricing
To make a model free, update it with an empty tariffs array:
curl -X PATCH "https://your-instance/admin/api/v1/models/{model-id}" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"tariffs": []}'This closes all active tariffs. Requests to the model will no longer incur charges.
Related Topics
- How Billing Works — Understand the credits system
- Configuration Reference — Batch SLA configuration