Realtime Inference | Doubleword Inference API

The realtime API is well suited to development: use it to iterate quickly on prompts, validate model behavior, and prototype your pipeline. For production workloads that don't need instant responses, consider Async Inference or Batch Inference for significant cost savings.

Quick Start

Using the Playground

The fastest way to test the realtime API is through the interactive playground: select a model, enter your prompt, and get an immediate response.

Chat Completions

from openai import OpenAI

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

response = client.chat.completions.create(
    model="{{selectedModel.id}}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is batch inference?"}
    ]
)

print(response.choices[0].message.content)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.doubleword.ai/v1',
  apiKey: '{{apiKey}}'
});

const response = await client.chat.completions.create({
  model: '{{selectedModel.id}}',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is batch inference?' }
  ]
});

console.log(response.choices[0].message.content);

curl https://api.doubleword.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{apiKey}}" \
  -d '{
    "model": "{{selectedModel.id}}",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is batch inference?"}
    ]
  }'
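
The snippets above show the API key as a template placeholder. One way to avoid hardcoding a real key is to read it from the environment. The following is a minimal sketch, not an official pattern: the variable name DOUBLEWORD_API_KEY and the helpers client_settings and make_client are hypothetical names chosen for illustration. Only the base URL comes from the examples above.

```python
import os

# Base URL taken from the snippets above.
BASE_URL = "https://api.doubleword.ai/v1"


def client_settings(env=os.environ, var_name="DOUBLEWORD_API_KEY"):
    """Return keyword arguments for OpenAI(...), reading the key from env.

    DOUBLEWORD_API_KEY is a hypothetical variable name used in this sketch.
    """
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"set {var_name} before creating the client")
    return {"base_url": BASE_URL, "api_key": api_key}


def make_client():
    """Build an OpenAI client pointed at the Doubleword endpoint."""
    # Imported lazily so client_settings works without the SDK installed.
    from openai import OpenAI

    return OpenAI(**client_settings())
```

With the variable exported in your shell, `client = make_client()` can replace the explicit `OpenAI(...)` call in the Python example.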

Open Responses API

The Open Responses API provides a unified interface with built-in support for background processing. Use service_tier: "priority" for realtime inference:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

# Blocking — waits for the response
resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Explain quantum computing in one paragraph.",
    service_tier="priority",
)

print(resp.output_text)
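
Because the tier is selected per request, it can help to build the request arguments in one place. The sketch below is illustrative only: responses_request and the mode names "realtime" and "async" are hypothetical, and it assumes that the "flex" tier mentioned under Next Steps is passed through the same service_tier parameter.

```python
# Mapping from workload mode to the service_tier values named on this page.
# Assumption: "flex" is accepted by the same parameter as "priority".
SERVICE_TIERS = {"realtime": "priority", "async": "flex"}


def responses_request(model, prompt, mode="realtime", background=False):
    """Return keyword arguments for client.responses.create(...)."""
    if mode not in SERVICE_TIERS:
        raise ValueError(f"mode must be one of {sorted(SERVICE_TIERS)}")
    request = {
        "model": model,
        "input": prompt,
        "service_tier": SERVICE_TIERS[mode],
    }
    if background:
        # Only send the flag when background processing is requested.
        request["background"] = True
    return request
```

A call would then look like `client.responses.create(**responses_request(model_id, "Explain quantum computing in one paragraph."))`, where `model_id` is whichever model you selected.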

Background Mode

For longer-running requests, use background=True to return immediately and poll for the result:

from openai import OpenAI
from time import sleep

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Write a detailed essay about the history of space exploration.",
    service_tier="priority",
    background=True,
)

# Poll until complete
while resp.status in ("queued", "in_progress"):
    print(f"Status: {resp.status}")
    sleep(2)
    resp = client.responses.retrieve(resp.id)

print(f"Done! Output:\n{resp.output_text}")
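
The polling loop above runs until the status leaves "queued" or "in_progress", but it has no upper time limit and does not distinguish success from failure. The following sketch adds both. It is a hypothetical helper (wait_for_response is not part of the SDK), and it assumes that "completed" is the only successful terminal status, as in the OpenAI Responses API.

```python
import time
from typing import Any, Callable

# Statuses the loop above treats as "still running".
PENDING_STATUSES = ("queued", "in_progress")


def wait_for_response(
    retrieve: Callable[[str], Any],
    response: Any,
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until the response is no longer pending or the timeout passes."""
    deadline = time.monotonic() + timeout
    while response.status in PENDING_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"response {response.id} still {response.status} after {timeout}s"
            )
        sleep(poll_interval)
        response = retrieve(response.id)
    if response.status != "completed":
        # Assumption: any other terminal status means the request did not succeed.
        raise RuntimeError(
            f"response {response.id} ended with status {response.status!r}"
        )
    return response
```

Usage with the client from the example above would be `resp = wait_for_response(client.responses.retrieve, resp)`, followed by reading `resp.output_text`.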

Next Steps

Get started with Async Inference — lower cost with service_tier: "flex"
Learn more about Batch Inference — lowest cost for bulk workloads
View available models