Doubleword

Async Inference

Async inference lets you make LLM requests at reduced cost by deferring processing out of the real-time path. Requests are queued and processed within a 1-hour completion window, with results available via polling.

Why Async Inference?

  • OpenAI-compatible — Uses the standard openai SDK with the Responses API
  • Lower cost — Async requests are priced below realtime, above batch
  • No JSONL files — Unlike batch inference, you make standard API calls
  • Background or blocking — Return immediately with a response ID, or hold the connection until complete

When to use Async Inference

Async inference is the right choice when your application makes LLM calls that don't need to resolve instantly. Common use cases include:

  • Agentic workflows — Multi-step agent systems where individual steps can be processed asynchronously
  • Background processing — Content generation, summarization, or classification running behind a queue
  • Development and testing — Running evaluations or prompt iterations where you don't need instant feedback
  • Cost optimization — Any workload where a 1-hour completion window is acceptable

Quick Start

1. Create an API Key

Generate a key from the Doubleword Console, or sign in above to auto-populate the code examples.

2. Submit a request with service_tier: "flex"

from openai import OpenAI
from time import sleep

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

# Submit an async request — returns immediately with status "queued"
resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Explain the theory of relativity in detail.",
    service_tier="flex",
    background=True,
)

print(f"Queued: {resp.id} (status: {resp.status})")

# Poll until the daemon completes it
while resp.status in ("queued", "in_progress"):
    sleep(2)
    resp = client.responses.retrieve(resp.id)
    print(f"Status: {resp.status}")

print(f"\nOutput:\n{resp.output_text}")

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.doubleword.ai/v1',
  apiKey: '{{apiKey}}'
});

// Submit an async request
const resp = await client.responses.create({
  model: '{{selectedModel.id}}',
  input: 'Explain the theory of relativity in detail.',
  service_tier: 'flex',
  background: true,
});

console.log(`Queued: ${resp.id} (status: ${resp.status})`);

// Poll until complete
let result = resp;
while (['queued', 'in_progress'].includes(result.status)) {
  await new Promise(r => setTimeout(r, 2000));
  result = await client.responses.retrieve(result.id);
  console.log(`Status: ${result.status}`);
}

console.log(`\nOutput:\n${result.output_text}`);
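The fixed two-second loops above are fine for a demo; production polling usually wants exponential backoff plus a check for every terminal status, so a failed request doesn't loop forever. A sketch in Python (the terminal-status set is an assumption, following the standard Responses API states):

```python
import time

# Statuses after which polling should stop. Assumed set, following the
# standard Responses API: only queued/in_progress are transient.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}

def backoff_delays(base=2.0, cap=30.0, factor=2.0):
    """Yield capped exponential delays: 2, 4, 8, 16, 30, 30, ..."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def wait_for_response(client, response_id):
    """Poll GET /v1/responses/{id} until a terminal status is reached."""
    for delay in backoff_delays():
        resp = client.responses.retrieve(response_id)
        if resp.status in TERMINAL_STATUSES:
            return resp
        time.sleep(delay)
```

Backoff keeps request volume low over a window that can last up to an hour, while the terminal-status check surfaces failures instead of polling them indefinitely.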

Blocking mode

If you prefer to hold the connection until the result is ready, omit background:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

# Blocks until the async request completes (up to 1 hour)
resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Summarize the history of artificial intelligence.",
    service_tier="flex",
)

print(resp.output_text)

How It Works

  1. You submit a request with service_tier: "flex" via the Responses API
  2. Doubleword creates a batch of 1 with a 1-hour completion window
  3. The request is queued and processed by the inference daemon
  4. Results are retrieved by polling GET /v1/responses/{id} (or returned directly in blocking mode)
  5. Your code receives a standard Open Responses API response object
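The same lifecycle can be driven without the SDK. A minimal sketch over plain HTTP (endpoint paths follow the Responses API convention; treat the exact JSON field names as assumptions):

```python
import requests

BASE_URL = "https://api.doubleword.ai/v1"

def response_url(response_id):
    """URL polled in step 4: GET /v1/responses/{id}."""
    return f"{BASE_URL}/responses/{response_id}"

def submit_flex(api_key, model, text):
    """Step 1: submit with service_tier 'flex'; returns the response id."""
    r = requests.post(
        f"{BASE_URL}/responses",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "input": text,
            "service_tier": "flex",
            "background": True,
        },
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["id"]

def poll(api_key, response_id):
    """Steps 4-5: fetch the current state of the queued response."""
    r = requests.get(
        response_url(response_id),
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()
```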

Using Autobatcher

For existing Chat Completions code, the Autobatcher can automatically convert your realtime calls into async batches; the only required change is swapping the client import.

from autobatcher import AsyncOpenAI

client = AsyncOpenAI(
    api_key="{{apiKey}}",
    base_url="https://api.doubleword.ai/v1"
)

# Looks like a normal OpenAI call, but runs asynchronously
response = await client.chat.completions.create(
    model="{{selectedModel.id}}",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

print(response.choices[0].message.content)
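Because the Autobatcher client is async, many prompts can be fanned out at once with asyncio.gather, letting the flex tier absorb the burst. A stdlib-only sketch of the pattern, with a stub coroutine standing in for the real client.chat.completions.create call:

```python
import asyncio

async def ask(prompt):
    # Stand-in for: await client.chat.completions.create(
    #     model=..., messages=[{"role": "user", "content": prompt}])
    await asyncio.sleep(0)  # simulates the network round trip
    return f"answer to: {prompt}"

async def main(prompts):
    # Submit everything at once; gather returns results in input order.
    return await asyncio.gather(*(ask(p) for p in prompts))

answers = asyncio.run(main(["Explain quantum computing", "Define entropy"]))
print(answers)
```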

Next Steps