Doubleword

LlamaIndex

The llamaindex-doubleword package provides Doubleword LLM and embedding models for LlamaIndex, with both real-time and batch variants.

Install

pip install llamaindex-doubleword

Chat / Completions

from llamaindex_doubleword import DoublewordLLM

llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

response = llm.complete("Say hello.")
print(response.text)

Embeddings

from llamaindex_doubleword import DoublewordEmbedding

embed_model = DoublewordEmbedding(
    model_name="Qwen/Qwen3-Embedding-8B",
    api_key="{{apiKey}}",
)

embedding = embed_model.get_text_embedding("Hello world")

Batch pricing with Autobatcher

For background tasks where latency is not critical, use the batch variants to transparently route requests through the Batch API at reduced cost:

pip install llamaindex-doubleword autobatcher

from llamaindex_doubleword import DoublewordLLMBatch
import asyncio

llm = DoublewordLLMBatch(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)

async def main():
    response = await llm.acomplete("Say hello.")
    print(response.text)

asyncio.run(main())

The batch variants (DoublewordLLMBatch, DoublewordEmbeddingBatch) are async-only. They collect concurrent requests and submit them as batch jobs automatically, cutting inference costs by up to 90%.

Using with LlamaIndex

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llamaindex_doubleword import DoublewordEmbedding, DoublewordLLM

Settings.llm = DoublewordLLM(model="{{selectedModel.id}}", api_key="{{apiKey}}")
Settings.embed_model = DoublewordEmbedding(model_name="Qwen/Qwen3-Embedding-8B", api_key="{{apiKey}}")

# Load documents from a local directory (adjust the path to your data)
documents = SimpleDirectoryReader("data").load_data()

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")
print(response)