LlamaIndex
The llamaindex-doubleword package provides Doubleword LLM and embedding models for LlamaIndex, with both real-time and batch variants.
Install
pip install llamaindex-doubleword
Chat / Completions
from llamaindex_doubleword import DoublewordLLM
llm = DoublewordLLM(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)
response = llm.complete("Say hello.")
print(response.text)
Embeddings
from llamaindex_doubleword import DoublewordEmbedding
embed_model = DoublewordEmbedding(
    model_name="Qwen/Qwen3-Embedding-8B",
    api_key="{{apiKey}}",
)
embedding = embed_model.get_text_embedding("Hello world")
Batch pricing with Autobatcher
For background tasks where latency is not critical, use the batch variants to transparently route requests through the Batch API at reduced cost:
pip install llamaindex-doubleword autobatcher
from llamaindex_doubleword import DoublewordLLMBatch
import asyncio
llm = DoublewordLLMBatch(
    model="{{selectedModel.id}}",
    api_key="{{apiKey}}",
)
async def main():
    response = await llm.acomplete("Say hello.")
    print(response.text)
asyncio.run(main())
The batch variants (DoublewordLLMBatch, DoublewordEmbeddingBatch) are async-only. They collect concurrent requests and submit them as batch jobs automatically, cutting inference costs by up to 90%.
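Because the batch variants are async-only, the savings depend on issuing many requests concurrently (for example with asyncio.gather) so they can be grouped into one batch submission. A minimal sketch of that fan-out pattern, using a stand-in coroutine in place of the real llm.acomplete call:

```python
import asyncio

# Stand-in for llm.acomplete; the real DoublewordLLMBatch client would
# collect these concurrent calls and submit them as a single batch job.
async def acomplete_stub(prompt: str) -> str:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"response to: {prompt}"

async def main() -> list[str]:
    prompts = [f"Summarize document {i}" for i in range(3)]
    # Fire all requests concurrently; results come back in prompt order.
    return await asyncio.gather(*(acomplete_stub(p) for p in prompts))

results = asyncio.run(main())
print(results)
```

With the real client, swapping acomplete_stub for llm.acomplete keeps the same structure: launch the calls together rather than awaiting them one by one.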
Using with LlamaIndex
from llama_index.core import Settings, VectorStoreIndex
from llamaindex_doubleword import DoublewordLLM, DoublewordEmbedding
Settings.llm = DoublewordLLM(model="{{selectedModel.id}}", api_key="{{apiKey}}")
Settings.embed_model = DoublewordEmbedding(model_name="Qwen/Qwen3-Embedding-8B", api_key="{{apiKey}}")
# `documents` is a list of llama_index Document objects you have loaded
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")