LangChain / LangGraph
The langchain-doubleword package lets you use Doubleword as the model backend for any LangChain or LangGraph application. You get four drop-in classes — real-time and batched variants of chat and embeddings — that slot into agents, chains, graphs, retrievers, or anywhere else a LangChain BaseChatModel or Embeddings is accepted.
The batched variants are the headline feature. They let a LangGraph workflow fan out dozens or hundreds of parallel LLM calls and have them transparently collected into a single Doubleword batch submission — half the per-token cost, no changes to your graph code.
Install
```shell
pip install langchain-doubleword
```
Authenticate
Three options, pick whichever fits. Credentials resolve in this order:
- Pass the key directly — `ChatDoubleword(model=..., api_key="sk-...")`. Use for quick scripts and tests.
- The `DOUBLEWORD_API_KEY` environment variable — the right choice for production, CI, and containers.
- The `dw` CLI — run `dw login` once and every script on the machine picks up your active account's inference key from `~/.dw/`. The smoothest option for local development.
No separate config is needed — whichever class you instantiate finds its credentials automatically.
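As a quick sketch, the first two options look like this (the `sk-...` values are placeholders for a real key):

```python
import os

from langchain_doubleword import ChatDoubleword

# Option 1: pass the key directly (quick scripts and tests)
llm = ChatDoubleword(model="Qwen/Qwen3-14B-FP8", api_key="sk-...")

# Option 2: set DOUBLEWORD_API_KEY (usually in your shell or CI config,
# not in code) and omit the argument entirely
os.environ["DOUBLEWORD_API_KEY"] = "sk-..."
llm = ChatDoubleword(model="Qwen/Qwen3-14B-FP8")
```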
Real-time chat
Interactive calls go through Doubleword's OpenAI-compatible chat endpoint. This is what you want for a chatbot, an interactive agent, or anything latency-sensitive.
```python
from langchain_doubleword import ChatDoubleword

llm = ChatDoubleword(model="Qwen/Qwen3-14B-FP8")
print(llm.invoke("Explain bismuth in three sentences.").content)
```
It's a standard `BaseChatModel`, so `bind_tools`, `with_structured_output`, streaming, and LangSmith tracing all work unchanged.
Batched chat
For bulk work or agents that fan out, swap ChatDoubleword for ChatDoublewordBatch. The interface is identical — the only user-visible difference is it's async-only.
```python
import asyncio

from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

async def main():
    results = await asyncio.gather(*[
        llm.ainvoke(f"Summarise chapter {i}") for i in range(50)
    ])
    for r in results:
        print(r.content)

asyncio.run(main())
```
Those 50 concurrent `ainvoke` calls get collected into one batch submission. Pricing follows Doubleword's batch tier (roughly half of real-time). The cost you pay for that is a small window-of-wait on the first call, tunable via `batch_window_seconds`.
Pick `ChatDoublewordBatch` when:
- Your LangGraph workflow runs parallel branches or `Send` fan-out and you want batch pricing without rewriting the graph.
- The model you need is only exposed via Doubleword's batch API (some of the larger Doubleword-hosted models fall into this category).
- You're embedding or processing a large corpus offline.
For embeddings, the same pattern applies: `DoublewordEmbeddings` is real-time, `DoublewordEmbeddingsBatch` is transparently batched.
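A minimal sketch of the batched embeddings path, assuming `DoublewordEmbeddingsBatch` is async-only like its chat counterpart and exposes LangChain's standard `aembed_documents` (the model name below is a placeholder):

```python
import asyncio

from langchain_doubleword import DoublewordEmbeddingsBatch

# Placeholder model name; substitute any Doubleword-hosted embedding model
embeddings = DoublewordEmbeddingsBatch(model="your-embedding-model")

async def main():
    corpus = [f"Document {i} text..." for i in range(500)]
    # Standard LangChain Embeddings interface; the 500 documents are
    # transparently collected into batch submissions behind the scenes
    vectors = await embeddings.aembed_documents(corpus)
    print(len(vectors), len(vectors[0]))

asyncio.run(main())
```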
Inside a LangGraph node
Both chat classes slot straight into LangGraph. This is the shape you'll use most:
```python
from langgraph.graph import StateGraph, END
from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(
    model="Qwen/Qwen3-14B-FP8",
    completion_window="1h",     # faster turnaround than the 24h default
    batch_window_seconds=2.5,   # don't make callers wait the full 10s
)

async def call_model(state):
    return {"messages": [await llm.ainvoke(state["messages"])]}

graph = StateGraph(dict)
graph.add_node("model", call_model)
graph.set_entry_point("model")
graph.add_edge("model", END)
app = graph.compile()
```
When several model nodes execute in parallel — via `Send`, conditional fan-out, or concurrent `ainvoke` calls from your own code — all their requests hit the same autobatcher window and get bundled together.
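The `Send` fan-out pattern can be sketched like this. The topic-summarising task is illustrative, and the state schema (with an `operator.add` reducer so parallel branches can merge their results) is ours, but the shape — one node execution dispatched per item, all landing in the same batch window — is what makes the batched class pay off:

```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, END
from langgraph.types import Send

from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

class State(TypedDict):
    topics: list[str]
    # Reducer lets concurrent branches append without conflicting
    summaries: Annotated[list[str], operator.add]

async def summarise_topic(state):
    # Each dispatched copy runs concurrently; their ainvoke calls
    # all hit the same autobatcher window
    reply = await llm.ainvoke(f"Summarise: {state['topic']}")
    return {"summaries": [reply.content]}

def fan_out(state):
    # One Send per topic -> one parallel node execution per topic
    return [Send("summarise", {"topic": t}) for t in state["topics"]]

graph = StateGraph(State)
graph.add_node("summarise", summarise_topic)
graph.set_conditional_entry_point(fan_out)
graph.add_edge("summarise", END)
app = graph.compile()
```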
Try it end-to-end
A full multi-agent research workflow lives in the repo at `examples/async-agents-langgraph/notebook.ipynb`. It fans out a research query across sub-topics, runs Serper + Jina search-and-scrape inside each sub-agent, and aggregates the findings into a final report. Every LLM call goes through `ChatDoublewordBatch`, so dozens of concurrent sub-agents collapse into a handful of batch submissions.
It's the cleanest demonstration of the shape that makes the batched variants worthwhile, and a good place to see the integration in real use.
Further reading
- Package README on GitHub — full API reference, configuration table, and smoke-test scripts under `examples/langgraph-basic/` (a real-time and a batched variant of the same trivial LangGraph agent).
- `autobatcher` — the library powering the batched variants. Standalone, usable without LangChain.
- Doubleword batch API docs — pricing, completion windows, and the raw endpoints the batched variants target.