Doubleword

LangChain / LangGraph

The langchain-doubleword package lets you use Doubleword as the model backend for any LangChain or LangGraph application. You get four drop-in classes — real-time and batched variants of chat and embeddings — that slot into agents, chains, graphs, retrievers, or anywhere else a LangChain BaseChatModel or Embeddings is accepted.

The batched variants are the headline feature. They let a LangGraph workflow fan out dozens or hundreds of parallel LLM calls and have them transparently collected into a single Doubleword batch submission — half the per-token cost, no changes to your graph code.

Install

pip install langchain-doubleword

Authenticate

Three options; pick whichever fits. Credentials resolve in this order:

  1. Pass the key directly: ChatDoubleword(model=..., api_key="sk-..."). Use for quick scripts and tests.
  2. DOUBLEWORD_API_KEY environment variable — the right choice for production, CI, and containers.
  3. The dw CLI — run dw login once and every script on the machine picks up your active account's inference key from ~/.dw/. The smoothest option for local development.

No separate config is needed — whichever class you instantiate finds its credentials automatically.
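The precedence above can be pictured with a small sketch. This is illustrative only: resolve_api_key is a hypothetical helper (not part of the package), and the credentials filename under ~/.dw/ is an assumption — the real file layout isn't specified here.

```python
import os
from pathlib import Path

def resolve_api_key(api_key=None, dw_dir="~/.dw"):
    """Illustrative resolution order: explicit arg, then env var, then dw CLI state."""
    # 1. A key passed directly to the constructor wins.
    if api_key:
        return api_key
    # 2. Fall back to the DOUBLEWORD_API_KEY environment variable.
    env_key = os.environ.get("DOUBLEWORD_API_KEY")
    if env_key:
        return env_key
    # 3. Finally, look for credentials written by `dw login`
    #    (hypothetical filename for the sketch).
    key_file = Path(dw_dir).expanduser() / "credentials"
    if key_file.exists():
        return key_file.read_text().strip()
    raise RuntimeError("No Doubleword credentials found")
```

The point of the ordering is that an explicit argument always overrides ambient configuration, so a test can pin a key without touching the environment.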

Real-time chat

Interactive calls go through Doubleword's OpenAI-compatible chat endpoint. This is what you want for a chatbot, an interactive agent, or anything latency-sensitive.

from langchain_doubleword import ChatDoubleword

llm = ChatDoubleword(model="Qwen/Qwen3-14B-FP8")

print(llm.invoke("Explain bismuth in three sentences.").content)

It's a standard BaseChatModel, so bind_tools, with_structured_output, streaming, and LangSmith tracing all work unchanged.

Batched chat

For bulk work or agents that fan out, swap ChatDoubleword for ChatDoublewordBatch. The interface is identical; the only user-visible difference is that it's async-only.

import asyncio
from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

async def main():
    results = await asyncio.gather(*[
        llm.ainvoke(f"Summarise chapter {i}") for i in range(50)
    ])
    for r in results:
        print(r.content)

asyncio.run(main())

Those 50 concurrent ainvoke calls get collected into one batch submission, priced at Doubleword's batch tier (roughly half of real-time). The trade-off is a short wait while the batching window fills on the first call, tunable via batch_window_seconds.
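The mechanics can be pictured with a toy sketch — this is not the real autobatcher, just an illustration of the window: calls arriving while the window is open queue up, and a single flush task resolves them all as one "submission".

```python
import asyncio

class ToyBatcher:
    """Illustrative only: queue concurrent calls, flush them as one batch."""

    def __init__(self, window_seconds=0.05):
        self.window_seconds = window_seconds
        self.pending = []         # (prompt, future) pairs waiting for the flush
        self.flush_task = None
        self.submissions = 0      # how many "batch submissions" were made

    async def ainvoke(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        # The first caller in a window schedules the flush; the rest just wait.
        if self.flush_task is None:
            self.flush_task = asyncio.create_task(self._flush())
        return await fut

    async def _flush(self):
        await asyncio.sleep(self.window_seconds)   # the window-of-wait
        batch, self.pending, self.flush_task = self.pending, [], None
        self.submissions += 1                      # one submission for the whole batch
        for prompt, fut in batch:                  # fake per-request results
            fut.set_result(f"echo: {prompt}")

async def demo():
    llm = ToyBatcher()
    results = await asyncio.gather(*[llm.ainvoke(f"q{i}") for i in range(50)])
    return llm.submissions, results[0]

submissions, first = asyncio.run(demo())
```

All 50 coroutines register before the window closes, so `submissions` ends up at 1 — the same shape the real autobatcher gives you, with each caller still receiving its own result in order.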

Pick ChatDoublewordBatch when:

  • Your LangGraph workflow runs parallel branches or Send fan-out and you want batch pricing without rewriting the graph.
  • The model you need is only exposed via Doubleword's batch API (some of the larger Doubleword-hosted models fall into this category).
  • You're embedding or processing a large corpus offline.

For embeddings, the same pattern applies: DoublewordEmbeddings is real-time, DoublewordEmbeddingsBatch is transparently batched.
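The embeddings pattern looks like this as a hedged sketch: the class names come from this page, the methods are the standard LangChain Embeddings interface, and the model name is a placeholder — check Doubleword's model list for actual embedding models.

```python
import asyncio
from langchain_doubleword import DoublewordEmbeddings, DoublewordEmbeddingsBatch

# Real-time: standard LangChain Embeddings interface.
emb = DoublewordEmbeddings(model="your-embedding-model")  # placeholder model name
vector = emb.embed_query("What is bismuth?")

# Batched: same interface, used async so concurrent calls
# collapse into one submission.
emb_batch = DoublewordEmbeddingsBatch(model="your-embedding-model")

async def embed_corpus(docs: list[str]) -> list[list[float]]:
    return await emb_batch.aembed_documents(docs)
```

Either class drops into a LangChain vector store or retriever wherever an Embeddings instance is accepted.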

Inside a LangGraph node

Both chat classes slot straight into LangGraph. This is the shape you'll use most:

from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_doubleword import ChatDoublewordBatch

class State(TypedDict):
    # LangGraph needs an annotated state schema; add_messages appends
    # rather than overwrites the message list.
    messages: Annotated[list, add_messages]

llm = ChatDoublewordBatch(
    model="Qwen/Qwen3-14B-FP8",
    completion_window="1h",     # faster turnaround than the 24h default
    batch_window_seconds=2.5,   # don't make callers wait the full 10s
)

async def call_model(state: State):
    return {"messages": [await llm.ainvoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("model", call_model)
graph.set_entry_point("model")
graph.add_edge("model", END)
app = graph.compile()

When several model nodes execute in parallel — via Send, conditional fan-out, or concurrent ainvoke calls from your own code — all their requests hit the same autobatcher window and get bundled together.
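The Send fan-out can be sketched as LangGraph's map-reduce pattern. A hedged sketch: the state schema, node names, and prompts are illustrative, and Send's import path has moved between langgraph versions (langgraph.constants in older releases, langgraph.types in newer ones).

```python
import operator
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.constants import Send  # langgraph.types.Send in newer versions
from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

class State(TypedDict):
    topics: list
    summaries: Annotated[list, operator.add]  # reducer: concatenate branch results

def fan_out(state: State):
    # One Send per topic; the worker nodes run concurrently,
    # so their llm calls land in the same autobatcher window.
    return [Send("summarise", {"topic": t}) for t in state["topics"]]

async def summarise(state):
    msg = await llm.ainvoke(f"Summarise: {state['topic']}")
    return {"summaries": [msg.content]}

graph = StateGraph(State)
graph.add_node("summarise", summarise)
graph.add_conditional_edges(START, fan_out, ["summarise"])
graph.add_edge("summarise", END)
app = graph.compile()
```

Invoking `app` with ten topics produces ten concurrent summarise runs but, with the batched chat class, only one Doubleword batch submission.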

Try it end-to-end

A full multi-agent research workflow lives in the repo at examples/async-agents-langgraph/notebook.ipynb. It fans out a research query across sub-topics, runs Serper + Jina search-and-scrape inside each sub-agent, and aggregates into a final report. Every LLM call goes through ChatDoublewordBatch, so dozens of concurrent sub-agents collapse into a handful of batch submissions.

It's the cleanest demonstration of the shape that makes the batched variants worthwhile, and a good place to see the integration in real use.

Further reading

  • Package README on GitHub — full API reference, configuration table, and smoke-test scripts under examples/langgraph-basic/ (a real-time and a batched variant of the same trivial LangGraph agent).
  • autobatcher — the library powering the batched variants. Standalone, usable without LangChain.
  • Doubleword batch API docs — pricing, completion windows, and the raw endpoints the batched variants target.