Doubleword

LangChain / LangGraph

The langchain-doubleword package lets you use Doubleword as the model backend for any LangChain or LangGraph application. You get four drop-in classes — real-time and batched variants of chat and embeddings — that slot into agents, chains, graphs, retrievers, or anywhere else a LangChain BaseChatModel or Embeddings is accepted.

The batched variants are the headline feature. They let a LangGraph workflow fan out dozens or hundreds of parallel LLM calls and have them transparently collected into a single Doubleword batch submission — half the per-token cost, no changes to your graph code.

Install

pip install langchain-doubleword

Authenticate

Three options; pick whichever fits. Credentials resolve in this order:

  1. Pass the key directly: ChatDoubleword(model=..., api_key="sk-..."). Use for quick scripts and tests.
  2. DOUBLEWORD_API_KEY environment variable — the right choice for production, CI, and containers.
  3. The dw CLI — run dw login once and every script on the machine picks up your active account's inference key from ~/.dw/. The smoothest option for local development.

No separate config is needed — whichever class you instantiate finds its credentials automatically.
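The precedence above can be pictured with a small sketch. This is illustrative only: resolve_api_key is a hypothetical helper (not part of the package), and the credentials filename under ~/.dw/ is an assumption — the real file layout isn't specified here.

```python
import os
from pathlib import Path

def resolve_api_key(api_key=None, dw_dir="~/.dw"):
    """Illustrative resolution order: explicit arg, then env var, then dw CLI state."""
    # 1. A key passed directly to the constructor wins.
    if api_key:
        return api_key
    # 2. Fall back to the DOUBLEWORD_API_KEY environment variable.
    env_key = os.environ.get("DOUBLEWORD_API_KEY")
    if env_key:
        return env_key
    # 3. Finally, look for credentials written by `dw login`
    #    (hypothetical filename for the sketch).
    key_file = Path(dw_dir).expanduser() / "credentials"
    if key_file.exists():
        return key_file.read_text().strip()
    raise RuntimeError("No Doubleword credentials found")
```

The point of the ordering is that an explicit argument always overrides ambient configuration, so a test can pin a key without touching the environment.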

Real-time chat

Interactive calls go through Doubleword's OpenAI-compatible chat endpoint. This is what you want for a chatbot, an interactive agent, or anything latency-sensitive.

from langchain_doubleword import ChatDoubleword

llm = ChatDoubleword(model="Qwen/Qwen3-14B-FP8")

print(llm.invoke("Explain bismuth in three sentences.").content)

It's a standard BaseChatModel, so bind_tools, with_structured_output, streaming, and LangSmith tracing all work unchanged.

Batched chat

For bulk work or agents that fan out, swap ChatDoubleword for ChatDoublewordBatch. The interface is identical; the only user-visible difference is that it's async-only.

import asyncio
from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

async def main():
    results = await asyncio.gather(*[
        llm.ainvoke(f"Summarise chapter {i}") for i in range(50)
    ])
    for r in results:
        print(r.content)

asyncio.run(main())

Those 50 concurrent ainvoke calls get collected into one batch submission, priced at Doubleword's batch tier (roughly half of real-time). The trade-off is a short wait while the batching window fills on the first call, tunable via batch_window_seconds.
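The mechanics can be pictured with a toy sketch — this is not the real autobatcher, just an illustration of the window: calls arriving while the window is open queue up, and a single flush task resolves them all as one "submission".

```python
import asyncio

class ToyBatcher:
    """Illustrative only: queue concurrent calls, flush them as one batch."""

    def __init__(self, window_seconds=0.05):
        self.window_seconds = window_seconds
        self.pending = []         # (prompt, future) pairs waiting for the flush
        self.flush_task = None
        self.submissions = 0      # how many "batch submissions" were made

    async def ainvoke(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        # The first caller in a window schedules the flush; the rest just wait.
        if self.flush_task is None:
            self.flush_task = asyncio.create_task(self._flush())
        return await fut

    async def _flush(self):
        await asyncio.sleep(self.window_seconds)   # the window-of-wait
        batch, self.pending, self.flush_task = self.pending, [], None
        self.submissions += 1                      # one submission for the whole batch
        for prompt, fut in batch:                  # fake per-request results
            fut.set_result(f"echo: {prompt}")

async def demo():
    llm = ToyBatcher()
    results = await asyncio.gather(*[llm.ainvoke(f"q{i}") for i in range(50)])
    return llm.submissions, results[0]

submissions, first = asyncio.run(demo())
```

All 50 coroutines register before the window closes, so `submissions` ends up at 1 — the same shape the real autobatcher gives you, with each caller still receiving its own result in order.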

Pick ChatDoublewordBatch when:

  • Your LangGraph workflow runs parallel branches or Send fan-out and you want batch pricing without rewriting the graph.
  • The model you need is only exposed via Doubleword's batch API (some of the larger Doubleword-hosted models fall into this category).
  • You're embedding or processing a large corpus offline.

For embeddings, the same pattern applies: DoublewordEmbeddings is real-time, DoublewordEmbeddingsBatch is transparently batched.
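The embeddings pattern looks like this as a hedged sketch: the class names come from this page, the methods are the standard LangChain Embeddings interface, and the model name is a placeholder — check Doubleword's model list for actual embedding models.

```python
import asyncio
from langchain_doubleword import DoublewordEmbeddings, DoublewordEmbeddingsBatch

# Real-time: standard LangChain Embeddings interface.
emb = DoublewordEmbeddings(model="your-embedding-model")  # placeholder model name
vector = emb.embed_query("What is bismuth?")

# Batched: same interface, used async so concurrent calls
# collapse into one submission.
emb_batch = DoublewordEmbeddingsBatch(model="your-embedding-model")

async def embed_corpus(docs: list[str]) -> list[list[float]]:
    return await emb_batch.aembed_documents(docs)
```

Either class drops into a LangChain vector store or retriever wherever an Embeddings instance is accepted.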

Inside a LangGraph node

Both chat classes slot straight into LangGraph. This is the shape you'll use most:

from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_doubleword import ChatDoublewordBatch

class State(TypedDict):
    # LangGraph needs an annotated state schema; add_messages appends
    # rather than overwrites the message list.
    messages: Annotated[list, add_messages]

llm = ChatDoublewordBatch(
    model="Qwen/Qwen3-14B-FP8",
    completion_window="1h",     # faster turnaround than the 24h default
    batch_window_seconds=2.5,   # don't make callers wait the full 10s
)

async def call_model(state: State):
    return {"messages": [await llm.ainvoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("model", call_model)
graph.set_entry_point("model")
graph.add_edge("model", END)
app = graph.compile()

When several model nodes execute in parallel — via Send, conditional fan-out, or concurrent ainvoke calls from your own code — all their requests hit the same autobatcher window and get bundled together.
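The Send fan-out can be sketched as LangGraph's map-reduce pattern. A hedged sketch: the state schema, node names, and prompts are illustrative, and Send's import path has moved between langgraph versions (langgraph.constants in older releases, langgraph.types in newer ones).

```python
import operator
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.constants import Send  # langgraph.types.Send in newer versions
from langchain_doubleword import ChatDoublewordBatch

llm = ChatDoublewordBatch(model="Qwen/Qwen3-14B-FP8")

class State(TypedDict):
    topics: list
    summaries: Annotated[list, operator.add]  # reducer: concatenate branch results

def fan_out(state: State):
    # One Send per topic; the worker nodes run concurrently,
    # so their llm calls land in the same autobatcher window.
    return [Send("summarise", {"topic": t}) for t in state["topics"]]

async def summarise(state):
    msg = await llm.ainvoke(f"Summarise: {state['topic']}")
    return {"summaries": [msg.content]}

graph = StateGraph(State)
graph.add_node("summarise", summarise)
graph.add_conditional_edges(START, fan_out, ["summarise"])
graph.add_edge("summarise", END)
app = graph.compile()
```

Invoking `app` with ten topics produces ten concurrent summarise runs but, with the batched chat class, only one Doubleword batch submission.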

Try it end-to-end

A full multi-agent research workflow lives in the repo at examples/async-agents-langgraph/notebook.ipynb. It fans out a research query across sub-topics, runs Serper + Jina search-and-scrape inside each sub-agent, and aggregates into a final report. Every LLM call goes through ChatDoublewordBatch, so dozens of concurrent sub-agents collapse into a handful of batch submissions.

It's the cleanest demonstration of the shape that makes the batched variants worthwhile, and a good place to see the integration in real use.

Further reading

  • Package README on GitHub — full API reference, configuration table, and smoke-test scripts under examples/langgraph-basic/ (a real-time and a batched variant of the same trivial LangGraph agent).
  • autobatcher — the library powering the batched variants. Standalone, usable without LangChain.
  • Doubleword batch API docs — pricing, completion windows, and the raw endpoints the batched variants target.