Building an Agentic PR Review Bot
Code review is one of those jobs where humans don't sit watching the spinner. What matters is that the verdict lands reasonably soon; anything under about ten minutes is fine. This makes it a perfect fit for async inference: trade a little latency for cost, and spend the savings on more research per review.
This workbook builds a self-hostable PR review bot on top of opencode and Doubleword's async inference. The bot listens for pull_request webhooks, clones the PR into an isolated workspace, runs an agentic loop, and posts a structured review back to GitHub.
Why async for code review
| Tier | Trade-off |
|---|---|
| Realtime (default `service_tier`) | Lowest latency, highest cost |
| Async (`service_tier=flex`) | A few minutes of extra latency, ~2× cheaper |
| Batch (JSONL upload) | Deepest discount, but queued in bulk over hours |
For a PR review bot, async is the sweet spot:
- The review is allowed to take a few minutes. A human cares about the result landing in under ten minutes.
- You can afford to be more thorough. Same budget → more research, more passes over the diff.
- The loop needs sequential calls. Each turn depends on the last, so batch tier might stretch a single review into days. Async flex keeps the loop tight while still capturing most of the discount.
Cost & throughput: async vs realtime
Async inference is the same model on the same hardware — what changes is the delivery window.
This PR-review harness runs roughly 25 agent loop iterations per review (varies with PR size). At ~25 iterations on Qwen/Qwen3.5-397B-A17B-FP8 we use ~1.09M prompt tokens and produce ~12.5K completion tokens.
| Provider / Tier | Input ($/1M) | Output ($/1M) | Cost for this review |
|---|---|---|---|
| Doubleword Async (flex) | $0.30 | $0.60 | $0.34 |
| Doubleword Realtime | $0.60 | $1.20 | $0.67 |
| Anthropic Claude Sonnet 4.6 (realtime) | $3.00 | $15.00 | $3.46 |
| OpenAI GPT-5.5 (realtime) | $5.00 | $30.00 | $5.84 |
Inference wall-clock at ~25 iterations is 2m 38s on Doubleword Realtime and 4m 53s on Doubleword Async — both comfortably under the 10-minute ceiling a human cares about.
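The per-review dollar figures in the table are straight token math. A quick sanity check (the token counts above are rounded, so a cent of drift is expected):

```ts
// ~25-iteration review: ~1.09M prompt tokens, ~12.5K completion tokens.
const promptM = 1.09; // prompt tokens, in millions
const completionM = 0.0125; // completion tokens, in millions

const reviewCost = (inputPerM: number, outputPerM: number) =>
  promptM * inputPerM + completionM * outputPerM;

console.log(reviewCost(0.3, 0.6).toFixed(2)); // ≈ 0.33 — Doubleword flex
console.log(reviewCost(0.6, 1.2).toFixed(2)); // ≈ 0.67 — Doubleword realtime
console.log(reviewCost(3.0, 15.0).toFixed(2)); // ≈ 3.46 — Claude Sonnet 4.6
console.log(reviewCost(5.0, 30.0).toFixed(2)); // ≈ 5.83 — GPT-5.5
```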
Architecture
```
GitHub PR ── webhook ──► shim (HMAC-verified)
                            │
                            ├─ clone PR into /tmp/pr-<id>
                            ├─ POST /session → opencode (per-PR x-opencode-directory)
                            └─ poll /session/:id/message until complete
                                         │
                                         ▼
                               opencode agent loop
                                         │
                            ┌────────────┴────────────┐
                            ▼                         ▼
                   Doubleword async           client-side tools
                  inference (flex/bg)    (read, grep, bash, webfetch)
```

Our opencode server handles concurrent PRs by directory isolation. Each request carries an `x-opencode-directory` header pointing at the cloned PR worktree, so opencode loads a per-request workspace without needing one container per PR.
The tool-calling loop
opencode runs a standard client-side tool loop: the model emits tool calls in its response, the client executes them, results are appended to the context, and the next turn fires. Nothing leaves the bot's process — the agent reads files from your filesystem, runs git diff in a subprocess, and fetches URLs from your network.
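As a mental model, a minimal version of that loop looks like this. The `callModel` and `executeTool` functions are hypothetical stand-ins for opencode's provider call and tool registry, not its actual internals:

```ts
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; content: string; toolCallId: string };
type ToolCall = { id: string; name: string; arguments: string };

// Hypothetical stand-ins for opencode's internals.
declare function callModel(msgs: Message[]): Promise<{ text: string; toolCalls: ToolCall[] }>;
declare function executeTool(name: string, args: unknown): Promise<string>;

async function agentLoop(messages: Message[], maxTurns = 25): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages); // one async (flex) inference call
    messages.push({ role: "assistant", content: reply.text });
    if (reply.toolCalls.length === 0) return reply.text; // no tool calls → done
    for (const call of reply.toolCalls) {
      // Executed client-side: read/grep/bash/webfetch never leave this process.
      const result = await executeTool(call.name, JSON.parse(call.arguments));
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  throw new Error("agent loop exceeded maxTurns");
}
```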
For the review agent, we deny everything by default and allow only inspection-oriented tools (`bash` is included so the agent can run read-only git commands like `git log` and `git diff`):
"permission": {
"*": "deny",
"read": "allow",
"grep": "allow",
"glob": "allow",
"list": "allow",
"bash": "allow",
"webfetch": "allow"
}The agent's system prompt forces it to:
- Map the change with `git log`/`git diff`.
- Read changed files in full, plus their callers, tests, and types.
- Research best practices — `webfetch` docs, OWASP & CVE guidance, migration notes, etc.
- Cross-reference patterns inside the repo with `grep`.
- Iterate — multiple passes, not single-shot.
- Emit a JSON review with `summary` plus inline `comments` anchored to `file:line` (see the sketch after this list).
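The output schema is whatever your prompt enforces; a plausible shape, using the same `(path, line, side)` fields the shim validates before posting (field names here are illustrative, not part of any official API):

```ts
// Hypothetical review shape the prompt asks the agent to emit.
type Review = {
  summary: string; // markdown body for the top-level review comment
  comments: Array<{
    path: string; // file path relative to the repo root
    line: number; // diff line the comment anchors to
    side: "LEFT" | "RIGHT"; // old vs new side of the diff
    body: string; // the finding itself
  }>;
};

const example: Review = {
  summary: "Solid refactor overall; two issues worth fixing before merge.",
  comments: [
    {
      path: "src/auth/session.ts",
      line: 42,
      side: "RIGHT",
      body: "Token comparison is not constant-time; consider crypto.timingSafeEqual.",
    },
  ],
};
```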
The webfetch step is where async inference pays off the most. A realtime budget might allow two or three fetches per review; async lets the agent take its time and pull in five or ten authoritative sources without breaking the cost ceiling.
Wiring Doubleword as the inference provider
`opencode.json`, using the published `@doubleword/vercel-ai` provider:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "doubleword": {
      "npm": "@doubleword/vercel-ai",
      "name": "Doubleword (async / flex)",
      "env": ["DOUBLEWORD_API_KEY"],
      "options": {
        "baseURL": "https://api.doubleword.ai/v1",
        "apiKey": "{env:DOUBLEWORD_API_KEY}"
      },
      "models": {
        "Qwen/Qwen3.5-397B-A17B-FP8": {
          "name": "Qwen3.5 397B",
          "tool_call": true,
          "limit": { "context": 128000, "output": 16384 }
        }
      }
    }
  }
}
```

Or, pointing `npm` at the local Open Responses wrapper instead:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "doubleword": {
      "npm": "file:///app/doubleword-responses-wrapper/index.js",
      "name": "Doubleword (Responses + flex + background)",
      "env": ["DOUBLEWORD_API_KEY"],
      "options": {
        "baseURL": "https://api.doubleword.ai/v1",
        "apiKey": "{env:DOUBLEWORD_API_KEY}"
      },
      "models": {
        "Qwen/Qwen3.5-397B-A17B-FP8": {
          "name": "Qwen3.5 397B",
          "tool_call": true,
          "limit": { "context": 128000, "output": 16384 }
        }
      }
    }
  }
}
```

Provider implementation
The simplest approach is to use createDoublewordAsync directly — the factory submits async requests and polls to completion internally. Alternatively, look at the Open Responses code samples for a custom Vercel AI SDK provider that implements its own polling.
```ts
// doubleword-async-wrapper/src/index.ts
import { createDoublewordAsync } from "@doubleword/vercel-ai";
import type {
  LanguageModelV3,
  LanguageModelV3CallOptions,
  LanguageModelV3StreamPart,
} from "@ai-sdk/provider";

export function createDoubleword(opts: {
  apiKey: string;
  baseURL: string;
}) {
  const base = createDoublewordAsync(opts);

  const wrapModel = (inner: LanguageModelV3): LanguageModelV3 => ({
    ...inner,
    doStream: async (options: LanguageModelV3CallOptions) => {
      const result = await inner.doGenerate(options);
      const parts: LanguageModelV3StreamPart[] = [];

      // Emit text content as start/delta/end triplets.
      for (const c of result.content) {
        if (c.type === "text") {
          const id = crypto.randomUUID();
          parts.push({ type: "text-start", id });
          parts.push({ type: "text-delta", id, delta: c.text });
          parts.push({ type: "text-end", id });
        } else {
          parts.push(c as LanguageModelV3StreamPart);
        }
      }
      parts.push({
        type: "finish",
        finishReason: result.finishReason,
        usage: result.usage,
      });

      return {
        stream: new ReadableStream({
          start(ctrl) {
            for (const p of parts) ctrl.enqueue(p);
            ctrl.close();
          },
        }),
      };
    },
  });

  return {
    languageModel: (id: string) => wrapModel(base.languageModel(id)),
  };
}
```

And the Open Responses variant, which implements the 202-and-poll itself:

```ts
// doubleword-responses-wrapper/src/index.ts
import { createOpenAI } from "@ai-sdk/openai";

const TERMINAL = new Set([
  "completed", "failed", "incomplete",
  "cancelled", "canceled", "expired",
]);

export function createDoubleword(opts: {
  apiKey: string;
  baseURL: string;
  pollIntervalMs?: number;
  pollTimeoutMs?: number;
}) {
  const pollIntervalMs = opts.pollIntervalMs ?? 2000;
  const pollTimeoutMs = opts.pollTimeoutMs ?? 60 * 60 * 1000;

  const wrappedFetch: typeof fetch = async (input, init) => {
    const url = typeof input === "string" ? input : (input as Request).url;
    const isResponsesPost =
      init?.method === "POST" &&
      url.includes("/responses") &&
      !url.match(/\/responses\/[^/?]+/);
    if (!isResponsesPost || typeof init?.body !== "string") {
      return fetch(input as RequestInfo, init);
    }

    // 1. Add flex + background flags, drop incompatible stream:true.
    const body = JSON.parse(init.body);
    body.service_tier = "flex";
    body.background = true;
    delete body.stream;

    // 2. Submit → 202 + { id, status: "queued" }.
    const submit = await fetch(input as RequestInfo, {
      ...init,
      body: JSON.stringify(body),
    });
    const submitJson = await submit.clone().json();
    const responseId = submitJson?.id;
    if (!responseId) return submit;

    // 3. Poll GET /v1/responses/{id} until terminal status.
    const pollUrl = `${url.split("?")[0]}/${responseId}`;
    const headers: Record<string, string> = {};
    new Headers(init.headers).forEach((v, k) => {
      if (k.toLowerCase() !== "content-type") headers[k] = v;
    });
    const deadline = Date.now() + pollTimeoutMs;
    let last = submitJson;
    while (Date.now() < deadline) {
      if (TERMINAL.has(last?.status)) break;
      await new Promise((r) => setTimeout(r, pollIntervalMs));
      const res = await fetch(pollUrl, { method: "GET", headers });
      if (!res.ok) return res;
      last = await res.json();
    }

    // 4. Synthesize a 200 (or 502 on non-completed terminal) so the
    //    Vercel AI SDK sees the same response shape it expects.
    const ok = last?.status === "completed";
    return new Response(JSON.stringify(last), {
      status: ok ? 200 : 502,
      headers: { "content-type": "application/json" },
    });
  };

  const provider = createOpenAI({ ...opts, fetch: wrappedFetch });
  return {
    languageModel: (id: string) => provider.responses(id),
  };
}
```

Calling the model
With the published provider:

```ts
import { generateText } from "ai";
import { createDoublewordAsync } from "@doubleword/vercel-ai";

const doubleword = createDoublewordAsync({
  apiKey: process.env.DOUBLEWORD_API_KEY!,
  baseURL: "https://api.doubleword.ai/v1",
});

const result = await generateText({
  model: doubleword("Qwen/Qwen3.5-397B-A17B-FP8"),
  tools: { /* read, grep, bash, webfetch, ... */ },
  prompt: "Review the PR at the current working directory.",
});
console.log(result.text);
```

With the custom Responses wrapper:

```ts
import { generateText } from "ai";
import { createDoubleword } from "./doubleword-responses-wrapper";

const doubleword = createDoubleword({
  apiKey: process.env.DOUBLEWORD_API_KEY!,
  baseURL: "https://api.doubleword.ai/v1",
});

const result = await generateText({
  model: doubleword.languageModel("Qwen/Qwen3.5-397B-A17B-FP8"),
  tools: { /* read, grep, bash, webfetch, ... */ },
  prompt: "Review the PR at the current working directory.",
});
console.log(result.text);
```

Reach for the Open Responses path if you want to implement your own 202-and-poll pattern.
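Stripped of the SDK plumbing, that pattern is just one POST followed by GETs. A sketch against the same endpoints and fields the wrapper above uses:

```ts
const base = "https://api.doubleword.ai/v1";
const headers = {
  "content-type": "application/json",
  authorization: `Bearer ${process.env.DOUBLEWORD_API_KEY}`,
};

// Submit: flex tier + background → returns immediately with { id, status: "queued" }.
let job = await fetch(`${base}/responses`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "Qwen/Qwen3.5-397B-A17B-FP8",
    input: "Summarize this diff: ...",
    service_tier: "flex",
    background: true,
  }),
}).then((r) => r.json());

// Poll until a terminal status (same TERMINAL set as the wrapper).
const terminal = ["completed", "failed", "incomplete", "cancelled", "canceled", "expired"];
while (!terminal.includes(job.status)) {
  await new Promise((r) => setTimeout(r, 2000));
  job = await fetch(`${base}/responses/${job.id}`, { headers }).then((r) => r.json());
}
```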
The GitHub integration
A small Bun shim sits in front of opencode and handles GitHub. It does four things:
- HMAC-verifies the webhook, rejecting anything without a valid `x-hub-signature-256` (a minimal check is sketched below).
- Clones the PR branch into a per-request temp directory using a GitHub App installation token.
- Dispatches to opencode asynchronously (`POST /session` + `POST /session/:id/prompt_async`) and polls `/session/:id/message` until the assistant message has `time.completed` set.
- Posts the review via `octokit.rest.pulls.createReview`, with a pre-validation step that drops inline comments whose `(path, line, side)` aren't in the diff hunks.
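The `verifySignature` used in the handler below is plain HMAC-SHA256 over the raw body. A minimal sketch with Node's `crypto`, assuming the secret arrives via `GITHUB_WEBHOOK_SECRET` as set in the deploy step:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySignature(body: string, signature: string): boolean {
  const expected =
    "sha256=" +
    createHmac("sha256", process.env.GITHUB_WEBHOOK_SECRET!)
      .update(body)
      .digest("hex");
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  // Constant-time compare; lengths must match first or timingSafeEqual throws.
  return a.length === b.length && timingSafeEqual(a, b);
}
```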
GitHub rejects the entire review with HTTP 422 if any inline comment is anchored to a line outside the diff. The shim parses the PR's file patches, builds a (path, side, line) → diff-text map, and demotes any inline finding that doesn't match into the markdown summary — so a single stale line ref no longer loses the other 10 valid findings.
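A sketch of that pre-validation (helper and field names here are hypothetical, not the shim's actual internals): walk each file's patch as returned by `octokit.rest.pulls.listFiles`, record every commentable `(path, side, line)` triple, and split findings into inline comments and demoted summary entries.

```ts
type Finding = { path: string; line: number; side: "LEFT" | "RIGHT"; body: string };

function commentableLines(files: { filename: string; patch?: string }[]): Set<string> {
  const ok = new Set<string>();
  for (const f of files) {
    if (!f.patch) continue; // binary or very large files carry no patch
    let left = 0;
    let right = 0;
    for (const row of f.patch.split("\n")) {
      const hunk = row.match(/^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
      if (hunk) {
        left = Number(hunk[1]);
        right = Number(hunk[2]);
        continue;
      }
      if (row.startsWith("+")) ok.add(`${f.filename}:RIGHT:${right++}`);
      else if (row.startsWith("-")) ok.add(`${f.filename}:LEFT:${left++}`);
      else if (!row.startsWith("\\")) {
        // Context lines are commentable on the new side.
        ok.add(`${f.filename}:RIGHT:${right}`);
        left++;
        right++;
      }
    }
  }
  return ok;
}

function splitFindings(findings: Finding[], ok: Set<string>) {
  const inline: Finding[] = [];
  const demoted: Finding[] = []; // these get folded into the markdown summary
  for (const f of findings) {
    (ok.has(`${f.path}:${f.side}:${f.line}`) ? inline : demoted).push(f);
  }
  return { inline, demoted };
}
```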
```ts
// pr-review-shim — the webhook entry point
async function handleHttp(req: Request): Promise<Response> {
  const body = await req.text();
  const sig = req.headers.get("x-hub-signature-256");
  if (!sig || !verifySignature(body, sig)) {
    return new Response("invalid signature", { status: 401 });
  }
  if (req.headers.get("x-github-event") !== "pull_request") {
    return new Response("ignored", { status: 200 });
  }
  const event = JSON.parse(body) as PullRequestEvent;
  if (!["opened", "synchronize", "reopened"].includes(event.action)) {
    return new Response("ignored", { status: 200 });
  }
  // Fire-and-forget — GitHub retries any webhook that takes too long.
  runReview(event).catch((err) => console.error("review failed", err));
  return new Response("queued", { status: 202 });
}
```

The PR review itself goes through `x-opencode-directory`, which tells opencode to load its workspace from the freshly-cloned PR worktree:
```ts
const session = await fetch(`${opencodeUrl}/session`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-opencode-directory": workdir, // /tmp/pr-<id>
    authorization: opencodeAuth,
  },
  body: JSON.stringify({ title: `PR #${prNumber} review` }),
}).then((r) => r.json());

await fetch(`${opencodeUrl}/session/${session.id}/prompt_async`, {
  method: "POST",
  headers: { /* same headers including x-opencode-directory */ },
  body: JSON.stringify({
    agent: "review",
    model: { providerID: "doubleword", modelID: "Qwen/Qwen3.5-397B-A17B-FP8" },
    parts: [{ type: "text", text: reviewPrompt }],
  }),
});
```

Polling the session is the same pattern as polling a `/v1/responses` background job — every five seconds, hit `GET /session/:id/message`, look for the most recent assistant message, and check whether `time.completed` is set or `info.error` is populated.
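A sketch of that poll; the message shape is assumed from the fields named above, so adjust to the actual opencode response:

```ts
async function waitForAssistant(opencodeUrl: string, sessionId: string, headers: HeadersInit) {
  while (true) {
    const messages = await fetch(`${opencodeUrl}/session/${sessionId}/message`, { headers })
      .then((r) => r.json());
    // Most recent assistant message, if any.
    const assistant = [...messages].reverse().find((m: any) => m.info?.role === "assistant");
    if (assistant?.info?.error) {
      throw new Error(`review failed: ${JSON.stringify(assistant.info.error)}`);
    }
    if (assistant?.info?.time?.completed) return assistant;
    await new Promise((r) => setTimeout(r, 5_000)); // every five seconds
  }
}
```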
The prompt_async + polling pattern is symmetric across the stack: opencode itself uses it so the agent loop isn't bottlenecked on a single long-held HTTP request, and the Open Responses wrapper uses it so the inference call isn't bottlenecked on a long-held connection to Doubleword. Async all the way down.
Deployment
The bot is a single Bun process (the shim) that spawns a child opencode server — runs anywhere with outbound HTTPS, a webhook-reachable port, and ~1 GB of memory. The Dockerfile is at packages/pr-review-shim/Dockerfile and expects the repo root as build context.
1. Get a Doubleword API key
Sign in at app.doubleword.ai → API Keys → create a new key. Copy it somewhere safe — we'll plug it into the deploy step. Pick a model from the Models catalogue too (the card shows which transports it supports) and make a note of its alias, e.g. Qwen/Qwen3.5-397B-A17B-FP8.
2. Create a GitHub App
The bot authenticates as a GitHub App when it clones PRs and posts reviews — installation tokens give finer-grained access than a PAT and rotate automatically. By the end of this step you'll have collected five values to feed the bot in step 4.
In Settings → Developer settings → GitHub Apps → New GitHub App:
- Homepage URL: anything (your repo is fine).
- Webhook URL: leave blank for now — you'll fill it in once the bot is deployed.
- Webhook secret: generate a random string and make a note of it.
- Repository permissions:
- Pull requests: Read and write
- Contents: Read-only
- Subscribe to events: tick Pull request.
Create the app. From the General page:
- Note the App ID — it's a small integer near the top of the page.
- Scroll to Private keys → Generate a private key → download the `.pem` file and keep it somewhere safe.
Then Install App on the org or repo you want reviewed. After install, the browser URL is https://github.com/settings/installations/<id> — note that <id>, it's your installation ID.
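For reference, minting a short-lived installation token from those three values (App ID, installation ID, private key) with `@octokit/auth-app` looks like this. A sketch, not the shim's own code:

```ts
import { createAppAuth } from "@octokit/auth-app";

const auth = createAppAuth({
  appId: process.env.GITHUB_APP_ID!,
  privateKey: process.env.GITHUB_PRIVATE_KEY!,
  installationId: Number(process.env.GITHUB_INSTALLATION_ID!),
});

const { token } = await auth({ type: "installation" });
// Clone over HTTPS with the token embedded in the remote URL:
//   git clone https://x-access-token:<token>@github.com/<owner>/<repo>.git /tmp/pr-<id>
```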
One more value to invent: generate any random string to use as the opencode server password (it protects the in-container opencode HTTP server). Keep it with the rest.
3. Build the image
The binaries are compiled locally with bun and then COPY-ed into a slim alpine base — the image itself doesn't run bun install, which keeps the final layer small. From the repo root:
```sh
# 1. Compile the opencode and pr-review-shim binaries for linux-x64-musl
(cd packages/opencode && bun run build)
(cd packages/pr-review-shim && bun run build)

# 2. Build the doubleword-async-wrapper bundle (chat completions on flex)
(cd packages/doubleword-async-wrapper && bun run build)

# 3. Build the image (context = repo root)
docker build --platform linux/amd64 \
  -f packages/pr-review-shim/Dockerfile \
  -t pr-review-harness .
```

4. Push and deploy
Two low-friction targets — pick whichever fits your platform. Substitute the bracketed placeholders with the values you saved in steps 1–2.
Cloud Run:

```sh
# Tag and push to Artifact Registry
docker tag pr-review-harness \
  europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest
docker push \
  europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest

# Store the multi-line GitHub private key in Secret Manager
gcloud secrets create github-private-key \
  --data-file=[path/to/github-app.pem]

# Write the remaining env vars to a YAML file (each var on its own line)
cat > env.yaml <<'EOF'
DOUBLEWORD_API_KEY: [doubleword-api-key]
GITHUB_APP_ID: [github-app-id]
GITHUB_INSTALLATION_ID: [github-installation-id]
GITHUB_WEBHOOK_SECRET: [github-webhook-secret]
OPENCODE_SERVER_PASSWORD: [random-server-password]
REVIEW_MODEL_ID: Qwen/Qwen3.5-397B-A17B-FP8
EOF

# Deploy
gcloud run deploy pr-review-harness \
  --image=europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest \
  --region=europe-west4 \
  --env-vars-file=env.yaml \
  --set-secrets=GITHUB_PRIVATE_KEY=github-private-key:latest
```

Fly.io:

```sh
# From the repo root (after running the three local builds above)
fly launch \
  --dockerfile packages/pr-review-shim/Dockerfile \
  --no-deploy

# Set each secret on its own line — fly secrets set accepts multiple KEY=value pairs
fly secrets set \
  DOUBLEWORD_API_KEY=[doubleword-api-key] \
  GITHUB_APP_ID=[github-app-id] \
  GITHUB_INSTALLATION_ID=[github-installation-id] \
  GITHUB_PRIVATE_KEY="$(cat [path/to/github-app.pem])" \
  GITHUB_WEBHOOK_SECRET=[github-webhook-secret] \
  OPENCODE_SERVER_PASSWORD=[random-server-password] \
  REVIEW_MODEL_ID=Qwen/Qwen3.5-397B-A17B-FP8

fly deploy
```

5. Register the webhook
After the deploy returns a public HTTPS URL, paste <service-url>/webhook into the GitHub App's Webhook URL field and save. Open or reopen a PR on the repo where the app is installed — the bot will review it, and every model call shows up in Responses on Doubleword with status, latency, tokens, and cost.
Related reading
- Async inference on Doubleword — concepts and pricing for `flex` and `batch` tiers.
- Open Responses API reference — full spec for `/v1/responses` including `service_tier` and `background`.
- `@doubleword/vercel-ai` — Vercel AI SDK provider that runs every call on `service_tier=flex`.
- opencode — TypeScript agentic CLI, native HTTP server, the substrate this bot is built on.