Building an Agentic PR Review Bot
Code review is one of those jobs where humans don't sit watching the spinner. What matters is that the verdict lands reasonably soon; anything under about ten minutes is fine. This makes it a perfect fit for async inference: trade a little latency for cost, and spend the savings on more research per review.
This workbook builds a self-hostable PR review bot on top of opencode and Doubleword's async inference. The bot listens for pull_request webhooks, clones the PR into an isolated workspace, runs an agentic loop, and posts a structured review back to GitHub.
Why async for code review
| Tier | Trade-off |
|---|---|
| Realtime (default `service_tier`) | Lowest latency, highest cost |
| Async (`service_tier=flex`) | A few minutes of extra latency, ~2× cheaper |
| Batch (JSONL upload) | Deepest discount, but queued in bulk over hours |
For a PR review bot, async is the sweet spot:
- The review is allowed to take a few minutes. A human cares about the result landing in under ten minutes.
- You can afford to be more thorough. Same budget → more research, more passes over the diff.
- The loop needs sequential calls. Each turn depends on the last, so batch tier might stretch a single review into days. Async flex keeps the loop tight while still capturing most of the discount.
Cost & throughput: async vs realtime
Async inference is the same model on the same hardware — what changes is the delivery window.
This PR-review harness runs roughly 25 agent loop iterations per review (varies with PR size). At ~25 iterations on Qwen/Qwen3.5-397B-A17B-FP8 we use ~1.09M prompt tokens and produce ~12.5K completion tokens.
| Provider / Tier | Input ($/1M) | Output ($/1M) | Cost for this review |
|---|---|---|---|
| Doubleword Async (flex) | $0.30 | $0.60 | $0.34 |
| Doubleword Realtime | $0.60 | $1.20 | $0.67 |
| Anthropic Claude Sonnet 4.6 (realtime) | $3.00 | $15.00 | $3.46 |
| OpenAI GPT-5.5 (realtime) | $5.00 | $30.00 | $5.84 |
Inference wall-clock at ~25 iterations is 2m 38s on Doubleword Realtime and 4m 53s on Doubleword Async — both comfortably under the 10-minute ceiling a human cares about.
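The per-review dollar figures in the table are straight token math. A quick sanity check (the token counts above are rounded, so a cent of drift is expected):

```ts
// ~25-iteration review: ~1.09M prompt tokens, ~12.5K completion tokens.
const promptM = 1.09; // prompt tokens, in millions
const completionM = 0.0125; // completion tokens, in millions

const reviewCost = (inputPerM: number, outputPerM: number) =>
  promptM * inputPerM + completionM * outputPerM;

console.log(reviewCost(0.3, 0.6).toFixed(2)); // ≈ 0.33 — Doubleword flex
console.log(reviewCost(0.6, 1.2).toFixed(2)); // ≈ 0.67 — Doubleword realtime
console.log(reviewCost(3.0, 15.0).toFixed(2)); // ≈ 3.46 — Claude Sonnet 4.6
console.log(reviewCost(5.0, 30.0).toFixed(2)); // ≈ 5.83 — GPT-5.5
```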
Architecture
```
GitHub PR ── webhook ──► shim (HMAC-verified)
                            │
                            ├─ clone PR into /tmp/pr-<id>
                            ├─ POST /session → opencode (per-PR x-opencode-directory)
                            └─ poll /session/:id/message until complete
                                         │
                                         ▼
                               opencode agent loop
                                         │
                            ┌────────────┴────────────┐
                            ▼                         ▼
                   Doubleword async           client-side tools
                  inference (flex/bg)    (read, grep, bash, webfetch)
```

Our opencode server handles concurrent PRs by directory isolation. Each request carries an `x-opencode-directory` header pointing at the cloned PR worktree, so opencode loads a per-request workspace without needing one container per PR.
The tool-calling loop
opencode runs a standard client-side tool loop: the model emits tool calls in its response, the client executes them, results are appended to the context, and the next turn fires. Nothing leaves the bot's process — the agent reads files from your filesystem, runs git diff in a subprocess, and fetches URLs from your network.
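As a mental model, a minimal version of that loop looks like this. The `callModel` and `executeTool` functions are hypothetical stand-ins for opencode's provider call and tool registry, not its actual internals:

```ts
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; content: string; toolCallId: string };
type ToolCall = { id: string; name: string; arguments: string };

// Hypothetical stand-ins for opencode's internals.
declare function callModel(msgs: Message[]): Promise<{ text: string; toolCalls: ToolCall[] }>;
declare function executeTool(name: string, args: unknown): Promise<string>;

async function agentLoop(messages: Message[], maxTurns = 25): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages); // one async (flex) inference call
    messages.push({ role: "assistant", content: reply.text });
    if (reply.toolCalls.length === 0) return reply.text; // no tool calls → done
    for (const call of reply.toolCalls) {
      // Executed client-side: read/grep/bash/webfetch never leave this process.
      const result = await executeTool(call.name, JSON.parse(call.arguments));
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  throw new Error("agent loop exceeded maxTurns");
}
```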
For the review agent, we deny everything by default and allow only inspection-oriented tools (`bash` is included so the agent can run read-only git commands like `git log` and `git diff`):
"permission": {
"*": "deny",
"read": "allow",
"grep": "allow",
"glob": "allow",
"list": "allow",
"bash": "allow",
"webfetch": "allow"
}The agent's system prompt forces it to:
- Map the change with `git log`/`git diff`.
- Read changed files in full, plus their callers, tests, and types.
- Research best practices — `webfetch` docs, OWASP & CVE guidance, migration notes, etc.
- Cross-reference patterns inside the repo with `grep`.
- Iterate — multiple passes, not single-shot.
- Emit a JSON review with `summary` plus inline `comments` anchored to `file:line` (see the sketch after this list).
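The output schema is whatever your prompt enforces; a plausible shape, using the same `(path, line, side)` fields the shim validates before posting (field names here are illustrative, not part of any official API):

```ts
// Hypothetical review shape the prompt asks the agent to emit.
type Review = {
  summary: string; // markdown body for the top-level review comment
  comments: Array<{
    path: string; // file path relative to the repo root
    line: number; // diff line the comment anchors to
    side: "LEFT" | "RIGHT"; // old vs new side of the diff
    body: string; // the finding itself
  }>;
};

const example: Review = {
  summary: "Solid refactor overall; two issues worth fixing before merge.",
  comments: [
    {
      path: "src/auth/session.ts",
      line: 42,
      side: "RIGHT",
      body: "Token comparison is not constant-time; consider crypto.timingSafeEqual.",
    },
  ],
};
```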
The webfetch step is where async inference pays off the most. A realtime budget might allow two or three fetches per review; async lets the agent take its time and pull in five or ten authoritative sources without breaking the cost ceiling.
Wiring Doubleword as the inference provider
`opencode.json`, using the published `@doubleword/vercel-ai` provider:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "doubleword": {
      "npm": "@doubleword/vercel-ai",
      "name": "Doubleword (async / flex)",
      "env": ["DOUBLEWORD_API_KEY"],
      "options": {
        "baseURL": "https://api.doubleword.ai/v1",
        "apiKey": "{env:DOUBLEWORD_API_KEY}"
      },
      "models": {
        "Qwen/Qwen3.5-397B-A17B-FP8": {
          "name": "Qwen3.5 397B",
          "tool_call": true,
          "limit": { "context": 128000, "output": 16384 }
        }
      }
    }
  }
}
```

Or, pointing `npm` at the local Open Responses wrapper instead:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "doubleword": {
      "npm": "file:///app/doubleword-responses-wrapper/index.js",
      "name": "Doubleword (Responses + flex + background)",
      "env": ["DOUBLEWORD_API_KEY"],
      "options": {
        "baseURL": "https://api.doubleword.ai/v1",
        "apiKey": "{env:DOUBLEWORD_API_KEY}"
      },
      "models": {
        "Qwen/Qwen3.5-397B-A17B-FP8": {
          "name": "Qwen3.5 397B",
          "tool_call": true,
          "limit": { "context": 128000, "output": 16384 }
        }
      }
    }
  }
}
```

Provider implementation
The simplest approach is to use createDoublewordAsync directly — the factory submits async requests and polls to completion internally. Alternatively, look at the Open Responses code samples for a custom Vercel AI SDK provider that implements its own polling.
```ts
// doubleword-async-wrapper/src/index.ts
import { createDoublewordAsync } from "@doubleword/vercel-ai";
import type {
  LanguageModelV3,
  LanguageModelV3CallOptions,
  LanguageModelV3StreamPart,
} from "@ai-sdk/provider";

export function createDoubleword(opts: {
  apiKey: string;
  baseURL: string;
}) {
  const base = createDoublewordAsync(opts);

  const wrapModel = (inner: LanguageModelV3): LanguageModelV3 => ({
    ...inner,
    doStream: async (options: LanguageModelV3CallOptions) => {
      const result = await inner.doGenerate(options);
      const parts: LanguageModelV3StreamPart[] = [];

      // Emit text content as start/delta/end triplets.
      for (const c of result.content) {
        if (c.type === "text") {
          const id = crypto.randomUUID();
          parts.push({ type: "text-start", id });
          parts.push({ type: "text-delta", id, delta: c.text });
          parts.push({ type: "text-end", id });
        } else {
          parts.push(c as LanguageModelV3StreamPart);
        }
      }
      parts.push({
        type: "finish",
        finishReason: result.finishReason,
        usage: result.usage,
      });

      return {
        stream: new ReadableStream({
          start(ctrl) {
            for (const p of parts) ctrl.enqueue(p);
            ctrl.close();
          },
        }),
      };
    },
  });

  return {
    languageModel: (id: string) => wrapModel(base.languageModel(id)),
  };
}
```

And the Open Responses variant, which implements the 202-and-poll itself:

```ts
// doubleword-responses-wrapper/src/index.ts
import { createOpenAI } from "@ai-sdk/openai";

const TERMINAL = new Set([
  "completed", "failed", "incomplete",
  "cancelled", "canceled", "expired",
]);

export function createDoubleword(opts: {
  apiKey: string;
  baseURL: string;
  pollIntervalMs?: number;
  pollTimeoutMs?: number;
}) {
  const pollIntervalMs = opts.pollIntervalMs ?? 2000;
  const pollTimeoutMs = opts.pollTimeoutMs ?? 60 * 60 * 1000;

  const wrappedFetch: typeof fetch = async (input, init) => {
    const url = typeof input === "string" ? input : (input as Request).url;
    const isResponsesPost =
      init?.method === "POST" &&
      url.includes("/responses") &&
      !url.match(/\/responses\/[^/?]+/);
    if (!isResponsesPost || typeof init?.body !== "string") {
      return fetch(input as RequestInfo, init);
    }

    // 1. Add flex + background flags, drop incompatible stream:true.
    const body = JSON.parse(init.body);
    body.service_tier = "flex";
    body.background = true;
    delete body.stream;

    // 2. Submit → 202 + { id, status: "queued" }.
    const submit = await fetch(input as RequestInfo, {
      ...init,
      body: JSON.stringify(body),
    });
    const submitJson = await submit.clone().json();
    const responseId = submitJson?.id;
    if (!responseId) return submit;

    // 3. Poll GET /v1/responses/{id} until terminal status.
    const pollUrl = `${url.split("?")[0]}/${responseId}`;
    const headers: Record<string, string> = {};
    new Headers(init.headers).forEach((v, k) => {
      if (k.toLowerCase() !== "content-type") headers[k] = v;
    });
    const deadline = Date.now() + pollTimeoutMs;
    let last = submitJson;
    while (Date.now() < deadline) {
      if (TERMINAL.has(last?.status)) break;
      await new Promise((r) => setTimeout(r, pollIntervalMs));
      const res = await fetch(pollUrl, { method: "GET", headers });
      if (!res.ok) return res;
      last = await res.json();
    }

    // 4. Synthesize a 200 (or 502 on non-completed terminal) so the
    //    Vercel AI SDK sees the same response shape it expects.
    const ok = last?.status === "completed";
    return new Response(JSON.stringify(last), {
      status: ok ? 200 : 502,
      headers: { "content-type": "application/json" },
    });
  };

  const provider = createOpenAI({ ...opts, fetch: wrappedFetch });
  return {
    languageModel: (id: string) => provider.responses(id),
  };
}
```

Calling the model
With the published provider:

```ts
import { generateText } from "ai";
import { createDoublewordAsync } from "@doubleword/vercel-ai";

const doubleword = createDoublewordAsync({
  apiKey: process.env.DOUBLEWORD_API_KEY!,
  baseURL: "https://api.doubleword.ai/v1",
});

const result = await generateText({
  model: doubleword("Qwen/Qwen3.5-397B-A17B-FP8"),
  tools: { /* read, grep, bash, webfetch, ... */ },
  prompt: "Review the PR at the current working directory.",
});
console.log(result.text);
```

With the custom Responses wrapper:

```ts
import { generateText } from "ai";
import { createDoubleword } from "./doubleword-responses-wrapper";

const doubleword = createDoubleword({
  apiKey: process.env.DOUBLEWORD_API_KEY!,
  baseURL: "https://api.doubleword.ai/v1",
});

const result = await generateText({
  model: doubleword.languageModel("Qwen/Qwen3.5-397B-A17B-FP8"),
  tools: { /* read, grep, bash, webfetch, ... */ },
  prompt: "Review the PR at the current working directory.",
});
console.log(result.text);
```

Reach for the Open Responses path if you want to implement your own 202-and-poll pattern.
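Stripped of the SDK plumbing, that pattern is just one POST followed by GETs. A sketch against the same endpoints and fields the wrapper above uses:

```ts
const base = "https://api.doubleword.ai/v1";
const headers = {
  "content-type": "application/json",
  authorization: `Bearer ${process.env.DOUBLEWORD_API_KEY}`,
};

// Submit: flex tier + background → returns immediately with { id, status: "queued" }.
let job = await fetch(`${base}/responses`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "Qwen/Qwen3.5-397B-A17B-FP8",
    input: "Summarize this diff: ...",
    service_tier: "flex",
    background: true,
  }),
}).then((r) => r.json());

// Poll until a terminal status (same TERMINAL set as the wrapper).
const terminal = ["completed", "failed", "incomplete", "cancelled", "canceled", "expired"];
while (!terminal.includes(job.status)) {
  await new Promise((r) => setTimeout(r, 2000));
  job = await fetch(`${base}/responses/${job.id}`, { headers }).then((r) => r.json());
}
```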
The GitHub integration
A small Bun shim sits in front of opencode and handles GitHub. It does four things:
- HMAC-verifies the webhook, rejecting anything without a valid `x-hub-signature-256` (a minimal check is sketched below).
- Clones the PR branch into a per-request temp directory using a GitHub App installation token.
- Dispatches to opencode asynchronously (`POST /session` + `POST /session/:id/prompt_async`) and polls `/session/:id/message` until the assistant message has `time.completed` set.
- Posts the review via `octokit.rest.pulls.createReview`, with a pre-validation step that drops inline comments whose `(path, line, side)` aren't in the diff hunks.
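The `verifySignature` used in the handler below is plain HMAC-SHA256 over the raw body. A minimal sketch with Node's `crypto`, assuming the secret arrives via `GITHUB_WEBHOOK_SECRET` as set in the deploy step:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySignature(body: string, signature: string): boolean {
  const expected =
    "sha256=" +
    createHmac("sha256", process.env.GITHUB_WEBHOOK_SECRET!)
      .update(body)
      .digest("hex");
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  // Constant-time compare; lengths must match first or timingSafeEqual throws.
  return a.length === b.length && timingSafeEqual(a, b);
}
```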
GitHub rejects the entire review with HTTP 422 if any inline comment is anchored to a line outside the diff. The shim parses the PR's file patches, builds a (path, side, line) → diff-text map, and demotes any inline finding that doesn't match into the markdown summary — so a single stale line ref no longer loses the other 10 valid findings.
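A sketch of that pre-validation (helper and field names here are hypothetical, not the shim's actual internals): walk each file's patch as returned by `octokit.rest.pulls.listFiles`, record every commentable `(path, side, line)` triple, and split findings into inline comments and demoted summary entries.

```ts
type Finding = { path: string; line: number; side: "LEFT" | "RIGHT"; body: string };

function commentableLines(files: { filename: string; patch?: string }[]): Set<string> {
  const ok = new Set<string>();
  for (const f of files) {
    if (!f.patch) continue; // binary or very large files carry no patch
    let left = 0;
    let right = 0;
    for (const row of f.patch.split("\n")) {
      const hunk = row.match(/^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
      if (hunk) {
        left = Number(hunk[1]);
        right = Number(hunk[2]);
        continue;
      }
      if (row.startsWith("+")) ok.add(`${f.filename}:RIGHT:${right++}`);
      else if (row.startsWith("-")) ok.add(`${f.filename}:LEFT:${left++}`);
      else if (!row.startsWith("\\")) {
        // Context lines are commentable on the new side.
        ok.add(`${f.filename}:RIGHT:${right}`);
        left++;
        right++;
      }
    }
  }
  return ok;
}

function splitFindings(findings: Finding[], ok: Set<string>) {
  const inline: Finding[] = [];
  const demoted: Finding[] = []; // these get folded into the markdown summary
  for (const f of findings) {
    (ok.has(`${f.path}:${f.side}:${f.line}`) ? inline : demoted).push(f);
  }
  return { inline, demoted };
}
```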
```ts
// pr-review-shim — the webhook entry point
async function handleHttp(req: Request): Promise<Response> {
  const body = await req.text();
  const sig = req.headers.get("x-hub-signature-256");
  if (!sig || !verifySignature(body, sig)) {
    return new Response("invalid signature", { status: 401 });
  }
  if (req.headers.get("x-github-event") !== "pull_request") {
    return new Response("ignored", { status: 200 });
  }
  const event = JSON.parse(body) as PullRequestEvent;
  if (!["opened", "synchronize", "reopened"].includes(event.action)) {
    return new Response("ignored", { status: 200 });
  }
  // Fire-and-forget — GitHub retries any webhook that takes too long.
  runReview(event).catch((err) => console.error("review failed", err));
  return new Response("queued", { status: 202 });
}
```

The PR review itself goes through `x-opencode-directory`, which tells opencode to load its workspace from the freshly-cloned PR worktree:
```ts
const session = await fetch(`${opencodeUrl}/session`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-opencode-directory": workdir, // /tmp/pr-<id>
    authorization: opencodeAuth,
  },
  body: JSON.stringify({ title: `PR #${prNumber} review` }),
}).then((r) => r.json());

await fetch(`${opencodeUrl}/session/${session.id}/prompt_async`, {
  method: "POST",
  headers: { /* same headers including x-opencode-directory */ },
  body: JSON.stringify({
    agent: "review",
    model: { providerID: "doubleword", modelID: "Qwen/Qwen3.5-397B-A17B-FP8" },
    parts: [{ type: "text", text: reviewPrompt }],
  }),
});
```

Polling the session is the same pattern as polling a `/v1/responses` background job — every five seconds, hit `GET /session/:id/message`, look for the most recent assistant message, and check whether `time.completed` is set or `info.error` is populated.
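A sketch of that poll; the message shape is assumed from the fields named above, so adjust to the actual opencode response:

```ts
async function waitForAssistant(opencodeUrl: string, sessionId: string, headers: HeadersInit) {
  while (true) {
    const messages = await fetch(`${opencodeUrl}/session/${sessionId}/message`, { headers })
      .then((r) => r.json());
    // Most recent assistant message, if any.
    const assistant = [...messages].reverse().find((m: any) => m.info?.role === "assistant");
    if (assistant?.info?.error) {
      throw new Error(`review failed: ${JSON.stringify(assistant.info.error)}`);
    }
    if (assistant?.info?.time?.completed) return assistant;
    await new Promise((r) => setTimeout(r, 5_000)); // every five seconds
  }
}
```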
The prompt_async + polling pattern is symmetric across the stack: opencode itself uses it so the agent loop isn't bottlenecked on a single long-held HTTP request, and the Open Responses wrapper uses it so the inference call isn't bottlenecked on a long-held connection to Doubleword. Async all the way down.
Deployment
The bot is a single Bun process (the shim) that spawns a child opencode server — runs anywhere with outbound HTTPS, a webhook-reachable port, and ~1 GB of memory. The Dockerfile is at packages/pr-review-shim/Dockerfile and expects the repo root as build context.
1. Get a Doubleword API key
Sign in at app.doubleword.ai → API Keys → create a new key. Copy it somewhere safe — we'll plug it into the deploy step. Pick a model from the Models catalogue too (the card shows which transports it supports) and make a note of its alias, e.g. Qwen/Qwen3.5-397B-A17B-FP8.
2. Create a GitHub App
The bot authenticates as a GitHub App when it clones PRs and posts reviews — installation tokens give finer-grained access than a PAT and rotate automatically. By the end of this step you'll have collected five values to feed the bot in step 4.
In Settings → Developer settings → GitHub Apps → New GitHub App:
- Homepage URL: anything (your repo is fine).
- Webhook URL: leave blank for now — you'll fill it in once the bot is deployed.
- Webhook secret: generate a random string and make a note of it.
- Repository permissions:
- Pull requests: Read and write
- Contents: Read-only
- Subscribe to events: tick Pull request.
Create the app. From the General page:
- Note the App ID — it's a small integer near the top of the page.
- Scroll to Private keys → Generate a private key → download the `.pem` file and keep it somewhere safe.
Then Install App on the org or repo you want reviewed. After install, the browser URL is https://github.com/settings/installations/<id> — note that <id>, it's your installation ID.
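For reference, minting a short-lived installation token from those three values (App ID, installation ID, private key) with `@octokit/auth-app` looks like this. A sketch, not the shim's own code:

```ts
import { createAppAuth } from "@octokit/auth-app";

const auth = createAppAuth({
  appId: process.env.GITHUB_APP_ID!,
  privateKey: process.env.GITHUB_PRIVATE_KEY!,
  installationId: Number(process.env.GITHUB_INSTALLATION_ID!),
});

const { token } = await auth({ type: "installation" });
// Clone over HTTPS with the token embedded in the remote URL:
//   git clone https://x-access-token:<token>@github.com/<owner>/<repo>.git /tmp/pr-<id>
```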
One more value to invent: generate any random string to use as the opencode server password (it protects the in-container opencode HTTP server). Keep it with the rest.
3. Build the image
The binaries are compiled locally with bun and then COPY-ed into a slim alpine base — the image itself doesn't run bun install, which keeps the final layer small. From the repo root:
```sh
# 1. Compile the opencode and pr-review-shim binaries for linux-x64-musl
(cd packages/opencode && bun run build)
(cd packages/pr-review-shim && bun run build)

# 2. Build the doubleword-async-wrapper bundle (chat completions on flex)
(cd packages/doubleword-async-wrapper && bun run build)

# 3. Build the image (context = repo root)
docker build --platform linux/amd64 \
  -f packages/pr-review-shim/Dockerfile \
  -t pr-review-harness .
```

4. Push and deploy
Two low-friction targets — pick whichever fits your platform. Substitute the bracketed placeholders with the values you saved in steps 1–2.
Cloud Run:

```sh
# Tag and push to Artifact Registry
docker tag pr-review-harness \
  europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest
docker push \
  europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest

# Store the multi-line GitHub private key in Secret Manager
gcloud secrets create github-private-key \
  --data-file=[path/to/github-app.pem]

# Write the remaining env vars to a YAML file (each var on its own line)
cat > env.yaml <<'EOF'
DOUBLEWORD_API_KEY: [doubleword-api-key]
GITHUB_APP_ID: [github-app-id]
GITHUB_INSTALLATION_ID: [github-installation-id]
GITHUB_WEBHOOK_SECRET: [github-webhook-secret]
OPENCODE_SERVER_PASSWORD: [random-server-password]
REVIEW_MODEL_ID: Qwen/Qwen3.5-397B-A17B-FP8
EOF

# Deploy
gcloud run deploy pr-review-harness \
  --image=europe-west4-docker.pkg.dev/[gcp-project]/[ar-repo]/harness:latest \
  --region=europe-west4 \
  --env-vars-file=env.yaml \
  --set-secrets=GITHUB_PRIVATE_KEY=github-private-key:latest
```

Fly.io:

```sh
# From the repo root (after running the three local builds above)
fly launch \
  --dockerfile packages/pr-review-shim/Dockerfile \
  --no-deploy

# Set each secret on its own line — fly secrets set accepts multiple KEY=value pairs
fly secrets set \
  DOUBLEWORD_API_KEY=[doubleword-api-key] \
  GITHUB_APP_ID=[github-app-id] \
  GITHUB_INSTALLATION_ID=[github-installation-id] \
  GITHUB_PRIVATE_KEY="$(cat [path/to/github-app.pem])" \
  GITHUB_WEBHOOK_SECRET=[github-webhook-secret] \
  OPENCODE_SERVER_PASSWORD=[random-server-password] \
  REVIEW_MODEL_ID=Qwen/Qwen3.5-397B-A17B-FP8

fly deploy
```

5. Register the webhook
After the deploy returns a public HTTPS URL, paste <service-url>/webhook into the GitHub App's Webhook URL field and save. Open or reopen a PR on the repo where the app is installed — the bot will review it, and every model call shows up in Responses on Doubleword with status, latency, tokens, and cost.
Related reading
- Async inference on Doubleword — concepts and pricing for `flex` and `batch` tiers.
- Open Responses API reference — full spec for `/v1/responses` including `service_tier` and `background`.
- `@doubleword/vercel-ai` — Vercel AI SDK provider that runs every call on `service_tier=flex`.
- opencode — TypeScript agentic CLI, native HTTP server, the substrate this bot is built on.