klappy / ptxprint-mcp
Cloudflare-native MCP server  ·  v1.3-draft

Typeset scripture
from a prompt.

Fifty years of Paratext and XeTeX craft compressed into three async tools an AI agent can call. Submit a typesetting job. Poll for status. Cancel if it overruns. Get a publication-quality PDF back.

POST https://ptxprint.klappy.dev/mcp
GET https://ptxprint.klappy.dev/health
§ I.   The pitch

A thin, opinionless layer over a deeply opinionated craft.

PTXprint is the tool Bible translation teams use to typeset Paratext projects into print-ready PDFs, and it is glorious. Hundreds of settings. Real diglot, polyglot, and study-Bible layouts. Real XeTeX under the hood. The MCP server you are looking at does not pretend to know any of that. It exposes filesystem-shaped IO and content-addressed job submission, then gets out of the way.

The opinions live next door, in a canon repository served by oddkit. But the agent doesn't talk to two MCPs — it talks to one. The docs(query) tool on this server proxies canon retrieval upstream, so the agent's loop is ask docs · understand · act · observe across a single MCP connection. One server, one concern — the design rationale is in §VI.

i.

For translation teams

Hand a translation agent your Paratext project — in any language, any script — and get a publication-quality PDF back. The agent knows when to ask, what to tweak, and when the result is ready to send to the press.

ii.

For agent builders

Three async tools. No domain quiz to pass. Submit a job, poll for status, cancel if it overruns. The server takes care of XeTeX, autofill passes, content-addressed caching, and surfacing failures in language a model can reason about.

iii.

For systems people

A Cloudflare Worker dispatches via service binding into a Container running PTXprint and XeTeX. Durable Objects hold per-job state. R2 stores content-addressed outputs. The SHA-256 of the canonical payload is the cache key, so identical jobs cost zero CPU.

§ II.   Live demo · real MCP calls

Submit a job. Get a real PDF.

Both demo payloads are checked-in smoke fixtures from the repo's smoke/ directory and have been rendered before, so they cache-hit and return instantly — zero container CPU. The PDF below is the real artifact served from R2.

BSB · John · Gentium Plus · cache hit — instant PDF
Each call is a real tools/call over MCP streamable-http. Response envelopes are shown verbatim. The page identifies itself with x-ptxprint-client headers so it appears on the transparency leaderboard.
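
As a rough illustration of what one such call looks like on the wire, the sketch below builds (but does not send) a JSON-RPC `tools/call` request for the streamable-http transport. Only the endpoint URL, the tool name, and the `x-ptxprint-client` header come from this page. The `arguments` keys and the client label are placeholders, not the server's actual input schema.

```python
import json
import urllib.request

MCP_URL = "https://ptxprint.klappy.dev/mcp"  # endpoint listed at the top of this page


def build_tools_call(tool_name, arguments, client_label, request_id=1):
    """Build an MCP tools/call request without sending it."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
            # Self-identification for the consumer leaderboard.
            "x-ptxprint-client": client_label,
        },
    )


# Hypothetical arguments: the real submit_typeset schema is not shown on this page.
request = build_tools_call("submit_typeset", {"books": ["JHN"]}, "example-agent")
```

A real session would normally start with an MCP `initialize` exchange before any `tools/call`; an MCP client library handles that handshake for you.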
§ III.   The canon, live

Ask the docs tool anything.

The MCP server's docs(query) tool searches the project's canon — the prose articles, specs, and governance documents that give an agent enough context to drive PTXprint. Type a question; see the actual answer plus the canon URIs that backed it.

docs(query, audience=headless)
§ IV.   The contract

Three tools. One contract.

A typesetting job for a whole New Testament can take half an hour. Synchronous tools collide with every chat-shaped surface in existence. So the protocol is async: submit returns immediately, status is pollable, cancellation is honored.

i
tool · async

submit_typeset

Hand it a project, a config, a book selection. Returns a job_id immediately and a predicted output URL. Identical payloads cache-hit.

// returns immediately
{
  job_id: "611700a0…",
  payload_hash: "611700a0…",
  cached: true,
  predicted_pdf_url: "…/r2/…/pdf"
}
ii
tool · pollable

get_job_status

Per-pass progress, log tail, error list, overfull-box count. A human_summary string for downstream chat agents.

{
  state: "succeeded",
  progress: { passes_completed: 1 },
  overfull_count: 8,
  errors: [],
  human_summary: "Done. 61 pages."
}
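
On the caller's side, polling might look like the sketch below. It is a minimal loop under stated assumptions: `fetch_status` stands in for whatever function actually invokes get_job_status, and the state names "running" and "failed" are guesses (only "succeeded" and "cancelled" appear on this page).

```python
import time

# "succeeded" and "cancelled" appear on this page; "failed" is an assumed name.
TERMINAL_STATES = {"succeeded", "cancelled", "failed"}


def poll_until_done(fetch_status, job_id, interval_s=5.0, max_polls=360,
                    sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is any callable returning a get_job_status-shaped dict.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Stand-in for the real tool call: replays a canned sequence of responses.
canned = iter([
    {"state": "running", "progress": {"passes_completed": 0}},
    {"state": "succeeded", "progress": {"passes_completed": 1},
     "human_summary": "Done. 61 pages."},
])
final = poll_until_done(lambda job_id: next(canned), "example-job",
                        sleep=lambda seconds: None)
print(final["human_summary"])
```

Injecting `sleep` keeps the loop testable without waiting. A production caller would likely add backoff and surface `errors` when the job does not succeed.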
iii
tool · safety valve

cancel_job

A 30-minute autofill pass needs a kill switch. SIGTERM to the subprocess; partial outputs preserved on disk; state moves to cancelled.

{
  ok: true,
  was_running: false,
  cancelled_at: "2026-04-30T23:24:00Z"
}
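
The kill-switch behaviour can be pictured with a small local sketch. This is not the server's code: it terminates an ordinary child process and reports in a shape that mirrors the response above. The grace period and the escalation to a hard kill are assumptions added for illustration; the page only mentions SIGTERM.

```python
import subprocess
import sys
from datetime import datetime, timezone


def cancel_job(proc, grace_s=10.0):
    """Terminate a job subprocess and report in a cancel_job-like shape."""
    was_running = proc.poll() is None
    if was_running:
        proc.terminate()  # sends SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Assumed escalation if the process ignores SIGTERM.
            proc.kill()
            proc.wait()
    return {
        "ok": True,
        "was_running": was_running,
        "cancelled_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


# A sleeping child process stands in for a long autofill pass.
job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
first = cancel_job(job)
second = cancel_job(job)  # already stopped, so was_running is False
```

Calling it twice shows why `was_running` is useful: the second call is a harmless no-op that still returns `ok`.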
cache

SHA-256 of the canonical payload (RFC 8785 JCS) is the only cache key. No TTL. Identical jobs cost zero CPU and return the same R2 object.
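
To make the idea concrete, here is a minimal sketch of hashing a canonicalised payload. It approximates RFC 8785 with sorted keys and compact separators, which is enough for simple payloads but is not a full JCS implementation. The payload fields are hypothetical.

```python
import hashlib
import json


def payload_hash(payload):
    """SHA-256 hex digest of a canonicalised JSON payload.

    Approximates RFC 8785 (JCS) with sorted keys and compact separators.
    Full JCS also fixes number formatting and sorts keys by UTF-16 code
    units, which this sketch does not reproduce for every input.
    """
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical payloads: a and b have the same content in a different key order.
a = {"books": ["JHN"], "config": {"font": "Gentium Plus", "columns": 2}}
b = {"config": {"columns": 2, "font": "Gentium Plus"}, "books": ["JHN"]}
c = {"books": ["MRK"], "config": {"font": "Gentium Plus", "columns": 2}}

print(payload_hash(a) == payload_hash(b))
print(payload_hash(a) == payload_hash(c))
```

Because key order no longer matters after canonicalisation, `a` and `b` map to the same key, while any change in content (`c`) produces a different one.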

timeout discipline

Per-job timeout is set in the request: default 30 min for autofill jobs, 5 min for simple ones. No platform-edge timeout exposed to the caller.

progress shape

Per-pass, not per-page. PTXprint doesn't expose useful per-page progress in headless mode — honest "pass 3 of ~5" beats fabricated percentages.

§ V.   Live telemetry · this server

No information asymmetry.

Every tool call against ptxprint.klappy.dev writes one structural data point to ptxprint_telemetry. Same data the maintainer sees, queried over MCP from this page in your browser, right now. Identify yourself with an x-ptxprint-client header and you'll appear on the consumer leaderboard below.

events · last 30d
activity · last 24h · this server
tool_call leaderboard · last 30d · ptxprint
SUM(_sample_interval) GROUP BY tool_name
consumer leaderboard · who is calling this server
companion · oddkit_telemetry — for context, the related canon service
live · telemetry_policy() — what this server tracks and why
audit · the SQL this page just ran (submitted & rewritten)

The page submits SQL with semantic field names (event_type, tool_name, consumer_label, …). The worker rewrites them to the positional blobN / doubleN form Cloudflare Analytics Engine actually accepts, and returns the rewritten SQL on each response so this audit can show both sides. The /diagnostics/schema endpoint is the canonical mapping; it's the same data the telemetry_schema MCP tool returns.
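
The rewrite step can be sketched as a simple name substitution. The mapping below is invented for illustration; the authoritative one is whatever /diagnostics/schema reports. The worker's real rewriter may work quite differently.

```python
import re

# Hypothetical mapping for illustration; the authoritative one is whatever
# /diagnostics/schema (or the telemetry_schema tool) reports.
FIELD_MAP = {
    "event_type": "blob1",
    "tool_name": "blob2",
    "consumer_label": "blob3",
}

_FIELD_RE = re.compile(r"\b(" + "|".join(map(re.escape, FIELD_MAP)) + r")\b")


def rewrite_sql(sql):
    """Replace semantic field names with positional column names."""
    return _FIELD_RE.sub(lambda m: FIELD_MAP[m.group(1)], sql)


submitted = (
    "SELECT tool_name, SUM(_sample_interval) AS calls "
    "FROM ptxprint_telemetry GROUP BY tool_name"
)
executed = rewrite_sql(submitted)
print(executed)
```

A regex like this would also rewrite field names that appear inside string literals, so a production rewriter would need to be more careful about quoting.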

§ VI.   Architecture

Vodka architecture.

Each MCP server holds opinions about exactly one concern. The PTXprint server holds none about typesetting craft — only about subprocess lifecycle, content-addressed caching, and sandboxed file IO. Domain knowledge lives next door, in canon. Agents see one MCP; PTXprint delegates canon retrieval to oddkit upstream when serving docs().

The agent's reasoning loop is one MCP wide: ask docs · understand · act · observe. Two services work in concert, one of them invisible to the agent. Each is thin enough to be maintained by one person indefinitely.

flow
Caller: Agent (Claude / Gemma / GPT) → all calls → ptxprint MCP, the one MCP the agent sees (submit · status · cancel · docs · telemetry · policy; thin layer, zero domain opinions)
ptxprint MCP → typeset → Container (PTXprint · XeTeX · fonts)
ptxprint MCP → persist → State and outputs (DO · R2; SHA-256 cache key)
ptxprint MCP → docs() upstream → oddkit MCP, upstream and internal (canon retrieval, invisible to agent)
Legend: agent-visible MCP traffic · internal (agent never sees)
opinionless server

No piclist syntax. No adjlist semantics. No font tables. No USFM. The server treats every file as opaque text and every subprocess as opaque action.

content-addressed

Cache keys are SHA-256 hashes (RFC 8785 JCS) of the canonical payload. No TTL. No staleness. Two identical jobs share one PDF.

async by design

Cloudflare's 30s Worker timeout collides with 30-minute autofill jobs. The two-step contract is the only honest answer.

canon-governed

Every architectural decision is encoded in OLDC+H artifacts and stored under canon/. The repo is the spec.

§ VII.   Stack

Built on the shoulders of two giants.

edge runtime

Cloudflare

  • Workers  — MCP transport, auth, dispatch via service binding
  • Containers  — PTXprint + XeTeX + SIL Charis (standard-2: 1 vCPU, 6 GiB)
  • Durable Objects  — per-job state, cancellation, polling
  • R2  — content-addressed PDF and log storage
  • Analytics Engine  — public usage telemetry
typesetting

SIL & Paratext

  • PTXprint  — Hosken, Penny, Gardner et al · headless CLI mode
  • XeTeX  — Unicode-native typesetting engine
  • USFM  — scripture markup as the source format
  • SIL Charis  — bundled font for the English-first scope
  • LFF  — Language Font Finder for BCP 47 → font resolution