About the Polyglot Concordance
This is a pilot. Current scope is the Gospel of Mark only. The full plan (additional NT books, the Hebrew Bible flagship, further witnesses, export formats, and two planned sub-projects) is laid out in the roadmap below.
A word-by-word alignment of the Gospel of Mark across three textual witnesses — Greek NT, Syriac Peshitta, and Latin Clementine Vulgate — with a one- or two-sentence apparatus note on every divergence.
Methodology
Every verse in Mark has been aligned word-by-word across the three witnesses by Anthropic's Claude Opus 4.8 via the Batch API. Each alignment group records a variant verdict — aligned, minor, major, omitted, or added — plus a semantic type (agreement, construction, harmonisation, substitution, idiom, etc.) and, for every non-aligned group, one to two sentences of apparatus-style annotation. The output is a machine-generated alignment draft, intended as a starting point for scholar review rather than as an authoritative critical edition.
Results are pre-computed and stored as per-verse JSON files in the repository — the running viewer has no runtime LLM dependency.
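As a rough illustration of the record described above, a single alignment group could be modelled as below. This is a minimal sketch: the field names (`tokens`, `verdict`, `semantic_type`, `note`, `confidence`) and the witness keys are assumptions for illustration, not the repository's actual JSON schema. Only the verdict values come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Verdict values listed in the Methodology section.
VERDICTS = {"aligned", "minor", "major", "omitted", "added"}


@dataclass
class AlignmentGroup:
    """One alignment group (hypothetical field names, not the real schema)."""
    verdict: str
    semantic_type: str
    # Witness key -> tokens in that witness; the keys are illustrative.
    tokens: Dict[str, List[str]] = field(default_factory=dict)
    note: str = ""
    confidence: float = 0.0

    def problems(self) -> List[str]:
        """Return a list of rule violations; empty means the group looks valid."""
        found = []
        if self.verdict not in VERDICTS:
            found.append(f"unknown verdict: {self.verdict!r}")
        # Every non-aligned group is expected to carry an apparatus note.
        if self.verdict != "aligned" and not self.note.strip():
            found.append("non-aligned group has no apparatus note")
        return found


group = AlignmentGroup(
    verdict="added",  # illustrative verdict for the Mark 13:14 example
    semantic_type="harmonisation",
    tokens={"greek": [], "peshitta": ["<Peshitta tokens>"], "vulgate": []},
    note="Peshitta imports 'by Daniel the prophet' from Matt 24:15.",
)
print(group.problems())  # []
```

A check like `problems()` is one way a consumer could confirm that every non-aligned group really does carry its apparatus note before rendering it.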
Methodology validation (and its limits)
We sanity-checked Claude's alignment reasoning against the Berean Interlinear Bible on the narrow sub-task of Greek→English word alignment. Across all 673 Mark verses (6,175 scored tokens), Claude agreed with Berean's scholarly glosses on 67.7% of tokens (4,182 / 6,175). The benchmark was recorded at 2026-04-23T06:26:59.983024+00:00 using claude-sonnet-4-5.
The remaining ~32% of non-matches are mostly translation-word-choice differences between Berean and WEB (the two English translations involved) — e.g. Berean “crowd” vs. WEB “people” for the same Greek ὄχλος — rather than alignment errors.
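The headline figure is a plain token-level ratio. A minimal sketch of that arithmetic, using the counts reported above. The `agreement_rate` helper and the case-insensitive exact-match rule in `glosses_match` are assumptions for illustration, not the project's actual scoring code:

```python
def agreement_rate(matched: int, total: int) -> float:
    """Share of scored tokens on which the two alignments agree, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total


def glosses_match(a: str, b: str) -> bool:
    """Hypothetical strict matching rule: case-insensitive exact equality."""
    return a.strip().lower() == b.strip().lower()


rate = agreement_rate(4182, 6175)
print(f"{rate:.1f}%")  # 67.7%

# Under a strict rule like this, a word-choice difference counts as a non-match
# even when both glosses render the same Greek word (here ὄχλος).
print(glosses_match("crowd", "people"))  # False
```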
What this means: it's a transferable sanity check, not a correctness guarantee. Claude's Greek→English alignment logic is consistent with an established scholarly interlinear, which makes catastrophic failure modes unlikely and suggests that the 3-way Greek / Peshitta / Vulgate alignments in this viewer rest on similarly sound reasoning. It does not directly measure the quality of the Peshitta or Vulgate alignments (no ground-truth reference exists), nor does it measure apparatus-note quality, variant classification, or type-tag correctness. Because the benchmark was recorded with claude-sonnet-4-5, it also does not directly test the Opus 4.8 model that generated the published corpus.
Berean is used only as a methodology benchmark — never as display data in the viewer.
Run-to-run stability is measured, and imperfect. The published corpus (v2.0.0, Claude Opus 4.8) is one sample from a generation process that still varies between runs: two independent Opus 4.8 runs of Mark 13 agree on only ~76% of alignment-group memberships, with 90% verdict agreement and 76% semantic-type agreement on the groups they share, so roughly a quarter of all groups are unstable across runs. The model's self-reported confidence (~0.86) is not calibrated against correctness and should not be read as a quality signal. The upgrade from Sonnet 4.5 (corpus v1.0.0) to Opus 4.8 (v2.0.0) measurably improved both accuracy (4 of 4 hand-verified apparatus errors avoided, including one that Sonnet repeated systematically) and consistency (+11 pp group-membership agreement), but it eliminated neither the errors nor the run-to-run variation. This corpus is best treated as a machine-generated alignment draft, not as an authoritative critical edition.
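One way such run-to-run figures could be computed is sketched below. It assumes each run is reduced to a mapping from a group's membership (the set of token ids it contains) to its verdict. That representation, the token-id format, and the choice of Jaccard overlap for membership agreement are all assumptions, not the project's published metric definition.

```python
from typing import Dict, FrozenSet, Tuple

Run = Dict[FrozenSet[str], str]  # group membership -> verdict


def compare_runs(a: Run, b: Run) -> Tuple[float, float]:
    """Return (membership agreement, verdict agreement on shared groups)."""
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    membership = len(shared) / len(union) if union else 1.0
    same_verdict = sum(1 for g in shared if a[g] == b[g])
    verdict = same_verdict / len(shared) if shared else 1.0
    return membership, verdict


# Toy data with made-up token ids ("grc:1" = first Greek token, and so on).
run_1 = {
    frozenset({"grc:1", "syr:1", "lat:1"}): "aligned",
    frozenset({"grc:2", "syr:2", "lat:2"}): "minor",
    frozenset({"grc:3", "lat:3"}): "omitted",
}
run_2 = {
    frozenset({"grc:1", "syr:1", "lat:1"}): "aligned",
    frozenset({"grc:2", "syr:2", "lat:2"}): "major",
    frozenset({"grc:3", "syr:3", "lat:3"}): "aligned",
}

membership, verdict = compare_runs(run_1, run_2)
print(f"{membership:.2f} {verdict:.2f}")  # 0.50 0.50
```

In the toy data, two of four distinct groups appear in both runs, and the runs disagree on the verdict for one of those two, which is the kind of instability the figures above describe.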
Viewer inspiration
The parallel-columns concept and the green / red variant color-coding (minor vs. major) are inspired by the bible-mt5 parallel viewer built by Dr. Zhan Chen (Associate Professor, Digital Social Science & Associate Distinguished Research Fellow at the Research Centre for History and Culture, United International College — BNU-HKBU UIC, Zhuhai). Dr. Chen's own scholarship focuses on Syriac biblical texts (dissertation: An Investigation into the Peshitta of Isaiah, Harvard NELC, 2020) and Chinese Bible translations.
Data sources
- Greek NT (tagged)
- STEP Bible TAGNT from Tyndale House Cambridge, CC BY 4.0, via github.com/STEPBible/STEPBible-Data
- Peshitta NT
- Via the Aramaic Root Atlas corpus (Jossi Fresco Benaim) — Syriac text, CSV-packaged.
- Clementine Vulgate
- Public domain, via seven1m/open-bibles (USFX)
- English verse gloss
- World English Bible (WEB), public domain
- Peshitta roots, cognates, and sister-roots
- Aramaic Root Atlas (Jossi Fresco Benaim) — triliteral-root extraction, sister-root detection, and Hebrew / Arabic cognate mapping. Used by this viewer's click-for-tooltip layer on Peshitta tokens.
- Alignment validation benchmark
- Berean Interlinear Bible, used as a methodology check only (67.7% token agreement across all 673 Mark verses).
- Alignment generation
- Anthropic Claude Opus 4.8
API
The same artifacts that power the rendered viewer are exposed as a public, read-only JSON API under /api/v1/. CORS-open, edge-cached, OpenAPI-documented — any tool can fetch the alignment data, the search index, or the corpus manifest without scraping HTML or cloning the repository.
- /api/docs: interactive Swagger UI rendered from the OpenAPI 3.0 spec.
- /api/v1/manifest: corpus metadata (witnesses, gloss editions, schema version, full verse enumeration, and the alignment-generation provenance with the Berean benchmark).
- /api/v1/alignment/{book}/{chapter}/{verse}: canonical alignment JSON for one verse (alignment groups with verdict, semantic type, apparatus prose, and confidence score).
- /api/v1/verse/{book}/{chapter}/{verse}: display-shape verse dict (witnesses with align-tagged tokens, variants, and the multi-language gloss_map).
- /api/v1/search?q={query}: verse-level full-text search across all witnesses and variant types. Reference jumps (e.g. 13:14) are supported.
License: CC BY 4.0 for the derived alignment data; redistribute with attribution to upstream sources.
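A minimal client sketch for the endpoints listed above, using only the Python standard library. The base URL is a placeholder (the host is not given here), the book identifier `mark` is an assumption about how `{book}` is spelled, and nothing is assumed about the response shapes beyond their being JSON.

```python
import json
import re
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder host: replace with the viewer's real origin.
BASE_URL = "https://example.org"


def alignment_url(book: str, chapter: int, verse: int) -> str:
    """URL of the canonical alignment JSON for one verse."""
    return f"{BASE_URL}/api/v1/alignment/{quote(book)}/{chapter}/{verse}"


def search_url(query: str) -> str:
    """URL of the verse-level search endpoint; the query is percent-encoded."""
    return f"{BASE_URL}/api/v1/search?{urlencode({'q': query})}"


def looks_like_reference(query: str) -> bool:
    """Client-side guess at whether a query is a reference jump such as 13:14."""
    return re.fullmatch(r"\s*\d+:\d+\s*", query) is not None


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(alignment_url("mark", 13, 14))
print(search_url("Daniel the prophet"))
```

Calling `fetch_json(alignment_url("mark", 13, 14))` against the real host would then return the decoded document. The OpenAPI spec served at /api/docs remains the authority on path parameters and response fields.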
Related works
- Peshitta Constellations
- peshitta.onrender.com — companion project exploring the Peshitta corpus.
- Aramaic Root Atlas
- aramaic-root-atlas.onrender.com — cross-corpus Semitic triliteral root explorer with Hebrew and Arabic cognate mapping. Provides the Peshitta root data consumed by this viewer's tooltip layer.
- BibCrit
- bibcrit.app — biblical criticism workspace for textual analysis.
What this app is good for
The underlying engine is a multi-witness parallel-text viewer with AI-generated alignment and apparatus annotations. Anywhere a text exists in more than one version, this tool can show it side-by-side with per-word alignment and annotated divergences.
Biblical scholarship
- Text-critical study — see where a Greek NT verse, its Peshitta translation, and its Vulgate rendering disagree, with each divergence typed (harmonisation, substitution, idiom, grammar change, etc.) and explained in one or two sentences.
- Synoptic comparison — once Matthew / Mark / Luke all ship, compare the same pericope across the three Synoptic gospels in three languages each.
- Semitic-language parallel reading — Peshitta next to Hebrew OT next to Targum Onkelos; root / sister-root tooltips surface cognate structure visibly.
- Translation-technique study — how does the Peshitta handle Greek participles? How does the Vulgate handle Hebrew infinitive absolutes? The typed apparatus makes this kind of question quickly answerable.
- Harmonisation tracing — when one tradition imports material from a parallel passage (e.g. Peshitta Mark 13:14 pulling "by Daniel the prophet" from Matt 24:15), the apparatus flags it explicitly.
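Questions like these reduce to filtering and counting typed groups. A sketch under stated assumptions: the verse dictionaries below are hand-written toy data, and the keys `ref`, `groups`, `verdict`, and `type` are hypothetical stand-ins for whatever the per-verse JSON actually uses.

```python
from collections import Counter
from typing import Dict, Iterable, List


def divergence_types(verses: Iterable[Dict]) -> Counter:
    """Count semantic types over all non-aligned groups."""
    counts: Counter = Counter()
    for verse in verses:
        for group in verse.get("groups", []):
            if group.get("verdict") != "aligned":
                counts[group.get("type", "unknown")] += 1
    return counts


def verses_with_type(verses: Iterable[Dict], wanted: str) -> List[str]:
    """References of verses containing at least one group of the wanted type."""
    return [
        v["ref"] for v in verses
        if any(g.get("type") == wanted for g in v.get("groups", []))
    ]


# Toy input, not real corpus data.
toy = [
    {"ref": "13:14", "groups": [
        {"verdict": "aligned", "type": "agreement"},
        {"verdict": "added", "type": "harmonisation"},
    ]},
    {"ref": "13:15", "groups": [
        {"verdict": "minor", "type": "idiom"},
    ]},
]

print(divergence_types(toy).most_common())
print(verses_with_type(toy, "harmonisation"))  # ['13:14']
```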
Teaching
- Ancient-language pedagogy — introductory Greek / Hebrew / Syriac / Latin students can hover any word for Strong's / lemma / morphology. The variant apparatus turns every verse into a textual-criticism lesson.
- Comparative religion — a neutral surface for showing how different canonical lineages transmit the same base text.
- Critical-edition training — readers learn the *shape* of a scholarly apparatus (sigla, verdicts, types) by clicking through worked examples, before tackling the dense conventions of a printed NA28 / BHS. The apparatus shown here is machine-generated and pedagogical, not a substitute for an authoritative edition.
Translation work
- Checking a new translation against ancient witnesses — does a contemporary Spanish or Chinese rendering match the majority of ancient traditions or diverge meaningfully?
- Informing committee decisions — Bible translation teams can point to a specific divergence pattern (e.g. "MT + LXX + Peshitta all read X; only Vulgate reads Y") in one clickable reference.
Reader-facing discovery
- Scripture reading with a critical dimension — readers can see that a familiar phrase has a contested history without opening a textbook.
- Liturgical comparison: Eastern Orthodox liturgies use Church Slavonic and Byzantine Greek, the Catholic tradition uses the Vulgate, and the churches of the East use the Peshitta. All three can be shown side by side.
Roadmap
Up next
- Syriac font selection — user-picked typeface (Noto Sans Syriac, Estrangelo Edessa, Serto / Jacobite, East Syriac / Nestorian, Madnhaya) with a glyph preview and per-user persistence.
- Settings panel in the top bar — consolidates language, Syriac font, theme, and rail visibility behind a single cog icon; preferences persist across sessions.
Expand book coverage
New Testament (continuing the current scope):
- The remaining three gospels: Matthew, Luke, and John. The corpus and alignment pipeline already support this.
- Acts — Greek NT historical narrative; Vulgate and Peshitta already cover it.
- Pauline epistles — Romans through Philemon.
- General epistles and Revelation — James, 1–2 Peter, 1–3 John, Jude, Revelation.
Hebrew Bible / Tanakh (a separate flagship):
- Torah (Pentateuch) — the natural starting point. Genesis, Exodus, Leviticus, Numbers, Deuteronomy aligned across Masoretic (Leningrad / Aleppo) + LXX (Rahlfs or Swete) + Peshitta OT + Vulgate OT + Samaritan Pentateuch (~6,000 meaningful variants from the MT) + Targum Onkelos + Targum Pseudo-Jonathan.
- Former Prophets — Joshua, Judges, Samuel, Kings.
- Latter Prophets — Isaiah (Dr. Chen's own dissertation focus, rich Peshitta scholarship available), Jeremiah, Ezekiel, the Twelve Minor Prophets.
- Dead Sea Scrolls witnesses where extant (1QIsaᵃ, 4QSamᵇ, etc.) — variant readings often align interestingly against the MT / LXX / Peshitta.
- Writings (Ketuvim) — Psalms, Proverbs, Job, Megillot (Song of Songs, Ruth, Lamentations, Ecclesiastes, Esther), Daniel, Ezra-Nehemiah, Chronicles.
Deuterocanonical / Apocrypha:
- Wisdom of Solomon, Sirach, Tobit, Judith, Maccabees, Baruch, etc. — relevant to Catholic, Orthodox, and Ethiopian canons.
Expand witness coverage (within current books)
- English (WEB) as a first-class 4th column, with alignments regenerated across four traditions.
- Greek Byzantine (Majority Text / TR) — diffs against NA28 surface as a fifth witness column, useful for Eastern Orthodox and KJV-lineage readers.
- Orthodox Chinese cluster — Slavonic 1751 (Elizabeth Bible) + 1864 Küri (固里, Archimandrite Gury Karpov) + 1910 Innokenti (英诺肯提乙, Bishop Innokenti Figurovsky). These three form an alignment family of Russian-Orthodox-mission Classical Chinese translations against the Church Slavonic + Greek liturgical text.
- Coptic (Sahidic + Bohairic).
- Armenian (Zohrab Bible).
- Ethiopic (Ge'ez).
- Latin Vetus / Old Latin — pre-Jerome Latin witnesses for synoptic variant research.
- Targum Onkelos + Biblical Aramaic + Peshitta OT — already in the Aramaic Root Atlas corpus; wire up when Hebrew OT scope lands.
Viewer features
- Accessibility audit: full ARIA pass, keyboard-only navigation, high-contrast / dark mode polish.
- Cross-references in variant notes (e.g. "cf. Matt 24:15") rendered as clickable jumps between parallel verses.
- Per-chapter navigation panel with verse counts, low-confidence flags, and manual-review markers.
- Bookmarks / permalinks: deep links to a specific variant stay stable, with a shareable "copy link" affordance.
- Mobile polish — one-column responsive layout, larger tap targets, swipe between verses.
- Peshitta root-card on hover — hovering any Peshitta token (no click required) shows a compact card with the triliteral root, gloss, sister roots, and Hebrew / Arabic cognates pulled from the Aramaic Root Atlas. Click still opens the full Strong's-style tooltip.
Export
- TEI XML export of the critical apparatus per TEI P5 <app>/<rdg> conventions.
- BibTeX citation export per verse / per variant.
- CSV bulk export of all alignments for downstream analysis.
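For the TEI export, one variant could be serialised roughly as follows. `<app>`, `<rdg>`, the `wit` attribute, and `<note>` are TEI P5 critical-apparatus markup. The witness sigla (`grc`, `syr`, `lat`), the function name, and the placeholder readings are assumptions, since the export format is still only planned.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def variant_to_tei(readings: Dict[str, str], note: str = "") -> ET.Element:
    """Build a TEI <app> element with one <rdg> per witness."""
    app = ET.Element("app")
    for siglum, text in readings.items():
        # wit points at a witness declared elsewhere in the TEI document.
        rdg = ET.SubElement(app, "rdg", {"wit": f"#{siglum}"})
        rdg.text = text
    if note:
        ET.SubElement(app, "note").text = note
    return app


# Placeholder readings, not real corpus text.
app = variant_to_tei(
    {"grc": "GREEK READING", "syr": "SYRIAC READING", "lat": "LATIN READING"},
    note="Illustrative apparatus note.",
)
print(ET.tostring(app, encoding="unicode"))
```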
Sub-projects
- Machine Annotation Engine: live, on-demand alignment for verses the pilot doesn't cover. REST endpoint /api/align with per-request confidence, caching, rate-limiting, and custom-text override.
- Scholar Review Workflow — accounts + auth (ORCID / GitHub / Google), student / scholar / editor roles, review queue pulling from the engine and flagged low-confidence verses, threaded per-verse comments, consensus-threshold publication, TEI / BibTeX / CSV gold-standard export, dataset citation via Zenodo DOI.
Versification
Verse labels follow NA28 numbering. The Clementine Vulgate uses a different verse split in some chapters; where necessary (notably Mark 9 and Mark 4:40–41) we've remapped the Vulgate text to the NA28 boundaries. See the repository's known-issues.md for the complete list of edits.
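A remap of this kind can be expressed as a lookup from source-edition references to NA28 references, with identity as the default. The table entries below are illustrative placeholders only (the real list of edits lives in the repository's known-issues.md), and the function and type names are hypothetical.

```python
from typing import Dict, Tuple

Ref = Tuple[int, int]  # (chapter, verse)

# Illustrative placeholder entries, NOT the project's actual remap table.
VULGATE_TO_NA28: Dict[Ref, Ref] = {
    (8, 39): (9, 1),
    (9, 1): (9, 2),
}


def to_na28(ref: Ref, table: Dict[Ref, Ref] = VULGATE_TO_NA28) -> Ref:
    """Map a Vulgate (chapter, verse) to NA28; unmapped verses pass through."""
    return table.get(ref, ref)


print(to_na28((8, 39)))  # (9, 1)
print(to_na28((1, 1)))   # (1, 1)
```

Keeping the exceptions in a small explicit table means every remapped verse is listed and auditable, and all other verses pass through unchanged.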
License & reuse
The viewer code is intended to be released under an open-source license once the project is publicly funded. The derived alignment JSON is produced from sources with mixed licenses (CC BY 4.0, public domain); any future redistribution will credit the upstream sources.
Contact
Feedback welcome: jossi@somosunodigital.com.