About the Polyglot Concordance

This is a pilot. Current scope is the Gospel of Mark only. The full plan — additional NT books, the Hebrew Bible flagship, further witnesses, export formats, and two planned sub-projects — is laid out in the roadmap below.

A word-by-word alignment of the Gospel of Mark across three textual witnesses — Greek NT, Syriac Peshitta, and Latin Clementine Vulgate — with a one- or two-sentence apparatus note on every divergence.

Author: Jossi Fresco Benaim (ORCID 0009-0000-2026-0836)

Methodology

Every verse in Mark has been aligned word-by-word across the three witnesses by Anthropic's Claude Opus 4.8, run through the Batch API. Each alignment group records a variant verdict (aligned, minor, major, omitted, or added), a semantic type (agreement, construction, harmonisation, substitution, idiom, etc.), and, for every non-aligned group, one or two sentences of apparatus-style annotation. The output is a machine-generated alignment draft, intended as a starting point for review by scholars rather than as an authoritative critical edition.

Results are pre-computed and stored as per-verse JSON files in the repository, so the running viewer makes no LLM calls at runtime.
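To make the group record concrete, here is a minimal sketch of reading and sanity-checking one per-verse file. It assumes a JSON shape with fields named `groups`, `tokens`, `verdict`, `type`, `note`, and `confidence`, and token ids like `grc:1`; all of these names are hypothetical illustrations, not the repository's documented schema. Only the five verdicts and the "non-aligned groups carry a note" rule come from the methodology above.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# The five verdicts named in the methodology. Everything else about this
# schema (field names, token-id format) is a hypothetical illustration.
VERDICTS = {"aligned", "minor", "major", "omitted", "added"}


@dataclass
class AlignmentGroup:
    tokens: List[str]      # hypothetical token ids, e.g. "grc:3", "syc:2", "lat:4"
    verdict: str           # one of VERDICTS
    semantic_type: str     # e.g. "agreement", "harmonisation", "idiom"
    note: Optional[str]    # apparatus prose; expected on every non-aligned group
    confidence: float      # self-reported by the model, NOT calibrated


def load_verse(raw: str) -> List[AlignmentGroup]:
    """Parse one per-verse JSON document and check the invariants described above."""
    data = json.loads(raw)
    groups = []
    for g in data["groups"]:
        group = AlignmentGroup(
            tokens=g["tokens"],
            verdict=g["verdict"],
            semantic_type=g["type"],
            note=g.get("note"),
            confidence=float(g.get("confidence", 0.0)),
        )
        if group.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {group.verdict!r}")
        if group.verdict != "aligned" and not group.note:
            raise ValueError("non-aligned group is missing its apparatus note")
        groups.append(group)
    return groups


example = """{"groups": [
  {"tokens": ["grc:1", "syc:1", "lat:1"], "verdict": "aligned", "type": "agreement", "confidence": 0.9},
  {"tokens": ["syc:5"], "verdict": "added", "type": "harmonisation",
   "note": "Hypothetical note text.", "confidence": 0.8}
]}"""

for group in load_verse(example):
    print(group.verdict, group.semantic_type, group.tokens)
```

Rejecting a non-aligned group without a note mirrors the rule that every divergence gets an apparatus note; a stricter checker could also cap the note at two sentences.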

Methodology validation (and its limits)

We sanity-checked Claude's alignment reasoning against the Berean Interlinear Bible on the narrow sub-task of Greek→English word alignment. Across all 673 verses of Mark (6,175 scored tokens), Claude agreed with Berean's scholarly glosses on 67.7% of tokens (4,182 / 6,175). The benchmark was recorded on 2026-04-23 (06:26 UTC) using claude-sonnet-4-5, the model behind corpus v1.0.0, rather than the Opus 4.8 model that generated the current corpus.

Most of the remaining ~32% of tokens are differences in word choice between the two English translations involved (Berean and WEB), not alignment errors: for the same Greek ὄχλος, for example, Berean has “crowd” where WEB has “people”.
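The sketch below shows the arithmetic behind the headline figure, plus a toy illustration of why a gloss mismatch is not necessarily an alignment error. The per-token matching rule (exact match after case-folding) and the function names are assumptions; the benchmark's actual matching rule is not documented on this page.

```python
from typing import Iterable, Tuple


def token_agreement(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Count tokens whose two English renderings match after case-folding.

    Each pair is (word aligned to the Greek token, Berean gloss). Exact
    case-insensitive matching is an assumption; a real benchmark might be
    looser (lemmatised or multi-word glosses).
    """
    matches = total = 0
    for aligned_word, berean_gloss in pairs:
        total += 1
        if aligned_word.strip().casefold() == berean_gloss.strip().casefold():
            matches += 1
    return matches, total


def percent(matches: int, total: int) -> float:
    """Agreement as a percentage; 0.0 for an empty sample."""
    return 100.0 * matches / total if total else 0.0


# The published Mark figures: 4182 matching tokens out of 6175 scored.
print(f"{percent(4182, 6175):.1f}%")  # 67.7%

# Toy example: the ὄχλος case scores as a miss even though both glosses are fine.
toy = [("Jesus", "Jesus"), ("people", "crowd"), ("Said", "said")]
print(token_agreement(toy))  # (2, 3)
```

Under a strict string match like this one, every legitimate synonym counts against the score, which is why the 67.7% figure should be read as a lower bound on sensible alignment rather than an error rate.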

What this means: it is an indirect sanity check, not a correctness guarantee. Claude's Greek→English alignment logic is consistent with an established scholarly interlinear, which rules out catastrophic failure modes and suggests, without demonstrating, that the same reasoning carries over to the three-way Greek / Peshitta / Vulgate alignments in this viewer. It does not directly measure the quality of the Peshitta or Vulgate alignments (no ground-truth reference exists), nor does it measure apparatus-note quality, variant classification, or type-tag correctness.

Berean is used only as a methodology benchmark — never as display data in the viewer.

Run-to-run stability is measured, and imperfect. The published corpus (v2.0.0, Claude Opus 4.8) is one sample from a generation process that still varies between runs. Two independent Opus 4.8 runs of Mark 13 agree on only ~76% of alignment-group memberships, so roughly a quarter of all groups are unstable across runs; on the groups both runs share, verdicts agree 90% of the time and semantic types 76% of the time. The model's self-reported confidence (~0.86) is not calibrated against correctness and should not be read as a quality signal.

The upgrade from Sonnet 4.5 (corpus v1.0.0) to Opus 4.8 (v2.0.0) measurably improved both accuracy and consistency, but eliminated neither problem. Opus avoided all 4 apparatus errors that had been hand-verified in the Sonnet output, including one that Sonnet repeated systematically, and group-membership agreement rose by 11 percentage points. This corpus is best treated as a machine-generated alignment draft, not as an authoritative critical edition.
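This page does not spell out how group-membership agreement is computed. One plausible definition, assumed here and not necessarily the one behind the ~76% figure, treats a group as the exact set of tokens it covers and scores the Jaccard overlap between two runs' sets of groups; verdict or type agreement is then measured only on groups that both runs produced. The token ids and function names are hypothetical.

```python
from typing import Dict, FrozenSet

# A group is identified by the exact set of token ids it covers; two runs
# "agree" on a group only if both produced a group with identical membership.
Group = FrozenSet[str]
Run = Dict[Group, str]  # group -> verdict (or semantic type)


def group(*ids: str) -> Group:
    """Build a group key from hypothetical token ids."""
    return frozenset(ids)


def membership_agreement(a: Run, b: Run) -> float:
    """Jaccard overlap between the two runs' sets of groups."""
    union = set(a) | set(b)
    if not union:
        return 1.0
    return len(set(a) & set(b)) / len(union)


def label_agreement_on_shared(a: Run, b: Run) -> float:
    """Share of groups present in both runs whose labels also match."""
    shared = set(a) & set(b)
    if not shared:
        return float("nan")
    return sum(a[g] == b[g] for g in shared) / len(shared)


run1 = {group("grc:1", "syc:1", "lat:1"): "aligned",
        group("grc:2", "lat:2"): "minor",
        group("syc:2"): "added"}
run2 = {group("grc:1", "syc:1", "lat:1"): "aligned",
        group("grc:2", "lat:2"): "major",
        group("syc:2", "syc:3"): "added"}

print(membership_agreement(run1, run2))       # 0.5: 2 shared of 4 distinct groups
print(label_agreement_on_shared(run1, run2))  # 0.5: "aligned" matches, "minor" vs "major" does not
```

Note that a group whose membership shifts by a single token (here `syc:2` versus `syc:2, syc:3`) counts as a disagreement even when its verdict is unchanged, which is one reason membership agreement can sit well below verdict agreement.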

Viewer inspiration

The parallel-columns concept and the green / red variant color-coding (minor vs. major) are inspired by the bible-mt5 parallel viewer built by Dr. Zhan Chen (Associate Professor, Digital Social Science & Associate Distinguished Research Fellow at the Research Centre for History and Culture, United International College — BNU-HKBU UIC, Zhuhai). Dr. Chen's own scholarship focuses on Syriac biblical texts (dissertation: An Investigation into the Peshitta of Isaiah, Harvard NELC, 2020) and Chinese Bible translations.

Data sources

Greek NT (tagged): STEP Bible TAGNT from Tyndale House Cambridge, CC BY 4.0, via github.com/STEPBible/STEPBible-Data
Peshitta NT: Syriac text via the Aramaic Root Atlas corpus (Jossi Fresco Benaim), packaged as CSV.
Clementine Vulgate: Public domain, via seven1m/open-bibles (USFX)
English verse gloss: World English Bible (WEB), public domain
Peshitta roots, cognates, and sister-roots: Aramaic Root Atlas (Jossi Fresco Benaim) — triliteral-root extraction, sister-root detection, and Hebrew / Arabic cognate mapping. Used by this viewer's click-for-tooltip layer on Peshitta tokens.
Alignment validation benchmark: Berean Interlinear Bible, methodology check only (67.7% token agreement across all 673 verses of Mark, measured with Claude Sonnet 4.5).
Alignment generation: Anthropic Claude Opus 4.8 (corpus v2.0.0; v1.0.0 used Claude Sonnet 4.5)

API

The same artifacts that power the rendered viewer are exposed as a public, read-only JSON API under /api/v1/. CORS-open, edge-cached, OpenAPI-documented — any tool can fetch the alignment data, the search index, or the corpus manifest without scraping HTML or cloning the repository.

/api/docs — Interactive Swagger UI rendered from the OpenAPI 3.0 spec.
/api/v1/manifest — Corpus metadata: witnesses, gloss editions, schema version, full verse enumeration, and the alignment-generation provenance with the Berean benchmark.
/api/v1/alignment/{book}/{chapter}/{verse} — Canonical alignment JSON for one verse — alignment groups with verdict, semantic type, apparatus prose, and confidence score.
/api/v1/verse/{book}/{chapter}/{verse} — Display-ready verse object: witnesses with align-tagged tokens, variants, and the multi-language gloss_map.
/api/v1/search?q={query} — Verse-level full-text search across all witnesses and variant types. Reference jumps (e.g. 13:14) are supported.

License: CC BY 4.0 for the derived alignment data; redistribute with attribution to the upstream sources.
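A minimal sketch of consuming the API from Python with only the standard library. The host is not stated on this page, so `BASE_URL` is a hypothetical placeholder, and the book slug `mark` is an assumption about how books are named in paths; the OpenAPI spec served at /api/docs is the authority on both.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Hypothetical placeholder: the deployment host is not given on this page.
BASE_URL = "https://example.org"


def alignment_url(base: str, book: str, chapter: int, verse: int) -> str:
    """URL of the canonical alignment JSON for one verse (book slug is assumed)."""
    return f"{base.rstrip('/')}/api/v1/alignment/{urllib.parse.quote(book)}/{chapter}/{verse}"


def search_url(base: str, query: str) -> str:
    """URL for a full-text or reference-jump search; the query is percent-encoded."""
    return f"{base.rstrip('/')}/api/v1/search?" + urllib.parse.urlencode({"q": query})


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a JSON document; the API is read-only, so no credentials are sent."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(alignment_url(BASE_URL, "mark", 13, 14))
    print(search_url(BASE_URL, "13:14"))
    # fetch_json(...) needs network access and the real host, so it is not called here.
```

Because the endpoints are CORS-open, the same URLs work from browser JavaScript; the Python version is just the shortest self-contained client.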

Related works

Peshitta Constellations: peshitta.onrender.com — companion project exploring the Peshitta corpus.
Aramaic Root Atlas: aramaic-root-atlas.onrender.com — cross-corpus Semitic triliteral root explorer with Hebrew and Arabic cognate mapping. Provides the Peshitta root data consumed by this viewer's tooltip layer.
BibCrit: bibcrit.app — biblical criticism workspace for textual analysis.

What this app is good for

The underlying engine is a multi-witness parallel-text viewer with AI-generated alignment and apparatus annotations. Anywhere a text exists in more than one version, this tool can show it side-by-side with per-word alignment and annotated divergences.

Biblical scholarship

Text-critical study — see where a Greek NT verse, its Peshitta translation, and its Vulgate rendering disagree, with each divergence typed (harmonisation, substitution, idiom, grammar change, etc.) and explained in one or two sentences.
Synoptic comparison — once Matthew and Luke ship alongside Mark, compare the same pericope across the three Synoptic gospels in three languages each.
Semitic-language parallel reading — once Hebrew Bible coverage lands, Peshitta next to Hebrew OT next to Targum Onkelos; root / sister-root tooltips make cognate structure visible.
Translation-technique study — how does the Peshitta handle Greek participles? How does the Vulgate handle Hebrew infinitive absolutes? The typed apparatus makes this kind of question quickly answerable.
Harmonisation tracing — when one tradition imports material from a parallel passage (e.g. Peshitta Mark 13:14 pulling "by Daniel the prophet" from Matt 24:15), the apparatus flags it explicitly.

Teaching

Ancient-language pedagogy — introductory Greek / Hebrew / Syriac / Latin students can click a word to see its Strong's number, lemma, and morphology. The variant apparatus turns every verse into a textual-criticism lesson.
Comparative religion — a neutral surface for showing how different canonical lineages transmit the same base text.
Critical-edition training — readers learn the *shape* of a scholarly apparatus (sigla, verdicts, types) by clicking through worked examples, before tackling the dense conventions of a printed NA28 / BHS. The apparatus shown here is machine-generated and pedagogical, not a substitute for an authoritative edition.

Translation work

Checking a new translation against ancient witnesses — does a contemporary Spanish or Chinese rendering match the majority of ancient traditions or diverge meaningfully?
Informing committee decisions — Bible translation teams can point to a specific divergence pattern (e.g. "MT + LXX + Peshitta all read X; only Vulgate reads Y") in one clickable reference.

Reader-facing discovery

Scripture reading with a critical dimension — readers can see that a familiar phrase has a contested history without opening a textbook.
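Liturgical comparison — Eastern Orthodox liturgy uses Church Slavonic and the Byzantine Greek text; the Latin Catholic tradition is built on the Vulgate; churches of the East use the Peshitta. Once the Slavonic and Byzantine witnesses land, all three can be shown side by side.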

Roadmap

Up next

Syriac font selection — user-picked typeface (Noto Sans Syriac, Estrangelo Edessa, Serto / Jacobite, East Syriac / Nestorian, Madnhaya) with a glyph preview and per-user persistence.
Settings panel in the top bar — consolidates language, Syriac font, theme, and rail visibility behind a single cog icon; preferences persist across sessions.

Expand book coverage

New Testament (continuing the current scope):

The remaining gospels — Matthew / Luke / John. The corpus and alignment pipeline already support them.
Acts — Greek NT historical narrative; Vulgate and Peshitta already cover it.
Pauline epistles — Romans through Philemon.
General epistles and Revelation — James, 1–2 Peter, 1–3 John, Jude, Revelation.

Hebrew Bible / Tanakh (a separate flagship):

Torah (Pentateuch) — the natural starting point. Genesis, Exodus, Leviticus, Numbers, Deuteronomy aligned across Masoretic (Leningrad / Aleppo) + LXX (Rahlfs or Swete) + Peshitta OT + Vulgate OT + Samaritan Pentateuch (~6,000 differences from the MT, many of them orthographic) + Targum Onkelos + Targum Pseudo-Jonathan.
Former Prophets — Joshua, Judges, Samuel, Kings.
Latter Prophets — Isaiah (Dr. Chen's own dissertation focus, rich Peshitta scholarship available), Jeremiah, Ezekiel, the Twelve Minor Prophets.
Dead Sea Scrolls witnesses where extant (1QIsaᵃ, 4QSamᵇ, etc.) — their variant readings can be set directly against the MT / LXX / Peshitta.
Writings (Ketuvim) — Psalms, Proverbs, Job, Megillot (Song of Songs, Ruth, Lamentations, Ecclesiastes, Esther), Daniel, Ezra-Nehemiah, Chronicles.

Deuterocanonical / Apocrypha:

Wisdom of Solomon, Sirach, Tobit, Judith, Maccabees, Baruch, etc. — relevant to Catholic, Orthodox, and Ethiopian canons.

Expand witness coverage (within current books)

English (WEB) as a first-class 4th column, with alignments regenerated across four traditions.
Greek Byzantine (Majority Text / TR) — diffs against NA28 surface as a fifth witness column, useful for Eastern Orthodox and KJV-lineage readers.
Orthodox Chinese cluster — Slavonic 1751 (Elizabeth Bible) + 1864 Küri (固里, Archimandrite Gury Karpov) + 1910 Innokenti (英诺肯提乙, Bishop Innokenti Figurovsky). Together they form an alignment family: two Russian Orthodox mission translations into Classical Chinese, read against the Church Slavonic + Greek liturgical text.
Coptic (Sahidic + Bohairic).
Armenian (Zohrab Bible).
Ethiopic (Ge'ez).
Vetus Latina (Old Latin) — pre-Jerome Latin witnesses for synoptic variant research.
Targum Onkelos + Biblical Aramaic + Peshitta OT — already in the Aramaic Root Atlas corpus; wire up when Hebrew OT scope lands.

Viewer features

Accessibility audit — full aria pass, keyboard-only navigation, high-contrast / dark mode polish.
Cross-references in variant notes (e.g. "cf. Matt 24:15") rendered as clickable jumps between parallel verses.
Per-chapter navigation panel with verse counts, low-confidence flags, and manual-review markers.
Bookmarks / permalinks — stable deep links to a specific variant, with a shareable "copy link" affordance.
Mobile polish — one-column responsive layout, larger tap targets, swipe between verses.
Peshitta root-card on hover — hovering any Peshitta token (no click required) shows a compact card with the triliteral root, gloss, sister roots, and Hebrew / Arabic cognates pulled from the Aramaic Root Atlas. Click still opens the full Strong's-style tooltip.

Export

TEI XML export of the critical apparatus per TEI P5 <app>/<rdg> conventions.
BibTeX citation export per verse / per variant.
CSV bulk export of all alignments for downstream analysis.
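As a rough sketch of what one exported apparatus entry could look like, the function below builds a TEI P5 `<app>` with one `<rdg>` per witness and an explanatory `<note>`. The witness sigla (`#grc`, `#syc`, `#vg`), the placeholder readings, and the choice to carry the verdict in `app/@type` and the semantic type in `note/@type` are all illustrative assumptions, not a documented export format; a real export would also wrap entries in a full TEI document in the TEI namespace, declare the witnesses in a `<listWit>`, and be checked against the TEI P5 Guidelines.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def tei_app(readings: Dict[str, str], verdict: str, semantic_type: str, note: str) -> ET.Element:
    """Build one TEI P5 <app> entry: a <rdg> per witness plus an explanatory <note>.

    This sketch emits no <lem>, treating the witnesses as parallel readings;
    a lemma-based layout would be equally valid. The attribute mapping
    (verdict -> app/@type, semantic type -> note/@type) is illustrative only.
    """
    app = ET.Element("app", {"type": verdict})
    for siglum, text in readings.items():
        rdg = ET.SubElement(app, "rdg", {"wit": siglum})
        rdg.text = text
    ET.SubElement(app, "note", {"type": semantic_type}).text = note
    return app


entry = tei_app(
    {"#grc": "reading-1", "#syc": "reading-2", "#vg": "reading-3"},  # placeholder readings
    verdict="minor",
    semantic_type="substitution",
    note="Hypothetical apparatus note.",
)
print(ET.tostring(entry, encoding="unicode"))
```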

Sub-projects

Machine Annotation Engine — live, on-demand alignment for verses the pilot doesn't cover. REST endpoint /api/align with per-request confidence, caching, rate-limiting, custom-text override.
Scholar Review Workflow — accounts + auth (ORCID / GitHub / Google), student / scholar / editor roles, review queue pulling from the engine and flagged low-confidence verses, threaded per-verse comments, consensus-threshold publication, TEI / BibTeX / CSV gold-standard export, dataset citation via Zenodo DOI.
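The planned engine lists caching and a custom-text override side by side. One way to keep them from interfering, sketched here as an assumption rather than the engine's actual design, is to derive the cache key from a hash of every input that can change the output: the verse reference, the witness texts, the model name, and a prompt version. The function name and the `model` / `prompt_version` parameters are hypothetical.

```python
import hashlib
import json
from typing import Dict


def alignment_cache_key(ref: str, witnesses: Dict[str, str], model: str,
                        prompt_version: str) -> str:
    """Deterministic cache key for an on-demand alignment request.

    Hashes every input that can change the output, so a custom-text override
    never reuses an alignment computed for the stock witness text.
    sort_keys makes the key independent of dict insertion order.
    """
    payload = json.dumps(
        {"ref": ref, "witnesses": witnesses, "model": model, "prompt": prompt_version},
        sort_keys=True, ensure_ascii=False, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


stock = alignment_cache_key("Mark 1:1", {"grc": "Ἀρχὴ", "vg": "Initium"}, "model-x", "p1")
custom = alignment_cache_key("Mark 1:1", {"grc": "Ἀρχὴ", "vg": "Principium"}, "model-x", "p1")
print(stock != custom)  # True: the overridden text gets its own cache entry
```

Including the model and prompt version in the key also means that regenerating the corpus with a newer model, as happened between v1.0.0 and v2.0.0, naturally invalidates stale cache entries.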

Versification

Verse labels follow NA28 numbering. The Clementine Vulgate uses a different verse split in some chapters; where necessary (notably Mark 9 and Mark 4:40–41) we've remapped the Vulgate text to the NA28 boundaries. See the repository's known-issues.md for the complete list of edits.

License & reuse

The viewer code is intended to be released under an open-source license once the project is publicly funded. The derived alignment JSON is produced from sources with mixed licenses (CC BY 4.0, public domain); any future redistribution will credit the upstream sources.

Contact

Feedback welcome — jossi@somosunodigital.com.