About the Polyglot Concordance
This is a pilot. Current scope is the Gospel of Mark only. The full plan (additional NT books, the Hebrew Bible flagship, further witnesses, export formats, and two planned sub-projects) is laid out in the roadmap below.
A word-by-word alignment of the Gospel of Mark across three textual witnesses — Greek NT, Syriac Peshitta, and Latin Clementine Vulgate — with a one- or two-sentence apparatus note on every divergence.
Methodology
Every verse in Mark has been aligned word-by-word across the three witnesses by Anthropic's Claude Sonnet 4.5 via the Batch API. Each alignment group records a variant verdict — aligned, minor, major, omitted, or added — plus a semantic type (agreement, construction, harmonisation, substitution, idiom, etc.) and, for every non-aligned group, one to two sentences suitable for a critical apparatus.
Results are pre-computed and stored as per-verse JSON files in the repository — the running viewer has no runtime LLM dependency.
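As a rough illustration of what one per-verse file might contain, the sketch below models an alignment group in Python and enforces the rule that every non-aligned group carries an apparatus note. The field names (`verdict`, `semantic_type`, `tokens`, `note`), the `groups` key, and the sample record are assumptions made for this example, not the repository's actual schema.

```python
import json
from dataclasses import dataclass

# The five variant verdicts named in the methodology.
VERDICTS = {"aligned", "minor", "major", "omitted", "added"}


@dataclass
class AlignmentGroup:
    """One alignment group (field names are hypothetical)."""
    verdict: str
    semantic_type: str
    tokens: dict  # witness name -> list of token strings
    note: str = ""

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        # Every non-aligned group must carry an apparatus note.
        if self.verdict != "aligned" and not self.note.strip():
            raise ValueError("non-aligned groups need an apparatus note")


def load_verse(raw: str) -> list:
    """Parse one per-verse JSON document into validated groups."""
    return [AlignmentGroup(**g) for g in json.loads(raw)["groups"]]


# Placeholder data only; the tokens and note are not real readings.
SAMPLE = json.dumps({
    "ref": "Mark 1:1",
    "groups": [
        {"verdict": "aligned", "semantic_type": "agreement",
         "tokens": {"greek": ["g1"], "peshitta": ["p1"], "vulgate": ["v1"]}},
        {"verdict": "minor", "semantic_type": "idiom",
         "tokens": {"greek": ["g2"], "peshitta": ["p2", "p3"], "vulgate": ["v2"]},
         "note": "Placeholder apparatus note."},
    ],
})

groups = load_verse(SAMPLE)
print([g.verdict for g in groups])
```

Validating at load time is one possible design choice: a malformed pre-computed file would then fail when it is loaded rather than silently rendering a divergence without its note.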
Methodology validation (and its limits)
We sanity-checked Claude's alignment reasoning against the Berean Interlinear Bible on the narrow sub-task of Greek→English word alignment. Across all 673 Mark verses (6175 scoring tokens), Claude agreed with Berean's scholarly glosses on 67.7% of tokens (4182 / 6175). Recorded 2026-04-23T06:26:59.983024+00:00 using claude-sonnet-4-5.
The remaining ~32% of tokens are non-matches. Most of these are translation-word-choice differences between Berean and WEB (the two English translations involved), e.g. Berean “crowd” vs. WEB “people” for the same Greek ὄχλος, rather than alignment errors.
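The arithmetic behind the headline figure, and the reason a word-choice difference counts as a non-match, can be sketched as follows. The case-insensitive exact-match rule and the toy gloss pairs are assumptions for illustration; the actual scoring script may normalise tokens differently.

```python
def normalise(gloss: str) -> str:
    """Lower-case and trim a gloss before comparing (assumed rule)."""
    return gloss.strip().lower()


def agreement(pairs):
    """Count gloss pairs that match exactly after normalisation."""
    matched = sum(normalise(a) == normalise(b) for a, b in pairs)
    return matched, len(pairs)


# Toy pairs: "crowd" vs. "people" is scored as a non-match even though
# both render the same Greek word, so it lowers the agreement rate.
toy_pairs = [("crowd", "people"), ("Jesus", "jesus"), ("said", "said")]
matched, total = agreement(toy_pairs)
print(matched, total)  # 2 3

# Figures reported above: 4182 matching tokens out of 6175.
reported_rate = 4182 / 6175
print(f"{reported_rate:.1%}")  # 67.7%
```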
What this means: it's a transferable sanity check, not a correctness guarantee. Claude's Greek→English alignment logic is consistent with an established scholarly interlinear, which rules out catastrophic failure modes and suggests that the 3-way Greek / Peshitta / Vulgate alignments in this viewer rest on similarly sound reasoning. It does not directly measure the quality of the Peshitta or Vulgate alignments (no ground-truth reference exists), nor does it measure apparatus-note quality, variant classification, or type-tag correctness.
Berean is used only as a methodology benchmark — never as display data in the viewer.
Viewer inspiration
The parallel-columns concept and the green / red variant color-coding (minor vs. major) are inspired by the bible-mt5 parallel viewer built by Dr. Zhan Chen (Associate Professor, Digital Social Science & Associate Distinguished Research Fellow at the Research Centre for History and Culture, United International College — BNU-HKBU UIC, Zhuhai). Dr. Chen's own scholarship focuses on Syriac biblical texts (dissertation: An Investigation into the Peshitta of Isaiah, Harvard NELC, 2020) and Chinese Bible translations.
Data sources
- Greek NT (tagged): STEP Bible TAGNT from Tyndale House Cambridge, CC BY 4.0, via github.com/STEPBible/STEPBible-Data
- Peshitta NT: via the Aramaic Root Atlas corpus (Jossi Fresco Benaim); Syriac text, CSV-packaged.
- Clementine Vulgate: public domain, via seven1m/open-bibles (USFX)
- English verse gloss: World English Bible (WEB), public domain
- Peshitta roots, cognates, and sister-roots: Aramaic Root Atlas (Jossi Fresco Benaim), providing triliteral-root extraction, sister-root detection, and Hebrew / Arabic cognate mapping. Used by this viewer's click-for-tooltip layer on Peshitta tokens.
- Alignment validation benchmark: Berean Interlinear Bible, methodology check only (67.7% token agreement across 673 Mark verses).
- Alignment generation: Anthropic Claude Sonnet 4.5
Related works
- Peshitta Constellations: peshitta.onrender.com, a companion project exploring the Peshitta corpus.
- Aramaic Root Atlas: aramaic-root-atlas.onrender.com, a cross-corpus Semitic triliteral root explorer with Hebrew and Arabic cognate mapping. Provides the Peshitta root data consumed by this viewer's tooltip layer.
- BibCrit: bibcrit.app, a biblical criticism workspace for textual analysis.
What this app is good for
The underlying engine is a multi-witness parallel-text viewer with a critical apparatus. Anywhere a text exists in more than one version, this tool can show it side-by-side with per-word alignment and annotated divergences.
Biblical scholarship
- Text-critical study — see where a Greek NT verse, its Peshitta translation, and its Vulgate rendering disagree, with each divergence typed (harmonisation, substitution, idiom, grammar change, etc.) and explained in one or two sentences.
- Synoptic comparison — once Matthew / Mark / Luke all ship, compare the same pericope across the three Synoptic gospels in three languages each.
- Semitic-language parallel reading — Peshitta next to Hebrew OT next to Targum Onkelos; root / sister-root tooltips surface cognate structure visibly.
- Translation-technique study — how does the Peshitta handle Greek participles? How does the Vulgate handle Hebrew infinitive absolutes? The typed apparatus makes this kind of question quickly answerable.
- Harmonisation tracing — when one tradition imports material from a parallel passage (e.g. Peshitta Mark 13:14 pulling "by Daniel the prophet" from Matt 24:15), the apparatus flags it explicitly.
Teaching
- Ancient-language pedagogy — introductory Greek / Hebrew / Syriac / Latin students can hover any word for Strong's / lemma / morphology. The variant apparatus turns every verse into a textual-criticism lesson.
- Comparative religion — a neutral surface for showing how different canonical lineages transmit the same base text.
- Critical-edition training — readers learn to read a scholarly apparatus by clicking through real examples rather than deciphering the dense sigla of a printed NA28 / BHS.
Translation work
- Checking a new translation against ancient witnesses — does a contemporary Spanish or Chinese rendering match the majority of ancient traditions or diverge meaningfully?
- Informing committee decisions — Bible translation teams can point to a specific divergence pattern (e.g. "MT + LXX + Peshitta all read X; only Vulgate reads Y") in one clickable reference.
Reader-facing discovery
- Scripture reading with a critical dimension — readers can see that a familiar phrase has a contested history without opening a textbook.
- Liturgical comparison: Eastern Orthodox liturgies use Church Slavonic and Byzantine Greek, the Catholic liturgy uses the Vulgate, and the Churches of the East use the Peshitta. All three traditions can be shown side by side.
Roadmap
Up next
- Syriac font selection — user-picked typeface (Noto Sans Syriac, Estrangelo Edessa, Serto / Jacobite, East Syriac / Nestorian, Madnhaya) with a glyph preview and per-user persistence.
- Settings panel in the top bar — consolidates language, Syriac font, theme, and rail visibility behind a single cog icon; preferences persist across sessions.
Expand book coverage
New Testament (continuing the current scope):
- Remaining gospels: Matthew, Luke, and John, completing all four. The corpus and alignment pipeline already support this.
- Acts — Greek NT historical narrative; Vulgate and Peshitta already cover it.
- Pauline epistles — Romans through Philemon.
- General epistles and Revelation — James, 1–2 Peter, 1–3 John, Jude, Revelation.
Hebrew Bible / Tanakh (a separate flagship):
- Torah (Pentateuch) — the natural starting point. Genesis, Exodus, Leviticus, Numbers, Deuteronomy aligned across Masoretic (Leningrad / Aleppo) + LXX (Rahlfs or Swete) + Peshitta OT + Vulgate OT + Samaritan Pentateuch (~6,000 meaningful variants from the MT) + Targum Onkelos + Targum Pseudo-Jonathan.
- Former Prophets — Joshua, Judges, Samuel, Kings.
- Latter Prophets — Isaiah (Dr. Chen's own dissertation focus, rich Peshitta scholarship available), Jeremiah, Ezekiel, the Twelve Minor Prophets.
- Dead Sea Scrolls witnesses where extant (1QIsaᵃ, 4QSamᵇ, etc.) — variant readings often align interestingly against the MT / LXX / Peshitta.
- Writings (Ketuvim) — Psalms, Proverbs, Job, Megillot (Song of Songs, Ruth, Lamentations, Ecclesiastes, Esther), Daniel, Ezra-Nehemiah, Chronicles.
Deuterocanonical / Apocrypha:
- Wisdom of Solomon, Sirach, Tobit, Judith, Maccabees, Baruch, etc. — relevant to Catholic, Orthodox, and Ethiopian canons.
Expand witness coverage (within current books)
- English (WEB) as a first-class 4th column, with alignments regenerated across four traditions.
- Greek Byzantine (Majority Text / TR) — diffs against NA28 surface as a fifth witness column, useful for Eastern Orthodox and KJV-lineage readers.
- Orthodox Chinese cluster — Slavonic 1751 (Elizabeth Bible) + 1864 Küri (固里, Archimandrite Gury Karpov) + 1910 Innokenti (英诺肯提乙, Bishop Innokenti Figurovsky). These three form an alignment family of Russian-Orthodox-mission Classical Chinese translations against the Church Slavonic + Greek liturgical text.
- Coptic (Sahidic + Bohairic).
- Armenian (Zohrab Bible).
- Ethiopic (Ge'ez).
- Vetus Latina (Old Latin): pre-Jerome Latin witnesses for synoptic variant research.
- Targum Onkelos + Biblical Aramaic + Peshitta OT — already in the Aramaic Root Atlas corpus; wire up when Hebrew OT scope lands.
Viewer features
- Accessibility audit — full aria pass, keyboard-only navigation, high-contrast / dark mode polish.
- Cross-references in variant notes (e.g. "cf. Matt 24:15") rendered as clickable jumps between parallel verses.
- Per-chapter navigation panel with verse counts, low-confidence flags, and manual-review markers.
- Bookmarks / permalinks: deep links to a specific variant stay stable, with a shareable "copy link" affordance.
- Mobile polish — one-column responsive layout, larger tap targets, swipe between verses.
- Peshitta root-card on hover — hovering any Peshitta token (no click required) shows a compact card with the triliteral root, gloss, sister roots, and Hebrew / Arabic cognates pulled from the Aramaic Root Atlas. Click still opens the full Strong's-style tooltip.
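The cross-reference feature in the list above could work roughly as sketched here: find citations such as "cf. Matt 24:15" in an apparatus note and wrap them in links. The regular expression, the `#/Book/chapter/verse` link format, and the function name are hypothetical; the viewer's real routing scheme is not described in this document.

```python
import re
from html import escape

# Matches e.g. "cf. Matt 24:15" or "cf. 1 John 2:3" (assumed citation style).
XREF = re.compile(r"\bcf\.\s+((?:[1-3]\s)?[A-Z][a-z]+)\s+(\d+):(\d+)")


def link_xrefs(note: str) -> str:
    """Escape a note for HTML and turn its cross-references into links."""
    def repl(match):
        book, chapter, verse = match.groups()
        # Hypothetical in-app route; spaces are dropped from book names.
        href = f"#/{book.replace(' ', '')}/{chapter}/{verse}"
        return f'cf. <a href="{href}">{book} {chapter}:{verse}</a>'

    return XREF.sub(repl, escape(note))


print(link_xrefs("Imported from the parallel passage (cf. Matt 24:15)."))
```

Escaping the note before substituting keeps any markup-like characters in the note text inert while still letting the generated anchors through.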
Export
- TEI XML export of the critical apparatus per TEI P5 <app>/<rdg> conventions.
- BibTeX citation export per verse / per variant.
- CSV bulk export of all alignments for downstream analysis.
- Public read-only JSON API at /api/v1/: alignment, verse, search, and manifest endpoints with OpenAPI docs at /api/docs. CC BY 4.0 (derived data); CORS-open.
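To give a sense of the planned TEI export, here is a minimal sketch that builds one `<app>` entry with a `<rdg>` per witness using Python's standard library. The witness sigla (`Gk`, `Pesh`, `Vg`), the `loc` value format, and the placeholder readings are assumptions; a real export would also need the TEI namespace, a header, and a witness list, which are omitted here.

```python
import xml.etree.ElementTree as ET


def build_app(loc, readings, note=None):
    """Build a TEI-style <app> element with one <rdg> per witness."""
    app = ET.Element("app", {"loc": loc})
    for siglum, text in readings.items():
        # @wit points at a witness declared elsewhere in the document.
        rdg = ET.SubElement(app, "rdg", {"wit": f"#{siglum}"})
        rdg.text = text
    if note:
        ET.SubElement(app, "note").text = note
    return app


# Placeholder readings only; not actual text from any witness.
app = build_app(
    "Mark.13.14",
    {"Gk": "reading A", "Pesh": "reading B", "Vg": "reading A"},
    note="Placeholder apparatus note.",
)
xml_text = ET.tostring(app, encoding="unicode")
print(xml_text)
```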
Sub-projects
- Machine Annotation Engine: live, on-demand alignment for verses the pilot doesn't cover. REST endpoint /api/align with per-request confidence, caching, rate-limiting, and custom-text override.
- Scholar Review Workflow — accounts + auth (ORCID / GitHub / Google), student / scholar / editor roles, review queue pulling from the engine and flagged low-confidence verses, threaded per-verse comments, consensus-threshold publication, TEI / BibTeX / CSV gold-standard export, dataset citation via Zenodo DOI.
Versification
Verse labels follow NA28 numbering. The Clementine Vulgate uses a different verse split in some chapters; where necessary (notably Mark 9 and Mark 4:40–41) we've remapped the Vulgate text to the NA28 boundaries. See the repository's known-issues.md for the complete list of edits.
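One way such a remapping could be implemented is with a lookup table from Vulgate verse references to NA28 references, merging text when two Vulgate verses fall inside one NA28 verse. The table entry and verse texts below are toy values chosen only to show the mechanics; they are not the project's actual mapping, which is documented in known-issues.md.

```python
def remap_verses(vulgate, table):
    """Re-key Vulgate verse text to NA28 (chapter, verse) boundaries.

    vulgate: {(chapter, verse): text} in Vulgate numbering.
    table:   {(chapter, verse): (chapter, verse)} for verses that move;
             references missing from the table keep their numbering.
    """
    merged = {}
    for ref in sorted(vulgate):
        target = table.get(ref, ref)
        merged.setdefault(target, []).append(vulgate[ref])
    # Join text when several Vulgate verses land in one NA28 verse.
    return {ref: " ".join(parts) for ref, parts in merged.items()}


# Toy data: hypothetical split where Vulgate 4:41 belongs to NA28 4:40.
toy_vulgate = {(4, 39): "text a", (4, 40): "text b", (4, 41): "text c"}
toy_table = {(4, 41): (4, 40)}

remapped = remap_verses(toy_vulgate, toy_table)
print(remapped)
```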
License & reuse
The viewer code is intended to be released under an open-source license once the project is publicly funded. The derived alignment JSON is produced from sources with mixed licenses (CC BY 4.0, public domain); any future redistribution will credit the upstream sources.
Contact
Feedback welcome: jossi@somosunodigital.com.