About the Polyglot Concordance

This is a pilot. Current scope is the Gospel of Mark only. The full plan — additional NT books, the Hebrew Bible flagship, further witnesses, export formats, and two planned sub-projects — is laid out in the roadmap below.

A word-by-word alignment of the Gospel of Mark across three textual witnesses — Greek NT, Syriac Peshitta, and Latin Clementine Vulgate — with a one- or two-sentence apparatus note on every divergence.

Author: Jossi Fresco Benaim (ORCID 0009-0000-2026-0836)

Methodology

Every verse in Mark has been aligned word-by-word across the three witnesses by Anthropic's Claude Opus 4.8 via the Batch API. Each alignment group records a variant verdict (aligned, minor, major, omitted, or added), a semantic type (agreement, construction, harmonisation, substitution, idiom, etc.) and, for every non-aligned group, one to two sentences of apparatus-style annotation. The output is a machine-generated alignment draft, intended as a starting point for scholar review rather than as an authoritative critical edition.
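As a rough illustration of what one alignment group carries, the record described above could be modelled as follows. The verdict labels come from the text; the class name, field names, and the validation helper are assumptions made for this sketch and are not the project's actual JSON schema.

```python
from dataclasses import dataclass
from typing import Optional

# Verdict labels are taken from the methodology text. Everything else
# (class name, field names, helper) is hypothetical.
VERDICTS = {"aligned", "minor", "major", "omitted", "added"}


@dataclass
class AlignmentGroup:
    greek: list[str]            # tokens from the Greek NT witness
    syriac: list[str]           # tokens from the Peshitta witness
    latin: list[str]            # tokens from the Clementine Vulgate witness
    verdict: str                # expected to be one of VERDICTS
    semantic_type: str          # e.g. "agreement", "substitution", "idiom"
    note: Optional[str] = None  # apparatus-style annotation, if any

    def problems(self) -> list[str]:
        """Return rule violations; an empty list means the group looks valid."""
        found = []
        if self.verdict not in VERDICTS:
            found.append(f"unknown verdict: {self.verdict!r}")
        # The text says every non-aligned group carries an annotation.
        if self.verdict != "aligned" and not (self.note or "").strip():
            found.append("non-aligned group is missing its apparatus note")
        return found


# Placeholder tokens, not real corpus data.
group = AlignmentGroup(["g1"], ["s1"], ["l1"], "minor", "idiom")
print(group.problems())
```

A check like this is only a structural guard: it can confirm that a non-aligned group has a note, not that the note is correct.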

Results are pre-computed and stored as per-verse JSON files in the repository — the running viewer has no runtime LLM dependency.

Methodology validation (and its limits)

We sanity-checked Claude's alignment reasoning against the Berean Interlinear Bible on the narrow sub-task of Greek→English word alignment. Across all 673 Mark verses (6175 scoring tokens), Claude agreed with Berean's scholarly glosses on 67.7% of tokens (4182 / 6175). This benchmark was recorded at 2026-04-23T06:26:59.983024+00:00 using claude-sonnet-4-5.

The remaining ~32% of non-matches are mostly translation-word-choice differences between Berean and WEB (the two English translations involved) — e.g. Berean “crowd” vs. WEB “people” for the same Greek ὄχλος — rather than alignment errors.
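The headline figures can be reproduced from the two counts quoted above. This is a minimal arithmetic sketch; the function name is an assumption, and it says nothing about how the project's scorer decides whether an individual token matches.

```python
def agreement_rate(matches: int, total: int) -> float:
    """Fraction of scoring tokens on which two alignments agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matches / total


# Counts quoted in the text: 4182 matching tokens out of 6175 scored.
rate = agreement_rate(4182, 6175)
non_matches = 6175 - 4182

print(f"agreement: {rate:.1%}")           # agreement: 67.7%
print(f"non-matches: {non_matches}")      # non-matches: 1993
print(f"non-match rate: {1 - rate:.1%}")  # non-match rate: 32.3%
```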

What this means: it's a transferable sanity check, not a correctness guarantee. Claude's Greek→English alignment logic is consistent with an established scholarly interlinear, which rules out catastrophic failure modes and suggests the 3-way Greek / Peshitta / Vulgate alignments in this viewer use the same reasonable reasoning. It does not directly measure the quality of the Peshitta or Vulgate alignments (no ground-truth reference exists), nor does it measure apparatus-note quality, variant classification, or type-tag correctness.

Berean is used only as a methodology benchmark — never as display data in the viewer.

Run-to-run stability is measured, and imperfect. The published corpus (v2.0.0, Claude Opus 4.8) is one sample from a generation process that still varies between runs: two independent Opus 4.8 runs of Mark 13 agree on only ~76% of alignment-group memberships, so roughly a quarter of all groups are unstable across runs. On the groups the two runs share, verdicts agree 90% of the time and semantic types 76%. The model's self-reported confidence (~0.86) is not calibrated against correctness and should not be read as a quality signal. The upgrade from Sonnet 4.5 (corpus v1.0.0) to Opus 4.8 (v2.0.0) measurably improved both accuracy (4 of 4 hand-verified apparatus errors avoided, including one that Sonnet repeated systematically) and consistency (+11 pp group-membership agreement), but eliminated neither problem. This corpus is best treated as a machine-generated alignment draft, not as an authoritative critical edition.
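One plausible way to compute stability figures of this kind is sketched below. The text does not define the exact metric, so this is an assumption: a group is identified by the set of token ids it contains, membership agreement is the share of groups found in both runs out of all groups found in either run, and verdict and type agreement are measured only on the shared groups. All token ids and labels are placeholders.

```python
# Each run maps a group (frozenset of token ids) to (verdict, semantic_type).
Run = dict[frozenset, tuple[str, str]]


def stability(run_a: Run, run_b: Run) -> dict[str, float]:
    """Compare two alignment runs of the same passage."""
    shared = run_a.keys() & run_b.keys()  # groups with identical membership
    union = run_a.keys() | run_b.keys()   # every group seen in either run

    def frac(n: int, d: int) -> float:
        return n / d if d else 1.0

    verdict_same = sum(run_a[g][0] == run_b[g][0] for g in shared)
    type_same = sum(run_a[g][1] == run_b[g][1] for g in shared)
    return {
        "membership": frac(len(shared), len(union)),
        "verdict": frac(verdict_same, len(shared)),
        "semantic_type": frac(type_same, len(shared)),
    }


# Placeholder data: two runs that disagree on one grouping and one verdict.
run_a: Run = {
    frozenset({"G1", "S1", "L1"}): ("aligned", "agreement"),
    frozenset({"G2", "S2"}): ("minor", "idiom"),
    frozenset({"G3", "L3"}): ("major", "substitution"),
}
run_b: Run = {
    frozenset({"G1", "S1", "L1"}): ("aligned", "agreement"),
    frozenset({"G2", "S2"}): ("major", "idiom"),
    frozenset({"G3", "S3", "L3"}): ("major", "substitution"),
}
scores = stability(run_a, run_b)
print(scores)
```

Exact-set matching is strict: a group that differs by a single token counts as unshared. A softer overlap measure would give higher numbers, so the definition matters when comparing figures.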

Viewer inspiration

The parallel-columns concept and the green / red variant color-coding (minor vs. major) are inspired by the bible-mt5 parallel viewer built by Dr. Zhan Chen (Associate Professor, Digital Social Science & Associate Distinguished Research Fellow at the Research Centre for History and Culture, United International College — BNU-HKBU UIC, Zhuhai). Dr. Chen's own scholarship focuses on Syriac biblical texts (dissertation: An Investigation into the Peshitta of Isaiah, Harvard NELC, 2020) and Chinese Bible translations.

Data sources

Greek NT (tagged)
STEP Bible TAGNT from Tyndale House Cambridge, CC BY 4.0, via github.com/STEPBible/STEPBible-Data
Peshitta NT
Via the Aramaic Root Atlas corpus (Jossi Fresco Benaim) — Syriac text, CSV-packaged.
Clementine Vulgate
Public domain, via seven1m/open-bibles (USFX)
English verse gloss
World English Bible (WEB), public domain
Peshitta roots, cognates, and sister-roots
Aramaic Root Atlas (Jossi Fresco Benaim) — triliteral-root extraction, sister-root detection, and Hebrew / Arabic cognate mapping. Used by this viewer's click-for-tooltip layer on Peshitta tokens.
Alignment validation benchmark
Berean Interlinear Bible, used for the methodology check only (67.7% token agreement across all 673 Mark verses).
Alignment generation
Anthropic Claude Opus 4.8

API

The same artifacts that power the rendered viewer are exposed as a public, read-only JSON API under /api/v1/. The API is CORS-open, edge-cached, and OpenAPI-documented, so any tool can fetch the alignment data, the search index, or the corpus manifest without scraping HTML or cloning the repository.

License: CC BY 4.0 for the derived alignment data; redistribute with attribution to upstream sources.
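A client for a read-only JSON API of this kind needs very little code. The sketch below uses only the Python standard library. The /api/v1/ prefix comes from the text; the host https://example.org and the manifest.json path are placeholders invented for illustration, so consult the API's OpenAPI document for the real endpoint names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_ROOT = "api/v1/"  # prefix stated in the text


def endpoint_url(base_url: str, path: str) -> str:
    """Build an absolute URL for a path under the API prefix."""
    return urljoin(base_url.rstrip("/") + "/", API_ROOT + path.lstrip("/"))


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical host and endpoint name; nothing is fetched here.
url = endpoint_url("https://example.org", "manifest.json")
print(url)
```

Calling fetch_json(url) with a real endpoint would return the decoded document. Any reuse of the data should carry the upstream attribution noted above.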

Related works

Peshitta Constellations
peshitta.onrender.com — companion project exploring the Peshitta corpus.
Aramaic Root Atlas
aramaic-root-atlas.onrender.com — cross-corpus Semitic triliteral root explorer with Hebrew and Arabic cognate mapping. Provides the Peshitta root data consumed by this viewer's tooltip layer.
BibCrit
bibcrit.app — biblical criticism workspace for textual analysis.

What this app is good for

The underlying engine is a multi-witness parallel-text viewer with AI-generated alignment and apparatus annotations. Anywhere a text exists in more than one version, this tool can show it side-by-side with per-word alignment and annotated divergences.

Biblical scholarship

Teaching

Translation work

Reader-facing discovery

Roadmap

Up next

Expand book coverage

New Testament (continuing the current scope):

Hebrew Bible / Tanakh (a separate flagship):

Deuterocanonical / Apocrypha:

Expand witness coverage (within current books)

Viewer features

Export

Sub-projects

Versification

Verse labels follow NA28 numbering. The Clementine Vulgate uses a different verse split in some chapters; where necessary (notably Mark 9 and Mark 4:40–41) we've remapped the Vulgate text to the NA28 boundaries. See the repository's known-issues.md for the complete list of edits.
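Remapping one witness onto another's verse boundaries can be sketched as a lookup table with an identity default. The book name, references, and override entry below are placeholders invented for illustration. They are not the project's actual edits, which are listed in known-issues.md.

```python
# A reference is (book, chapter, verse).
Ref = tuple[str, int, int]


def to_reference_numbering(ref: Ref, overrides: dict[Ref, Ref]) -> Ref:
    """Map a source-witness reference to the reference numbering.

    References without an override keep their own numbering.
    """
    return overrides.get(ref, ref)


def remap_verses(verses: dict[Ref, str], overrides: dict[Ref, Ref]) -> dict[Ref, str]:
    """Regroup verse text under the reference numbering.

    When several source verses land on the same target verse, their
    text is joined in source order.
    """
    grouped: dict[Ref, list[str]] = {}
    for ref in sorted(verses):
        target = to_reference_numbering(ref, overrides)
        grouped.setdefault(target, []).append(verses[ref])
    return {ref: " ".join(parts) for ref, parts in grouped.items()}


# Placeholder data: the source splits one reference verse in two.
source = {
    ("Example", 1, 9): "first",
    ("Example", 1, 10): "second",
    ("Example", 1, 11): "third",
}
overrides = {("Example", 1, 11): ("Example", 1, 10)}
remapped = remap_verses(source, overrides)
print(remapped)
```

Keeping the overrides as explicit data, instead of arithmetic offsets, makes each edit auditable against a published list.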

License & reuse

The viewer code is intended to be released under an open-source license once the project is publicly funded. The derived alignment JSON is produced from sources with mixed licenses (CC BY 4.0, public domain); any future redistribution will credit the upstream sources.

Contact

Feedback welcome — jossi@somosunodigital.com.