Our project focuses on developing scalable tools to support the creation of scholarly editions, beginning with David Hume’s The History of England and expanding to other works later.
We are building a full pipeline for historical document digitization and text recognition. The process includes:
Special attention is given to layout understanding (e.g., distinguishing footnotes and headers), which significantly improves OCR accuracy, especially for early printed materials.
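To make the idea of layout understanding concrete, here is a minimal rule-based sketch that labels text blocks on a page as header, footnote, or body before recognition. It is an illustration only and not the project's actual method: the `Block` type, the `classify_blocks` function, and all thresholds are hypothetical, and a production pipeline would more likely rely on a trained layout model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A text region on a page (hypothetical structure for illustration)."""
    text: str
    top: float        # y-coordinate of the top edge, 0 = top of page
    bottom: float     # y-coordinate of the bottom edge
    font_size: float  # estimated type size of the region


def classify_blocks(
    blocks: List[Block],
    page_height: float,
    body_font_size: float,
    header_zone: float = 0.08,     # assumed: headers sit in the top 8% of the page
    footnote_ratio: float = 0.85,  # assumed: footnotes are set in clearly smaller type
) -> List[str]:
    """Label each block as 'header', 'footnote', or 'body' using simple heuristics."""
    labels = []
    for block in blocks:
        in_header_zone = block.bottom <= page_height * header_zone
        is_small_type = block.font_size <= body_font_size * footnote_ratio
        in_lower_half = block.top >= page_height * 0.5
        if in_header_zone:
            labels.append("header")
        elif is_small_type and in_lower_half:
            labels.append("footnote")
        else:
            labels.append("body")
    return labels


# Usage with made-up coordinates for a single page
page = [
    Block("THE HISTORY OF ENGLAND", top=20, bottom=60, font_size=10),
    Block("Main text of the page", top=100, bottom=700, font_size=12),
    Block("* A note at the foot of the page", top=720, bottom=950, font_size=9),
]
print(classify_blocks(page, page_height=1000, body_font_size=12))
```

Separating regions in this way lets the recognition step process the main text, notes, and running headers as distinct streams instead of interleaving them.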
Our tools allow scholars to analyze textual changes across multiple editions of a work. By adapting the BLAST algorithm (originally used in bioinformatics), we can detect overlapping text across thousands of pages. These overlaps are then visualized through an interface that supports both distant and close reading, enabling the study of textual reuse, reception, and editorial changes over time.
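The seed-and-extend idea behind BLAST can be sketched for text as follows. This is a simplified illustration under stated assumptions, not the project's implementation: it indexes word k-grams from one text, looks up matching k-grams ("seeds") in the other, and extends each seed in both directions while the words match exactly. Real BLAST-style tools use scored extension that tolerates mismatches, which matters for OCR noise and spelling variation. The function names and the parameters `k` and `min_len` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(tokens: List[str], k: int) -> Dict[Tuple[str, ...], List[int]]:
    """Map every word k-gram to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - k + 1):
        index[tuple(tokens[i:i + k])].append(i)
    return index


def find_overlaps(
    a_text: str, b_text: str, k: int = 3, min_len: int = 5
) -> List[Tuple[int, int, int, int]]:
    """Return (a_start, a_end, b_start, b_end) token spans shared by both texts."""
    a, b = tokenize(a_text), tokenize(b_text)
    index = build_index(a, k)
    seen = set()
    overlaps = []
    for j in range(len(b) - k + 1):
        for i in index.get(tuple(b[j:j + k]), ()):
            # Extend the seed to the left while words match exactly.
            s_i, s_j = i, j
            while s_i > 0 and s_j > 0 and a[s_i - 1] == b[s_j - 1]:
                s_i -= 1
                s_j -= 1
            # Extend the seed to the right while words match exactly.
            e_i, e_j = i + k, j + k
            while e_i < len(a) and e_j < len(b) and a[e_i] == b[e_j]:
                e_i += 1
                e_j += 1
            # Keep each maximal match once, and only if it is long enough.
            if e_i - s_i >= min_len and (s_i, s_j) not in seen:
                seen.add((s_i, s_j))
                overlaps.append((s_i, e_i, s_j, e_j))
    return overlaps


# Usage with two invented sentences standing in for passages from two editions
edition_a = "the king was pleased to grant a charter to the city of london in that year"
edition_b = "it is said the king was pleased to grant a new charter to the city of york"
for a_start, a_end, b_start, b_end in find_overlaps(edition_a, edition_b):
    print(" ".join(tokenize(edition_a)[a_start:a_end]))
```

The k-gram index is what makes the approach scale: candidate matches are found by lookup instead of comparing every passage against every other, and only the seeds are examined in detail.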
The project launched in 2023, and as of June 2025 we have a working proof of concept for an end-to-end text recognition and comparison pipeline. By 2026, the tool will be refined for broader public use, with a particular focus on usability for non-technical scholars. The tools will support future critical editions and will be made available for reuse through open research infrastructures.