Comparing "parallel passages" in digital archives
@article{Harris2020ComparingP, title={Comparing "parallel passages" in digital archives}, author={Martyn Harris and M. Levene and Dell Zhang and D. Levene}, journal={J. Documentation}, year={2020}, volume={76}, pages={271-289} }
The purpose of this paper is to present a language-agnostic approach to facilitate the discovery of “parallel passages” stored in historic and cultural heritage digital archives. The authors explore a novel and relatively simple approach, using a character-based statistical language model combined with a tailored version of the Basic Local Alignment Search Tool (BLAST) to extract exact and approximate string patterns shared between groups of documents. The approach is applicable to a wide range of languages…
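The abstract does not give implementation details, so the following is only a minimal sketch of the general BLAST-style "seed and extend" idea applied to characters: index the character n-grams of one text, look up the n-grams of the other to find exact seeds, then extend each seed to the right while tolerating a small number of substitutions. It is not the authors' method and omits their statistical language model entirely; the function names (`char_ngram_index`, `extend_right`, `shared_passages`) and the parameters `n` and `max_mismatches` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def char_ngram_index(text, n):
    """Map each character n-gram to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(text) - n + 1):
        index[text[pos:pos + n]].append(pos)
    return index


def extend_right(a, b, i, j, n, max_mismatches):
    """Grow a seed a[i:i+n] == b[j:j+n] to the right.

    Substitutions are tolerated up to max_mismatches; the returned length
    always ends on a matching character.
    """
    length = n   # characters examined so far
    best = n     # longest span that ends on a match
    mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] == b[j + length]:
            best = length + 1
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        length += 1
    return best


def shared_passages(a, b, n=5, max_mismatches=1):
    """Return (start_a, start_b, length) tuples, longest first."""
    index_b = char_ngram_index(b, n)
    hits = []
    for i in range(len(a) - n + 1):
        for j in index_b.get(a[i:i + n], []):
            # Skip seeds that merely continue an exact match starting earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            hits.append((i, j, extend_right(a, b, i, j, n, max_mismatches)))
    return sorted(hits, key=lambda hit: -hit[2])


if __name__ == "__main__":
    a = "the quick brown fox"
    b = "the quick brawn fox"
    for start_a, start_b, length in shared_passages(a, b):
        print(start_a, start_b, repr(a[start_a:start_a + length]))
```

Because the seeds are character n-grams rather than words, the sketch needs no tokenizer or language-specific preprocessing, which is the sense in which such an approach can be language-agnostic. Setting `max_mismatches=0` restricts it to exact shared passages.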
One Citation
The General Higher-Order Neural Network Model and Its Application to the Archive Retrieval in Modern Guangdong Customs Archives
- Computer Science
- IEEE Access
- 2020
References
Character N-Gram Tokenization for European Language Text Retrieval
- Computer Science
- Information Retrieval
- 2004
- 348
Aramaic Dialect Problems. II
- Philosophy
- The American Journal of Semitic Languages and Literatures
- 1936
- 2
A study of smoothing methods for language models applied to information retrieval
- Computer Science
- TOIS
- 2004
- 1,291
Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
- Computer Science
- Computational Linguistics
- 2010
- 253
The generalised k-Truncated Suffix Tree for time-and space-efficient searches in multiple DNA or protein sequences
- Mathematics, Medicine
- Int. J. Bioinform. Res. Appl.
- 2008
- 28