Enabling complex analysis of large-scale digital collections: humanities research, high-performance computing, and transforming access to British Library digital collections

@article{Terras2018EnablingCA,
  title={Enabling complex analysis of large-scale digital collections: humanities research, high-performance computing, and transforming access to British Library digital collections},
  author={Melissa Mhairi Terras and James Baker and James Hetherington and David Beavan and Martin Zaltz Austwick and Anne Welsh and Helen O'Neill and Will Finley and Oliver Duke-Williams and Adam Farquhar},
  journal={Digit. Scholarsh. Humanit.},
  year={2018}
}
Although there has been a drive in the cultural heritage sector to provide large-scale, open data sets for researchers, we have not seen a commensurate rise in humanities researchers undertaking complex analysis of these data sets for their own research purposes. This article reports on a pilot project at University College London, working in collaboration with the British Library, to scope out how best high-performance computing facilities can be used to facilitate the needs of researchers in… 


Confluence between library and information science and digital humanities in Spain. Methodologies, standards and collections
TLDR
There is an urgent need to strengthen the “scientific relationships” between heritage institutions and to enhance links between the academic fields of DH and LIS, so that teaching and research strategies can be improved jointly.
Digital preservation at Big Data scales: proposing a step-change in preservation system architectures
TLDR
Preservation systems are undergoing a step-change as they move to Big Data-scale architectures and respond to more technical research processes; this paper is a timely illustration of the state of play at this pivotal moment.
History Playground: A Tool for Discovering Temporal Trends in Massive Textual Corpora
TLDR
The tool uses scalable algorithms to extract trends from textual corpora and then makes them available for real-time search and discovery through an interface for exploring the data.
Struggling with digitized historical newspapers: Contextual barriers to information interaction in history research activities
On account of the complexities related to the use of digitized newspapers, researchers may encounter barriers when interacting with the collections' content. Overcoming barriers that could influence
Big translation history
This article proposes the term Big Translation History (BTH) to describe a translation history that can be analysed computationally and that we define as involving: (1) large-scale research
The Rare Books Catalog and the Scholarly Database
TLDR
A researcher's-eye view of the value of the library catalog, not only as a database to be searched for surrogates of objects of study, but also as a corpus of text that can be analyzed in its own right or incorporated into the researcher's own research database.
Library Carpentry: Software Skills Training for Library Professionals
TLDR
This paper describes Library Carpentry, an introductory software skills training programme with a focus on the needs and requirements of library and information professionals, and argues that adding software skills to university librarians' armoury is an effective and important use of professional development resource.
Language Resources and Evaluation Conference, 11–16 May 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
TLDR
This paper addresses the long-term archiving of large corpora with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
Electronic Library of the RSL: Development Stages and Features of Formation of Digital Collections
TLDR
The article highlights the role of the “Electronic Library of Dissertations”, which became the basis for enriching the EL of the RSL with relevant scientific knowledge, for working out new technical and technological solutions, and for providing unique access opportunities within the current legislation.
defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data
TLDR
Defoe, a new scalable and portable digital eScience toolbox for historical research, runs text-mining queries in parallel via Apache Spark across large datasets such as historical newspapers and books.

References

Showing 10 of 74 references
Opening Access to collections: the making and using of open digitised cultural content
TLDR
It is demonstrated that increasingly open licensing of digital cultural heritage content is creating opportunities for researchers in the arts and humanities for both access to and analysis of cultural heritage materials.
The Role of CLARIN in Digital Transformations in the Humanities
TLDR
Case studies of early CLARIN demonstrators give a flavour of the possibilities of digital transformations in a number of humanities disciplines, and there is huge potential for important new directions in literary and linguistic computing.
Hermeneutica: Computer-Assisted Interpretation in the Humanities
TLDR
Hermeneutica introduces text analysis using computer-assisted interpretive practices, offers theoretical chapters about text analysis, presents a set of analytical tools that instantiate the theory, and provides example essays that illustrate the use of these tools.
The Forecast for Special Libraries
TLDR
This column takes a close look at Outsell's Information Industry Outlook report for 2016 and its implications for special libraries.
The Getty End-User Online Searching Project in the Humanities: Report No. 6: Overview and Conclusions
TLDR
An overview of the Getty Information Institute's major study of end-user online searching by humanities scholars and its results is presented, with particular emphasis on matters of interest to academic librarians.
Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers
The Viral Texts Project (http://viraltexts.org) is an interdisciplinary and collaborative effort among the authors listed here, with contributions from project alumni Elizabeth Maddock Dillon, Kevin
Comedy, caricature and the social order, 1820–50
It is 50 years since Edward Thompson introduced historians to the phrase, the idea, the reality of 'the condescension of posterity'. And while Thompson restricted his lens to the poor and
Transforming Roles: Canadian Academic Librarians Embedded in Faculty Research Projects
Academic librarians have always played an important role in providing research services and research-skills development to faculty in higher education. But that role is evolving to include the
User attitudes toward end-user literature searching.
TLDR
Survey results support a perceived need for end-user searching and confirm the recommendations of the Association of American Medical Colleges on medical information science skills.
Twenty-five years of end-user searching, Part 1: Research findings
K. Markey, J. Assoc. Inf. Sci. Technol., 2007
TLDR
This is the first part of a two-part article that reviews 25 years of published research findings on end-user searching in online information retrieval (IR) systems and poses a host of new research questions that will further the understanding of end-user searching of online IR systems.