Assessing the Impact of OCR Quality on Downstream NLP Tasks
- Daniel Alexander van Strien, K. Beelen, Mariona Coll Ardanuy, Kasra Hosseini, Barbara McGillivray, Giovanni Colavizza
- Computer Science · ICAART
A series of extrinsic assessment tasks is performed using popular, out-of-the-box tools to quantify the impact of OCR quality on them. OCR errors have a consistent impact on downstream tasks, with some tasks more irredeemably harmed than others.
The citation advantage of linking publications to research data
- Giovanni Colavizza, I. Hrynaszkiewicz, Isla Staden, K. Whitaker, Barbara McGillivray
- Computer Science · PLoS ONE
- 4 July 2019
It is found that, following mandated publisher policies, data availability statements become very common, and that articles whose statements link to data in a repository are associated with up to 25.36% (± 1.07%) higher citation impact on average, according to a citation prediction model.
A scientometric overview of CORD-19
- Giovanni Colavizza, R. Costas, V. Traag, N. J. van Eck, T. V. van Leeuwen, L. Waltman
- 20 April 2020
Based on a comparison with the Web of Science database, it is found that CORD-19 provides almost complete coverage of research on COVID-19 and coronaviruses.
Deep Reference Mining From Scholarly Literature in the Arts and Humanities
- Danny Rodrigues Alves, Giovanni Colavizza, F. Kaplan
- Computer Science · Front. Res. Metr. Anal.
- 13 July 2018
A deep learning architecture for reference mining from the full text of scholarly publications is applied and it is confirmed that there are important gains to be had by adopting deep learning for the task of reference mining.
Characterizing in-text citations in scientific articles: A large-scale analysis
Diachronic Evaluation of NER Systems on Old Newspapers
This paper investigates the performance of different named entity (NE) recognition tools applied to old newspapers by conducting a diachronic evaluation over 7 time-series taken from the archives of the Swiss newspaper Le Temps.
A principled methodology for comparing relatedness measures for clustering publications
- L. Waltman, K. Boyack, Giovanni Colavizza, Nees Jan van Eck
- Computer Science · Quantitative Science Studies
- 21 January 2019
Using the BM25 text-based relatedness measure as the evaluation criterion, it is found that bibliographic coupling relations yield more accurate clustering solutions than direct citation relations and cocitation relations.
Crypto Art: A Decentralized View
- Massimo Franceschet, Giovanni Colavizza, Sebastian Hernandez
- Art, Computer Science · Leonardo
- 9 June 2019
The authors propose a collection of viewpoints on crypto art from different actors of the system: artists, collectors, galleries, art historians and data scientists.
Clustering citation histories in the Physical Review
Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis
While the threat of an "infodemic" remains, the results show that social media also provide a platform for experts and public authorities to be widely heard during a global crisis.