Corpus ID: 31163826

A Machine Learning Approach to Quantitative Prosopography

@article{Gupta2018AML,
  title={A Machine Learning Approach to Quantitative Prosopography},
  author={Aayushee Gupta and Haimonti Dutta and Srikanta J. Bedathur and Lipika Dey},
  journal={ArXiv},
  year={2018},
  volume={abs/1801.10080}
}
Prosopography is an investigation of the common characteristics of a group of people in history, by a collective study of their lives. It involves a study of biographies to solve historical problems. If such biographies are unavailable, surviving documents and secondary biographical data are used. Quantitative prosopography involves analysis of information from a wide variety of sources about "ordinary people". In this paper, we present a machine learning framework for automatically designing a… 
1 Citation
Toward the optimized crowdsourcing strategy for OCR post-correction
TLDR
This is the first attempt to systematically investigate the influence of various factors on crowdsourcing-based OCR post-correction and to propose an optimal strategy for this process.

References

SHOWING 1-10 OF 47 REFERENCES
Finding influential people from a historical news repository
TLDR
This thesis designs a People Gazetteer from the noisy OCR text of historical newspapers, identifies "influential" people from it, and defines the notion of an Influential Person Index (IPI) used to rank them.
Learning Parameters of the K-Means Algorithm From Subjective Human Annotation
TLDR
A pilot study to observe whether humans are adept at finding sub-categorization in an online searchable database of historically significant newspaper articles; seeds provided by annotators are carefully incorporated into a semi-supervised K-Means algorithm (Seeded K-Means).
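The Seeded K-Means idea mentioned above can be sketched briefly: instead of random initialization, each cluster centroid starts from annotator-provided seed points, after which standard K-Means iterations run. The data, seed sets, and function name below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of semi-supervised Seeded K-Means: centroids are
# initialized from human-annotated seed points rather than at random.
# All data here is toy/illustrative.

def seeded_kmeans(points, seeds, iters=10):
    """points: list of (x, y); seeds: {cluster_id: [(x, y), ...]}."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    # Initialize each centroid as the mean of its annotated seed set.
    centroids = {c: mean(pts) for c, pts in seeds.items()}
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        clusters = {c: [] for c in centroids}
        for p in points:
            c = min(centroids,
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                  + (p[1] - centroids[c][1]) ** 2)
            clusters[c].append(p)
        # Update step: recompute centroids (keep old one if a cluster empties).
        centroids = {c: mean(pts) if pts else centroids[c]
                     for c, pts in clusters.items()}
    return centroids

pts = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
seeds = {0: [(0.1, 0.0)], 1: [(5.1, 5.0)]}
print(seeded_kmeans(pts, seeds))
```

Because the seeds anchor the initial centroids, the clusters converge toward the annotators' intended sub-categories rather than an arbitrary partition.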
PastPlace – the global gazetteer from the people who brought you 'A Vision of Britain through Time'
This poster differs very substantially from the similarly titled poster prepared for the same meeting in 2013, as that one concerned a new "gazetteer service" or API (Application Programming Interface).
Learning a Named Entity Tagger from Gazetteers with the Partial Perceptron
TLDR
An algorithm called the Partial Perceptron is presented for discriminatively learning the parameters of a sequence model from such partially labeled data; it yields a substantial relative improvement in recall, with some loss in precision, compared to the gazetteer-driven method.
Lydia: A System for Large-Scale News Analysis
TLDR
The Lydia project seeks to build a relational model of people, places, and things through natural language processing of news sources and the statistical analysis of entity frequencies and co-locations.
Studying how the past is remembered: towards computational history through large scale text mining
TLDR
This work demonstrates how various computational tools can assist in studying history by revealing interesting topics and hidden correlations, and attempts to study how the past is remembered through large scale text mining.
How historians use historical newspapers
TLDR
This research focuses on historians' needs for searching collections of newspapers and managing the information they find, and discusses the implications for the design of interfaces and services that would serve as a historian's workbench.
Rethinking Sentiment Analysis in the News: from Theory to Practice and back
TLDR
The main tasks for news opinion mining are definition of the target and analysis of clearly marked opinion that is expressed explicitly, without requiring interpretation or the use of world knowledge.
A Novel Approach to Automatic Gazetteer Generation using Wikipedia
TLDR
This work introduces a novel method to automatically generate gazetteers from seed lists using an external knowledge resource, Wikipedia; it exploits the rich content and various structural elements of Wikipedia and does not rely on language- or domain-specific knowledge.
Classifying news stories using memory based reasoning
TLDR
A method for classifying news stories using Memory Based Reasoning (MBR), a k-nearest-neighbor method that does not require manual topic definitions and is effective in reducing the development time of classification systems involving a large number of topics, for purposes such as classification and message routing.
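The MBR approach above is essentially k-nearest-neighbor classification over stored training stories: a new story receives the majority topic of its most similar neighbors. The following sketch assumes a simple word-overlap (Jaccard) similarity and toy documents; none of these names or examples come from the paper itself.

```python
# Illustrative k-nearest-neighbor classifier in the spirit of Memory
# Based Reasoning: a new story is labeled with the majority topic of
# its k most similar training stories. Toy data throughout.
from collections import Counter

def tokenize(text):
    return set(text.lower().split())

def knn_classify(train, story, k=3):
    """train: list of (text, topic) pairs; story: text to classify."""
    words = tokenize(story)
    # Rank training stories by Jaccard similarity of word sets.
    scored = sorted(
        train,
        key=lambda t: -len(words & tokenize(t[0])) / len(words | tokenize(t[0])),
    )
    # Majority vote among the k nearest neighbors.
    return Counter(topic for _, topic in scored[:k]).most_common(1)[0][0]

train = [
    ("stocks fell sharply on wall street", "finance"),
    ("the market rallied after earnings", "finance"),
    ("the team won the championship game", "sports"),
]
print(knn_classify(train, "stocks rallied on wall street", k=3))
```

No per-topic rules are written by hand: adding a new topic only requires adding labeled example stories to the memory, which is what makes the approach attractive for large topic sets.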