Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries
@article{Wiedemann2014DocumentRF, title={Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries}, author={Gregor Wiedemann and Andreas Niekler}, journal={ArXiv}, year={2014}, volume={abs/1707.03217} }
This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g., in the social sciences. Document retrieval for this purpose needs to take into account that analysts often cannot describe their research objective with a small set of key terms, especially when dealing with theoretical or rather abstract research interests. Instead, it is much easier to define a set of paradigmatic documents which reflect topics of interest as…
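The abstract motivates retrieval driven by a set of paradigmatic documents rather than by a short keyword query. The truncated abstract does not spell out the paper's actual procedure, so the following is only a minimal sketch of the general query-by-example idea under stated assumptions: documents are represented as TF-IDF vectors (vector space model), the paradigmatic documents are averaged into a centroid, and the collection is ranked by cosine similarity to that centroid. The function names, the tokenizer and the toy documents are hypothetical.

```python
# Hypothetical query-by-example sketch (assumption): NOT the paper's procedure,
# just TF-IDF vectors, a centroid of example documents, and cosine ranking.
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so terms occurring in every document keep a positive weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_examples(collection, paradigmatic):
    """Rank collection documents by similarity to the centroid of the example documents.

    Returns (index_into_collection, score) pairs, best match first.
    """
    vectors = tfidf_vectors(paradigmatic + collection)  # IDF estimated over both sets
    examples = vectors[:len(paradigmatic)]
    candidates = vectors[len(paradigmatic):]
    centroid = defaultdict(float)
    for vec in examples:
        for t, w in vec.items():
            centroid[t] += w / len(examples)
    scores = [(i, cosine(centroid, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


examples = ["welfare state reform and social policy",
            "pension reform debates in parliament"]
collection = ["parliament debates pension reform",
              "football results from the weekend",
              "new social policy for the welfare state"]
ranking = rank_by_examples(collection, examples)
```

With these toy inputs the football report ranks last with a score of 0.0, because it shares no terms with the example documents; the design choice of a single centroid is the simplest option and ignores differences between the individual examples.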
6 Citations
Concepts Through Time: Tracing Concepts in Dutch Newspaper Discourse (1890-1990) using Word Embeddings
- Sociology
- 2015
In this paper, we use a new technique, called Concepts Through Time (CTT), to trace concepts in newspaper discourse. CTT makes use of sequential semantic spaces to follow semantic shifts of concepts…
Technical Writer in the Framework of Modern Natural Language Processing Tasks
- Computer Science
- Journal of Siberian Federal University. Humanities & Social Sciences
- 2019
This study focuses on technical writer competences and the necessary specialized language resources, supporting any language worker in the framework of the modern natural language processing domain…
Automated Fact Checking in the News Room
- Computer Science
- WWW
- 2019
An automated fact-checking platform which, given a claim, retrieves relevant textual evidence from a document collection, predicts whether each piece of evidence supports or refutes the claim, and returns a final verdict.
The Impact of Data Challenges on Intent Detection and Slot Filling for the Home Assistant Scenario
- Computer Science
- 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP)
- 2019
This paper systematically generates datasets in the Romanian language that model these data complexities and investigates how well two of the most prominent tools – Wit.ai and Rasa NLU – solve the tasks of intent detection and slot filling, given the considered data complexities.
Text Mining für die Analyse qualitativer Daten [Text Mining for the Analysis of Qualitative Data]
- Political Science
- 2016
The chapter summarizes the results of the case studies in Part II of the volume. It becomes clear that the use of text mining in qualitative social research offers the opportunity to…
Methoden, Qualitätssicherung und Forschungsdesign [Methods, Quality Assurance and Research Design]
- Philosophy
- 2016
This chapter presents the integrated use of natural language processing techniques, referred to as text mining, and of content-analytic methods of the social and…
20 References
Using Term Co-occurrence Data for Document Indexing and Retrieval
- Computer Science
- 2000
This article presents the authors' work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents.
The limitations of term co-occurrence data for query expansion in document retrieval systems
- Computer Science
- J. Am. Soc. Inf. Sci.
- 1991
This article demonstrates that the similar terms identified by co-occurrence data in a query expansion system tend to occur very frequently in the database that is being searched.
Pivoted document length normalization
- Computer Science
- SIGIR '96
- 1996
Pivoted normalization is presented, a technique that can be used to modify any normalization function, thereby reducing the gap between the relevance and retrieval probabilities; two new normalization functions are also presented: pivoted unique normalization and pivoted byte size normalization.
TREC: Experiment and Evaluation in Information Retrieval
- Computer Science
- 2006
…ad hoc retrieval, filtering, question answering) that encapsulate different research agendas in the community. The end result of each track meeting is an overview report written by the track…
Detection of Domain Specific Terminology Using Corpora Comparison
- Computer Science
- LREC
- 2004
This paper evaluates the usefulness of a corpora comparison approach for pinpointing corpus-specific words in order to identify uniterms in the field of telecommunications.
Automatic ranking of information retrieval systems using data fusion
- Computer Science
- Inf. Process. Manag.
- 2006
Generalized vector spaces model in information retrieval
- Computer Science
- SIGIR '85
- 1985
This paper proposes a systematic method (the generalized vector space model) to compute term correlations directly from the automatic indexing scheme and demonstrates how such correlations can be included with minimal modification in existing vector-based information retrieval systems.
A theoretical basis for the use of co-occurrence data in information retrieval
- Computer Science
- 1977
This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system by measuring the extent of the dependence between index terms and using it to construct a non‐linear weighting function.
A vector space model for automatic indexing
- Computer Science
- CACM
- 1975
An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Evaluating the performance of information retrieval systems using test collections
- Computer Science
- Inf. Res.
- 2013
This paper discusses system-oriented evaluation, which focuses on measuring system effectiveness: how well an information retrieval system can separate relevant from non-relevant documents for a given user query.
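The reference "Pivoted document length normalization" (SIGIR '96) summarizes a technique that modifies an existing normalization function. As a minimal sketch, assuming the general pivoted form (1 - slope) * pivot + slope * old normalization, with the number of unique terms per document as the old normalization, its collection average as the pivot, and an illustrative slope of 0.2, normalized scores can be computed as follows. The function names and example numbers are hypothetical, and the sketch omits the term-frequency weighting a full retrieval system would use.

```python
# Sketch of pivoted document length normalization (general form only).
# Assumptions: "old" normalization = number of unique terms per document,
# pivot = collection average of that quantity, slope = 0.2 (illustrative only).


def pivoted_norm(old_norm, pivot, slope=0.2):
    """Pivoted normalization factor: (1 - slope) * pivot + slope * old_norm."""
    return (1.0 - slope) * pivot + slope * old_norm


def pivoted_unique_scores(raw_scores, unique_term_counts, slope=0.2):
    """Divide each raw (unnormalized) document score by its pivoted normalization factor."""
    pivot = sum(unique_term_counts) / len(unique_term_counts)
    return [score / pivoted_norm(u, pivot, slope)
            for score, u in zip(raw_scores, unique_term_counts)]


# Two documents with equal raw scores; the second has three times as many unique terms.
scores = pivoted_unique_scores([10.0, 10.0], [50, 150])
```

Here the pivot is 100, so the normalization factors are 90 and 110: the longer document is still penalized, but far less than under plain division by the unique-term count (50 vs. 150, a factor of 3). A document whose old normalization equals the pivot keeps the same factor, which is where the old and pivoted normalization functions cross.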