Corpus ID: 1986719

Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries

by Gregor Wiedemann and Andreas Niekler
This paper presents a procedure to retrieve subsets of relevant documents from large text collections for content analysis, e.g. in the social sciences. Document retrieval for this purpose needs to take into account that analysts often cannot describe their research objective with a small set of key terms, especially when dealing with theoretical or rather abstract research interests. Instead, it is much easier to define a set of paradigmatic documents which reflect topics of interest as… 
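The retrieval-by-example idea in the abstract can be sketched as ranking a collection against the centroid of a few paradigmatic documents. The following is a minimal illustration, not the authors' actual method: it uses plain TF-IDF vectors and cosine similarity, and all function names (`rank_by_examples` etc.) are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                       # document frequency per term
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_examples(collection, example_ids):
    """Rank all documents by similarity to the centroid of the
    paradigmatic example documents given by their indices."""
    vecs = tfidf_vectors(collection)
    centroid = Counter()
    for i in example_ids:
        centroid.update(vecs[i])                  # sum example vectors
    return sorted(((cosine(dict(centroid), v), i)
                   for i, v in enumerate(vecs)), reverse=True)
```

In this sketch, documents sharing vocabulary with the examples score high, while topically unrelated documents score zero.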


Concepts Through Time: Tracing Concepts in Dutch Newspaper Discourse (1890-1990) using Word Embeddings

In this paper, we use a new technique, called Concepts Through Time (CTT), to trace concepts in newspaper discourse. CTT makes use of sequential semantic spaces to follow semantic shifts of concepts.

Technical Writer in the Framework of Modern Natural Language Processing Tasks

This study focuses on technical-writer competences and the specialized language resources needed to support any language worker in the domain of modern natural language processing.

Automated Fact Checking in the News Room

An automated fact-checking platform is presented which, given a claim, retrieves relevant textual evidence from a document collection, predicts whether each piece of evidence supports or refutes the claim, and returns a final verdict.

The Impact of Data Challenges on Intent Detection and Slot Filling for the Home Assistant Scenario

This paper systematically generates datasets in the Romanian language that model these data complexities and investigates how well two of the most prominent tools – Wit.ai and Rasa NLU – solve the tasks of intent detection and slot filling, given the considered data complexities.

Text Mining für die Analyse qualitativer Daten

This contribution summarizes the results of the case studies from Part II of the volume. It becomes clear that the use of text mining in qualitative social research offers the opportunity to…

Methoden, Qualitätssicherung und Forschungsdesign

This contribution presents the integrated use of automatic language processing techniques, referred to as text mining, with content-analytical methods of the social sciences and the…

Using Term Co-occurrence Data for Document Indexing and Retrieval

This article presents an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents.

The limitations of term co-occurrence data for query expansion in document retrieval systems

This article demonstrates that the similar terms identified by co-occurrence data in a query-expansion system tend to occur very frequently in the database being searched.
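The limitation described above is easy to reproduce. The sketch below (a hypothetical `cooccurrence_expansion` helper, not the article's system) expands a query with the terms that most often co-occur with it in the same document; with a naive count, ubiquitous high-frequency terms dominate the expansion.

```python
from collections import Counter

def cooccurrence_expansion(docs, query_term, k=2):
    """Pick the k terms that most often co-occur with the query term
    in the same document -- a simple expansion heuristic."""
    co = Counter()
    for doc in docs:
        terms = set(doc)
        if query_term in terms:
            co.update(terms - {query_term})       # count co-occurring terms
    return [t for t, _ in co.most_common(k)]
```

Because raw co-occurrence counts grow with a term's overall frequency, a stopword-like term that appears everywhere will outrank genuinely related terms, which is exactly the weakness the article documents.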

Pivoted document length normalization

Pivoted normalization is presented, a technique that can be used to modify any normalization function, thereby reducing the gap between the relevance and retrieval probabilities. Two new normalization functions are presented: pivoted unique normalization and pivoted byte-size normalization.

TREC: Experiment and Evaluation in Information Retrieval

TREC is organized into tracks (e.g. ad hoc retrieval, filtering, question answering) that encapsulate different research agendas in the community. The end result of each track meeting is an overview report written by the track…

Detection of Domain Specific Terminology Using Corpora Comparison

This paper evaluates the usefulness of a corpora-comparison approach for pinpointing corpus-specific words in order to identify uniterms in the field of telecommunications.
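One simple instance of corpora comparison is ranking terms by how much more frequent they are (relatively) in the domain corpus than in a general reference corpus. The sketch below is an illustrative ratio-based scorer with add-one smoothing, not the paper's exact measure; the function name `domain_terms` is hypothetical.

```python
from collections import Counter

def domain_terms(domain_docs, reference_docs, k=3):
    """Rank terms by the ratio of their relative frequency in the
    domain corpus to that in a general reference corpus."""
    dom = Counter(t for d in domain_docs for t in d)
    ref = Counter(t for d in reference_docs for t in d)
    nd, nr = sum(dom.values()), sum(ref.values())
    # add-one smoothing on the reference side avoids division by zero
    scores = {t: (dom[t] / nd) / ((ref[t] + 1) / (nr + 1)) for t in dom}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Terms frequent in the domain but rare in general language (candidate uniterms) rise to the top, while function words common to both corpora drop out.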

Automatic ranking of information retrieval systems using data fusion

Generalized vector spaces model in information retrieval

This paper proposes a systematic method (the generalized vector space model) to compute term correlations directly from automatic indexing scheme and demonstrates how such correlations can be included with minimal modification in the existing vector based information retrieval systems.
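The core of the generalized vector space model can be illustrated with a term-term correlation matrix G = A·Aᵀ derived from the term-document matrix A, plugged into the similarity as dᵀ·G·q. This is a minimal dense-matrix sketch under that reading; the helper names are hypothetical.

```python
def term_correlations(td):
    """Term-term correlation matrix G = A * A^T computed directly
    from a term-document matrix (rows = terms, cols = documents)."""
    nt = len(td)
    return [[sum(td[i][d] * td[j][d] for d in range(len(td[0])))
             for j in range(nt)] for i in range(nt)]

def gvsm_score(doc_vec, query_vec, G):
    """Generalized similarity d^T * G * q: correlated terms contribute
    to the score even when document and query share no literal term."""
    return sum(doc_vec[i] * G[i][j] * query_vec[j]
               for i in range(len(G)) for j in range(len(G)))
```

With G in place, a document containing only term t0 can still match a query containing only term t1, provided t0 and t1 co-occur somewhere in the collection — the "minimal modification" to the standard vector model that the summary mentions.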

A theoretical basis for the use of co-occurrence data in information retrieval

This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system by measuring the extent of the dependence between index terms and using it to construct a non‐linear weighting function.

A vector space model for automatic indexing

An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.

Evaluating the performance of information retrieval systems using test collections

Discusses system-oriented evaluation, which focuses on measuring system effectiveness: how well an information retrieval system can separate relevant from non-relevant documents for a given user query.
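The effectiveness measures behind such test-collection evaluations boil down to comparing a system's ranked output against relevance judgments. A minimal sketch of the two standard measures (function name hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Set-based effectiveness over a test collection:
    precision = fraction of retrieved documents that are relevant,
    recall    = fraction of relevant documents that were retrieved."""
    rel_ret = len(set(retrieved) & set(relevant))
    precision = rel_ret / len(retrieved) if retrieved else 0.0
    recall = rel_ret / len(relevant) if relevant else 0.0
    return precision, recall
```

Test collections supply the `relevant` sets as pooled human judgments per query, which is what makes system comparisons across research groups reproducible.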