The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods

@article{Nelson2018TheFO,
  title={The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods},
  author={Laura K. Nelson and Derek Burk and Marcel L. Knudsen and Leslie McCall},
  journal={Sociological Methods \& Research},
  year={2018},
  volume={50},
  pages={202--237}
}
Advances in computer science and computational linguistics have yielded new, and faster, computational approaches to structuring and analyzing textual data. These approaches perform well on tasks like information extraction, but their ability to identify complex, socially constructed, and unsettled theoretical concepts—a central goal of sociological content analysis—has not been tested. To fill this gap, we compare the results produced by three common computer-assisted approaches—dictionary… 

Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches

A comparison of dictionary and scaling methods used in predicting the sentiment of German literature reviews to the “gold standard” of human-coded sentiments provides a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts.

All work and no play: A text analysis

Some of the key contemporary themes in text analytics and the likely future role of this method within market research and insight are discussed, including Q’s text analysis component and Google Cloud Natural Language.

Qualitative Coding in the Computational Era: A Hybrid Approach to Improve Reliability and Reduce Effort for Coding Ethnographic Interviews

Sociologists have argued that there is value in incorporating computational tools into qualitative research, including using machine learning to code qualitative data. Yet standard computational

Measuring and Visualizing Coders’ Reliability: New Approaches and Guidelines From Experimental Data

This study investigates inter- and intracoder reliability, proposing a new approach based on social network analysis (SNA) and exponential random graph models (ERGMs) that is compatible with current ERGMs.

Epistemological Considerations of Text Mining: Implications for Systematic Literature Review

This article proposes to rethink the epistemological principles of text mining, by returning to the qualitative analysis of its meaning and structure, and presents alternatives, applicable to the process of constructing lexical matrices for the analysis of a complex textual corpus.

Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

Generalized word shift graphs are introduced, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average.

The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy

It is shown via an experiment that an expert can train a precise, efficient automatic classifier in a very limited amount of time and that, under certain conditions, expert-trained models produce better annotations than humans themselves.

Linguistic, cultural, and narrative capital: computational and human readings of transfer admissions essays

Variation in college application materials related to social stratification is a contentious topic in social science and national discourse in the United States. This line of research has also

References

Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have

Computational Grounded Theory: A Methodological Framework

This article proposes a three-step methodological framework called computational grounded theory, which combines expert human knowledge and hermeneutic skills with the processing power and pattern

A Method of Automated Nonparametric Content Analysis for Social Science

This work develops a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly, and illustrates with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency.

Computer-Aided Content Analysis of Digitally Enabled Movements

With the emergence of the Arab Spring and the Occupy movements, interest in the study of movements that use the Internet and social networking sites has grown exponentially. However, our inability

Automatic Extraction of Facts from Press Releases to Generate News Stories

JASPER is a fact extraction system recently developed and deployed by Carnegie Group for Reuters Ltd, which uses a template-driven approach, partial understanding techniques, and heuristic procedures to extract certain key pieces of information from a limited range of text.

Coder Reliability and Misclassification in the Human Coding of Party Manifestos

The findings indicate that misclassification is a serious and systemic problem with the current CMP data set and coding process, suggesting the CMP scheme should be significantly simplified to address reliability issues.

Information Extraction

M. Pazienza, Lecture Notes in Computer Science, 2002
This paper discusses attempts to derive templates directly from corpora; to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora.

Treating Words as Data with Error: Uncertainty in Text Statements of Policy Positions

This work characterizes processes by which CMP data are generated, and shows how to correct biased inferences, in recent prominently published work, derived from statistical analyses of error-contaminated CMP data.