Machine Translation for Accessible Multi-Language Text Analysis

@article{Chew2023MachineTF,
  title={Machine Translation for Accessible Multi-Language Text Analysis},
  author={Edward W. Chew and William D. Weisman and Jingying Huang and Seth Frey},
  journal={ArXiv},
  year={2023},
  volume={abs/2301.08416}
}
English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as any other area, with revolutionary advances in the tools for English language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible… 

Building Sentiment Lexicons for All Major Languages

This paper addresses the lexicon gap in a multilingual world by building high-quality sentiment lexicons for 136 major languages and integrating a variety of linguistic resources to produce an immense knowledge graph.

POLYGLOT-NER: Massive Multilingual Named Entity Recognition

This paper describes a system that builds Named Entity Recognition (NER) annotators for 40 major languages using Wikipedia and Freebase, and proposes a new method, distant evaluation, based on statistical machine translation.

Large-scale evidence of dependency length minimization in 37 languages

Using parsed corpora of 37 diverse languages, it is shown that overall dependency lengths for all languages are shorter than conservative random baselines, suggesting that dependency length minimization is a universal quantitative property of human languages.

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

This paper shows that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods, and highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020

It is found that for the most common languages on Twitter there is a growing tendency, though not universal, to retweet rather than share new content, and it is shown that over time, the contagion ratios for the most common languages are growing more strongly than those of rare languages.

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

This paper offers a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. The authors believe this multilingual data, with its broader geographical and longer temporal coverage, will be a cornerstone for researchers to study impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.

Multilingual Twitter Sentiment Classification: The Role of Human Annotators

It is shown that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large and there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.

Sentiment analysis to predict election results using Python

This paper determines the polarity and subjectivity measures for the collected tweets that help in understanding the user opinion for a particular candidate and compares the candidates over the type of sentiment.

Sentiment analysis of twitter data using machine learning approaches and semantic analysis

  • G. Gautam, Divakar Yadav
  • Computer Science
    2014 Seventh International Conference on Contemporary Computing (IC3)
  • 2014
This paper contributes to sentiment analysis for customer review classification, which helps analyze information in the form of tweets, where opinions are highly unstructured and are positive, negative, or somewhere in between.

Effective Sentimental Analysis and Opinion Mining of Web Reviews Using Rule Based Classifiers

The proposed approach is evaluated on online book and political reviews, and its efficacy is demonstrated through Kappa measures, with a higher accuracy of 97.4% and a lower error rate.