Twitter Topic Classification

@article{Antypas2022TwitterTC,
  title={Twitter Topic Classification},
  author={Dimosthenis Antypas and Asahi Ushio and Jos{\'e} Camacho-Collados and Leonardo Neves and V{\'i}tor Silva and Francesco Barbieri},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.09824}
}
Social media platforms host discussions about a wide variety of topics that arise every day. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on tweet topic classification and release two associated datasets. Given a wide range of topics…

Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

This paper focuses on NER in Twitter, one of the largest social media platforms, and constructs a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. It also provides a set of language model baselines and analyzes language model performance on the task.

TweetNLP: Cutting-Edge Natural Language Processing for Social Media

The main contributions of TweetNLP are an integrated Python library for a modern toolkit supporting social media analysis using various task-specific models adapted to the social domain, and an interactive online demo for codeless experimentation using the authors' models.

References

Comparing Twitter and Traditional Media Using Topic Models

This paper empirically compares the content of Twitter with that of a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings that are useful for downstream IR or DM applications.

A survey of recent methods on deriving topics from Twitter: algorithm to evaluation

A review of recent methods proposed to derive topics from social media platforms, covering everything from algorithms to evaluation. The gaps in the research thus far and the problems that remain to be addressed are highlighted.

A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts

This research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context and sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.

How Noisy Social Media Text, How Diffrnt Social Media Sources?

This work investigates just how linguistically noisy (or otherwise) social media text is across a range of sources, namely YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia, which are compared to a reference corpus of edited English text.

TimeLMs: Diachronic Language Models from Twitter

It is shown that a continual learning strategy contributes to enhancing Twitter-based language models’ capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks.

Tweet2Vec: Character-Based Distributed Representations for Social Media

A character composition model, tweet2vec, is proposed, which finds vector-space representations of whole tweets by learning complex, non-local dependencies in character sequences.

TweetNLP: Cutting-Edge Natural Language Processing for Social Media

The main contributions of TweetNLP are an integrated Python library for a modern toolkit supporting social media analysis using various task-specific models adapted to the social domain, and an interactive online demo for codeless experimentation using the authors' models.

Information Extraction for Social Media

A framework for Information Extraction from unstructured user-generated content on social media is proposed, offering solutions to the IE challenges in this domain, such as short context, noisy and sparse content, and uncertain content.

Twitter Topic Modeling by Tweet Aggregation

The results show that aggregating similar tweets into individual documents significantly increases topic coherence.
...