Author Profiling with Doc2vec Neural Network-Based Document Embeddings

@inproceedings{Markov2016AuthorPW,
  title={Author Profiling with Doc2vec Neural Network-Based Document Embeddings},
  author={Ilia Markov and Helena G{\'o}mez-Adorno and Juan Pablo Francisco Posadas-Dur{\'a}n and Grigori Sidorov and Alexander Gelbukh},
  booktitle={MICAI},
  year={2016}
}
To determine author demographics of texts in social media such as Twitter, blogs, and reviews, we use doc2vec document embeddings to train a logistic regression classifier. Our method outperforms the existing state of the art under some settings, though the current state-of-the-art results on those tasks have been quite weak.

Know your Neighbors: Efficient Author Profiling via Follower Tweets

TLDR
This work presents a computationally undemanding approach that extracts various feature types and, via sparse matrix factorization, learns dense, low-dimensional representations of individual persons solely from their followers’ tweet streams.

Profiling : Bot and Gender Prediction using a Multi-Aspect Ensemble Approach Notebook for PAN at CLEF 2019

TLDR
A two-phase approach that exploits the TF-IDF features of the documents to train a model that learns to detect documents generated by bots; its effectiveness is shown empirically on the PAN 2019 development dataset for author profiling.

Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

TLDR
This work uses homophily cues to retrofit text-based author representations with non-linguistic information, and introduces a trade-off parameter that increases in-class similarity between authors, and improves classification performance by making classes more linearly separable.

Word Distance Approach for Celebrity Profiling

TLDR
This paper uses word distance features as input to different classifiers, building a model for each aspect of celebrity, and shows that word distance-based features outperform the PAN baseline results.

CIC-GIL Approach to Author Profiling in Spanish Tweets: Location and Occupation

TLDR
The CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018 is presented; the results are competitive with those of other participating teams, and the best run was ranked fourth in the shared task.

Language- and Subtask-Dependent Feature Selection and Classifier Parameter Tuning for Author Profiling

TLDR
The CIC’s approach to the Author Profiling (AP) task at PAN 2017 is presented; the task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic.

Big data analytics for critical information classification in online social networks using classifier chains

TLDR
A novel dataset is built, which contains the writing characteristics of 160,000 users of the Twitter OSN, and a proposal based on a multidimensional learning technique using CC transformation outperforms other similar proposals.

Predicting Learners' Demographics Characteristics: Deep Learning Ensemble Architecture for Learners' Characteristics Prediction in MOOCs

  • Tahani Aljohani, A. Cristea
  • Computer Science
    Proceedings of the 2019 4th International Conference on Information and Education Innovations - ICIEI 2019
  • 2019
TLDR
A Deep Learning Architecture to predict the demographic characteristics of learners in MOOCs, incorporating multi-feature representations and ensemble learning methods; the paper reports on initial tests of the model and architecture on a large dataset from the FutureLearn platform.

Multi-lingual Author Profiling using Stylistic Features

TLDR
The authors submitted their system to FIRE'18-MAPonSMS (Multi-lingual Author Profiling on SMS), a shared task on classifying author attributes such as gender and age group from multilingual text, specifically English + Roman Urdu.

A comparative study of author gender identification

  • Tugba Yildiz
  • Computer Science
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
  • 2019
TLDR
Different learning approaches based on machine learning (ML) and neural network language models are employed to address the problem of author gender identification, applying word embeddings and deep learning architectures to the Turkish language.

References

Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts

TLDR
It is shown that a neural network-based feature representation is enhanced by using a lexical resource that includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media.

Multilingual author profiling using word embedding averages and SVMs

  • R. Bayot, Teresa Gonçalves
  • Computer Science
    2016 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA)
  • 2016
TLDR
An experiment investigating author profiling of tweets in English and Spanish, particularly for cross-genre evaluation, shows that using the average of word vectors outperforms TF-IDF in most cross-genre problems for age and gender.

Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016

TLDR
This work describes the methodology proposed for the task of cross-genre author profiling at PAN 2016, which achieved first place for gender detection in English and tied for second place in terms of joint accuracy.

Author Profiling: Predicting Age and Gender from Blogs Notebook for PAN at CLEF 2013

TLDR
This paper proposes a Machine Learning approach to determine an unknown author's age and gender using three types of features: content-based, style-based, and topic-based.

Author Profiling using SVMs and Word Embedding Averages

TLDR
This approach is similar to usual methods where text is preprocessed, features are extracted, and then used in SVMs with cross validation, but the main difference is that features used come from averages of word embeddings, specifically word2vec vectors.
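
The feature-extraction step described above, averaging the word embeddings of a text, can be illustrated with a small standard-library sketch. The `toy_vectors` table is a hypothetical stand-in for trained word2vec vectors, `average_embedding` is an illustrative helper name, and the SVM training and cross-validation steps are omitted.

```python
from typing import Dict, List


def average_embedding(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of all tokens found in the embedding table.

    Tokens missing from the table are skipped; a text with no known
    tokens maps to the zero vector (one possible convention).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings standing in for word2vec vectors.
toy_vectors = {
    "good": [1.0, 0.0],
    "day": [0.0, 1.0],
    "great": [1.0, 1.0],
}

# "unknownword" is out of vocabulary, so only "good" and "day" are averaged.
features = average_embedding("good day unknownword".split(), toy_vectors, dim=2)
empty_features = average_embedding(["unknownword"], toy_vectors, dim=2)
```

Each text thus yields one fixed-length feature vector, which is what makes the representation usable as input to a classifier such as an SVM.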

Automatic Profiling of Twitter Users Based on Their Tweets: Notebook for PAN at CLEF 2015

TLDR
A novel way of computing the type/token ratio of an author is introduced and it is shown that, although strong correlations have been observed between high extroversion and low type/token ratios in the past, this ratio is not necessarily a strong indicator of extrovertedness.
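
For orientation, the conventional type/token ratio is the number of distinct tokens divided by the total number of tokens. The sketch below shows only that standard definition, under the assumption of whitespace tokenization; it is not the novel computation the paper introduces, which this summary does not describe.

```python
from typing import List


def type_token_ratio(tokens: List[str]) -> float:
    """Standard type/token ratio: distinct tokens over total tokens."""
    if not tokens:
        return 0.0  # convention chosen here for empty input
    return len(set(tokens)) / len(tokens)


# Six tokens, four distinct types ("the", "cat", "saw", "other").
ttr = type_token_ratio("the cat saw the other cat".split())
```

A lower ratio indicates more repetition of the same words across an author's text.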

Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations

TLDR
The framework and the results of the Author Profiling task at PAN 2016, to predict age and gender from a cross-genre perspective, are presented.

Adapting Cross-Genre Author Profiling to Language and Corpus

TLDR
The approach to the Author Profiling (AP) task is presented, which aims at identifying the author’s age and gender under cross-genre AP conditions in three languages: English, Spanish, and Dutch.

Document Embedding with Paragraph Vectors

TLDR
This work observes that the Paragraph Vector method performs significantly better than other methods, proposes a simple improvement to enhance embedding quality, and shows that, much like word embeddings, vector operations on Paragraph Vectors can produce useful semantic results.

XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015

TLDR
This technical notebook describes the methodology used, and the results achieved, by the team from XRCE for the PAN 2015 Author Profiling Challenge: a largely language-agnostic methodology for classification that uses language-specific linguistic processing to generate categories.
...