Comparing Frequency- and Style-Based Features for Twitter Author Identification

@inproceedings{Green2013ComparingFA,
  title={Comparing Frequency- and Style-Based Features for Twitter Author Identification},
  author={Rachel M. Green and John W. Sheppard},
  booktitle={FLAIRS Conference},
  year={2013}
}
Author identification is a subfield of Natural Language Processing (NLP) that uses machine learning techniques to identify the author of a text. Most previous research has focused on long texts, on the assumption that there is a minimum text length below which author identification is no longer effective. This paper examines author identification in short texts far below this threshold, focusing on messages retrieved from Twitter (maximum length: 140 characters) to determine the most…