Cleaning Tweets Tokenize and basic cleaning Twitter lingo cleaning Camel case Cleaning Spell Corrections Clean Tweets LIWC 2007 Categorizing LIWC 2007 Dictionary Based Features Other Tweet based Features

@inproceedings{Ludu2014CleaningTT,
  title={Cleaning Tweets Tokenize and basic cleaning Twitter lingo cleaning Camel case Cleaning Spell Corrections Clean Tweets LIWC 2007 Categorizing LIWC 2007 Dictionary Based Features Other Tweet based Features},
  author={Puneet Singh Ludu},
  year={2014}
}
In this paper we predict three attributes of a user: “Gender”, “Age” and “Political Affiliation”, with an application to Indian Twitter users. Our approach automatically predicts these attributes from observable information such as tweeting behavior, the linguistic content of the user’s Twitter feed, and the celebrities the user follows. The paper also introduces a novel feature that we call “class influencers”. Class influencers are the Twitter users…
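
The title names several tweet-cleaning steps (tokenizing, basic cleaning, camel case cleaning) that come before feature extraction. The paper's actual rules are not given here, so the following is only a minimal Python sketch of what such steps might look like. The function names `split_camel_case` and `clean_tweet`, the regular expressions, and the decision to drop URLs and mentions are all illustrative assumptions, not the author's method. Twitter lingo cleaning and spell correction are omitted because they depend on dictionaries not described on this page.

```python
import re

# All patterns below are illustrative assumptions, not rules from the paper.
URL_RE = re.compile(r"https?://\S+")          # web links
MENTION_RE = re.compile(r"@\w+")              # @username mentions
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])") # boundary between lower and upper case
TOKEN_RE = re.compile(r"[a-z0-9']+")          # simple word-like tokens


def split_camel_case(word):
    """Insert a space at each lower-to-upper case boundary."""
    return CAMEL_RE.sub(" ", word)


def clean_tweet(text):
    """Return a list of lowercase tokens from a raw tweet (hypothetical pipeline)."""
    text = URL_RE.sub(" ", text)       # basic cleaning: drop links
    text = MENTION_RE.sub(" ", text)   # basic cleaning: drop mentions
    text = text.replace("#", " ")      # keep hashtag words, drop the marker
    text = split_camel_case(text)      # camel case cleaning
    return TOKEN_RE.findall(text.lower())  # tokenize


if __name__ == "__main__":
    print(clean_tweet("Loving #IndiaVotes2014 with @friend http://t.co/abc"))
```

Splitting camel case before lowercasing matters here: once the text is lowercased, the word boundaries inside a hashtag can no longer be recovered.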
