A. Seza Dogruöz

There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, …
Multilingual speakers switch between languages in online and spoken communication. Analyses of large-scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Second, we incorporate …
Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of 'Computational Sociolinguistics' that reflects this increased interest. We aim to …
There are more multilingual speakers in the world than monolingual ones. Immigration is one of the key factors bringing speakers of different languages into contact with each other. In order to develop relevant policies and recommendations tailored to the needs of immigrant communities, it is essential to understand the interactions between the …
In this paper, we present a series of experiments in which we analyze the usage of graffiti style features for signaling personal gang identification in a large, online street gangs forum, with an accuracy as high as 83% at the gang alliance level and 72% for the specific gang. We then build on that result in predicting how members of different gangs signal …
Languages spoken by immigrants change due to contact with the local languages. Capturing these changes is problematic for current language technologies, which are typically developed for speakers of the standard dialect only. Even when dialectal variants are available for such technologies, we still need to predict which dialect is being used. In this …
The purpose of the research described in this paper is to examine the existence of a correlation between low-level audio, visual and textual features and movie content similarity. In order to focus on a well-defined and controlled case, we have built a small dataset of movie scenes from three sequel movies. In addition, manual annotations have led to a …