The Secret Lives of Names?: Name Embeddings from Social Media

  • Junting Ye, S. Skiena
  • Published 2019
  • Computer Science
  • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Your name tells a lot about you: your gender, ethnicity and so on. It has been shown that name embeddings are more effective in representing names than traditional substring features. However, our previous name embedding model is trained on private email data and is not publicly accessible. In this paper, we explore learning name embeddings from public Twitter data. We argue that Twitter embeddings have two key advantages: (i) they can and will be publicly released to support research…
From Symbols to Embeddings: A Tale of Two Representations in Computational Social Science
A thorough review of data representations in CSS for both text and network is given, and it finds that embedding-based representations have been emerging and attracting increasing attention over the last decade.
Building Location Embeddings from Physical Trajectories and Textual Representations
This paper uses a new dataset consisting of the location trajectories of 729 students over a seven-month period and text data related to those locations to create location embeddings, which are then employed in more complex downstream tasks ranging from predicting a student's area of study to a student's depression level.
How does that name sound? Name representation learning using accent-specific speech generation
SpokenName2Vec is proposed, a novel and generic algorithm which addresses the synonym suggestion problem by utilizing automated speech generation and deep learning to produce novel spoken name embeddings that capture the way people pronounce names in a particular language and accent.
Profiling US Restaurants from Billions of Payment Card Transactions
  • Himel Dev, H. Hamooni
  • Computer Science
  • 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)
  • 2020
This work presents a framework, believed to be the first to infer the cuisine types of restaurants by analyzing transaction data as the only source, and achieves 76.2% accuracy in classifying US restaurants.
Applications of Machine Learning in Document Digitisation
An overview of the potential for applying machine digitisation for data collection through two illustrative applications and how attention-based neural networks for handwritten text recognition can be used to construct a treatment indicator is given.
Give Me Your Tired, Your Poor, Your High-Skilled Labor: H-1B Lottery Outcomes and Entrepreneurial Success
We study how access to high-skill labor affects the outcomes of start-up firms. We obtain exogenous variation in firms’ ability to access skilled labor by using win rates in H-1B visa lotteries.
Discrimination Against Foreigners in the U.S. Patent System
Inventions of foreign origin are about ten percentage points less likely to be granted a U.S. patent than domestic inventions. An empirical analysis of 1.5 million U.S. patent applications identifies…
It's All in the Name: A Character Based Approach To Infer Religion
It is shown how character patterns learned by the classifier are rooted in the linguistic origins of names, which can explain the predictions of complex non-linear classifiers and circumvent their purported black box nature.


Nationality Classification Using Name Embeddings
This work designs a fine-grained nationality classifier covering 39 groups representing over 90% of the world population and exploits the phenomenon of homophily in communication patterns to learn name embeddings, a new representation that encodes gender, ethnicity, and nationality and is readily applicable to building classifiers and other systems.
Generating Look-alike Names For Security Challenges
This work introduces the technique of distributed name embeddings, representing names in a high-dimensional space such that distance between name components reflects the degree of cultural similarity between these strings, and demonstrates that name embeddings strongly encode gender and ethnicity, as well as name popularity.
Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors
This paper evaluates the inference accuracy gained by augmenting the user features with features derived from the Twitter profiles and postings of her friends, and considers three attributes which have varying degrees of assortativity: gender, age, and political affiliation.
Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching
A novel alignment-based name matching algorithm, based on the Smith-Waterman algorithm and logistic regression, is proposed, which can effectively identify name-ethnicity from personal names in Wikipedia, which is used to define name-ethnicity to within 85% accuracy.
Social Spammer Detection in Microblogging
An optimization formulation is presented that models the social network and content information in a unified framework that can effectively utilize both kinds of information for social spammer detection in microblogging.
Planetary-scale views on a large instant-messaging network
It is found that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
ePluribus: Ethnicity on Social Networks
An approach to determine the ethnic breakdown of a population based solely on people's names and data provided by the U.S. Census Bureau is demonstrated to be able to predict the ethnicities of individuals as well as the ethnicity of an entire population better than natural alternatives.
User-Level Race and Ethnicity Predictors from Twitter Text
A data set of users who self-report their race/ethnicity through a survey is introduced, in contrast to previous approaches that use distantly supervised data or perceived labels, to develop predictive models from text which accurately predict the membership of a user to the four largest racial and ethnic groups.
Homophily and Nationality Assortativity Among the Most Cited Researchers' Social Network
  • Michal Vaanunu, C. Avin
  • Computer Science
  • 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
  • 2018
This work defines type assortativity, which measures the homophily level of each type and enables comparison between types of different size within the network, and evaluates the definitions on a weighted research-collaboration social network between the most cited authors in the ACM digital library.
Deep Learning
Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.