Challenges in measuring online advertising systems
With the rise of social networking and other sites that collect vast amounts of user data, user privacy has never been more important. When creating user profiles, care must be taken to avoid collecting sensitive information while ensuring that the profiles remain fit for purpose. In this paper, we present a specific instance of the privacy-preserving profiling problem in an expert-finding application. We present a dataset of profiles, together with several datasets for contaminating these profiles, and describe experiments that test data quality and privacy-preserving performance. We also present a simple solution based on training a latent semantic analysis (LSA) model on a clean profile corpus, which maintains performance and provides a moderate level of privacy.
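To make the LSA-based idea concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes raw term counts, a plain truncated SVD computed with NumPy, and a tiny hypothetical corpus; the function names (`fit_lsa`, `project`) and all example texts are illustrative assumptions. The sketch shows one plausible mechanism: because the vocabulary and latent space are fixed by the clean corpus, terms that appear only in contaminating text never enter the projected profile.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct lowercase token in the clean corpus to a column index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def term_vector(doc, vocab):
    """Raw term counts over the clean vocabulary; unseen terms are ignored."""
    v = np.zeros(len(vocab))
    for w in doc.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def fit_lsa(clean_docs, k):
    """Fit a rank-k LSA model (truncated SVD) on the clean profile corpus."""
    vocab = build_vocab(clean_docs)
    X = np.stack([term_vector(d, vocab) for d in clean_docs])  # docs x terms
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return vocab, Vt[:k].T  # terms x k matrix of latent directions

def project(doc, vocab, components):
    """Represent a profile in the k-dimensional latent space."""
    return term_vector(doc, vocab) @ components

# Hypothetical clean profile corpus (illustrative only).
clean_corpus = [
    "information retrieval search ranking",
    "machine learning classification ranking",
    "database query optimization search",
    "machine learning information extraction",
]
vocab, components = fit_lsa(clean_corpus, k=2)

clean_profile = project("information retrieval search ranking", vocab, components)
# The same profile with hypothetical sensitive terms appended.
contaminated_profile = project(
    "information retrieval search ranking diagnosis medication", vocab, components
)
```

In this toy setting the two projections coincide, since the appended terms are outside the clean vocabulary. Sensitive terms that also occur in the clean corpus would still influence the projection, which is consistent with the privacy protection being only moderate.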