Probabilistic Visitor Stitching on Cross-Device Web Logs


Personalization – the customization of experiences, interfaces, and content to individual users – has catalyzed user growth and engagement for many web services. A critical prerequisite to personalization is establishing user identity. However, the variety of devices (including mobile phones, appliances, and smart watches) from which users access web services, in both anonymous and logged-in sessions, poses a significant obstacle to user identification. The resulting entity resolution task of establishing user identity across devices and sessions is commonly referred to as "visitor stitching." We introduce a general, probabilistic approach to visitor stitching that uses features and attributes commonly contained in web logs. Using web logs from two real-world corporate websites, we motivate the need for probabilistic models by quantifying the difficulties posed by noise, ambiguity, and missing information in deployment. Next, we introduce our approach using probabilistic soft logic (PSL), a statistical relational learning framework capable of capturing similarities across many sessions and enforcing transitivity. We present a detailed description of the model features and design choices relevant to the visitor stitching problem. Finally, we evaluate the binary classification performance of our PSL model on two real-world visitor stitching datasets. Our model performs significantly better than several state-of-the-art classifiers, and we show that this advantage results from collective reasoning across sessions.
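The abstract credits PSL with enforcing transitivity across sessions. As a hedged illustration of what that means, the sketch below computes the penalty PSL assigns to a transitivity rule under its Lukasiewicz relaxation of logic: truth values are soft, in [0, 1]; a conjunction a AND b is relaxed to max(0, a + b - 1); and a rule body -> head is penalized by its distance to satisfaction, max(0, body - head). The predicate name `SameUser`, the function names, the unit rule weight, and the session identifiers are hypothetical and are not taken from the paper's model. A real PSL program would also infer the truth values by minimizing the weighted sum of such penalties, which this sketch does not do.

```python
from itertools import permutations


def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm, the relaxation PSL uses for conjunction."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Hinge penalty for a soft rule body -> head."""
    return max(0.0, body - head)


def symmetric(pairs: dict) -> dict:
    """Mirror each (x, y) soft truth value to (y, x), since SameUser is symmetric."""
    out = {}
    for (x, y), value in pairs.items():
        out[(x, y)] = value
        out[(y, x)] = value
    return out


def transitivity_energy(same: dict, sessions: list, weight: float = 1.0) -> float:
    """Weighted penalty of the hypothetical rule
    SameUser(A, B) & SameUser(B, C) -> SameUser(A, C),
    summed over all ordered triples of distinct sessions."""
    total = 0.0
    for a, b, c in permutations(sessions, 3):
        body = luk_and(same[(a, b)], same[(b, c)])
        total += weight * distance_to_satisfaction(body, same[(a, c)])
    return total


# Hypothetical example: s1~s2 and s2~s3 look like the same visitor, s1~s3 does not.
sessions = ["s1", "s2", "s3"]
inconsistent = symmetric({("s1", "s2"): 0.9, ("s2", "s3"): 0.9, ("s1", "s3"): 0.1})
consistent = symmetric({("s1", "s2"): 0.9, ("s2", "s3"): 0.9, ("s1", "s3"): 0.9})

print(transitivity_energy(inconsistent, sessions))
print(transitivity_energy(consistent, sessions))
```

In this toy assignment the inconsistent triple is penalized (once for each direction in which the chain s1, s2, s3 can be traversed), while raising the (s1, s3) value removes the penalty entirely. This is the sense in which a collective model can let strong pairwise evidence propagate to pairs with weak direct evidence.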

DOI: 10.1145/3038912.3052711

7 Figures and Tables

Cite this paper

@inproceedings{Kim2017ProbabilisticVS,
  title={Probabilistic Visitor Stitching on Cross-Device Web Logs},
  author={Sungchul Kim and Nikhil Kini and Jay Pujara and Eunyee Koh and Lise Getoor},
  booktitle={WWW},
  year={2017}
}