Probabilistic Visitor Stitching on Cross-Device Web Logs

Sungchul Kim, Nikhil Kini, Jay Pujara, Eunyee Koh, Lise Getoor
Proceedings of the 26th International Conference on World Wide Web
Personalization -- the customization of experiences, interfaces, and content to individual users -- has catalyzed user growth and engagement for many web services. A critical prerequisite to personalization is establishing user identity. However, the variety of devices, including mobile phones, appliances, and smart watches, from which users access web services in both anonymous and logged-in sessions poses a significant obstacle to user identification. The resulting entity resolution task of…

node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching

Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for

Using Information in Access Logs for Large Scale Identity Linkage

This paper proposes an approach that uses all the features in the logs, drawing on both online data traffic and offline data logs, and shows that information in access logs can be effective for linking identities and for achieving a practical and scalable solution.

Learning and Multi-Objective Optimization for Automatic Identity Linkage

This paper focuses on linking identities using structured learning and multi-objective optimization for an Identity Graph, and shows that the proposed approach can be very effective in linking identities and achieving a practical solution on a real data set.

Detecting Users from Website Sessions: A Simulation Study

A click simulation model is proposed that simulates user censoring due to cookie churn or the use of multiple devices while retaining the uncensored ground truth, with the aim of recovering unique users from session data.

Siamese Neural Networks for User Identity Linkage Through Web Browsing

A Siamese neural network (NN) architecture-based UIL (SAUIL) model that learns and compares the highest-level feature representation of input web-browsing behaviors with deep NNs is proposed, which addresses the imbalanced learning issue.

Robust Factorization Machines for User Response Prediction

This work characterizes data uncertainty using the Robust Optimization (RO) paradigm to design approaches that are immune to perturbations, and proposes two novel algorithms under interval uncertainty: the robust factorization machine (RFM) and its field-aware variant (RFFM).

Linking User Online Behavior across Domains with Internet Traffic

This work focuses on the area of cross-domain recommendation, advertising, and criminal tracking in the online and offline worlds, since it is a very challenging task to link online behaviors belonging to the same natural person.

Adobe Identity Graph

Adobe’s Identity Graph enables enterprises to stitch together all known and anonymous identities of a user between logical and physical devices, which allows companies to perform marketing and analytics in the context of people rather than signals coming from different devices.

Modeling and Analyzing Information Preparation Behaviors in Cross-Device Search

This paper presents a study of search behaviors on the pre-switch device when repeated searches occur in cross-device search, and trains a model of information preparation behavior with three supervised classification methods: Binary Logistic Regression, C5.0 Decision Tree, and Support Vector Machine.



Overcoming browser cookie churn with clustering

A novel method is presented for clustering browser cookies into groups that are likely to belong to the same browser, based on a statistical model of browser visitation patterns, together with a greedy heuristic algorithm for solving the resulting clustering problem.

Personalizing Search on Shared Devices

An oracle study is presented (with perfect knowledge of which searchers perform each action on each machine) to understand the effectiveness of ABP in predicting searchers' future interests, and a classifier is developed to determine when to apply it, yielding sizable gains in personalization performance.

Effective personalization based on association rule discovery from web usage data

This paper proposes effective and scalable techniques for Web personalization based on association rule discovery from usage data that can achieve better recommendation effectiveness, while maintaining a computational advantage over direct approaches to collaborative filtering such as the k-nearest-neighbor strategy.

Peering Through the Shroud: The Effect of Edge Opacity on IP-Based Client Identification

A methodology is developed and implemented by which a server can make a more informed decision on whether to rely on IP addresses for client identification or to use more heavyweight forms of client authentication.

Probabilistic Deduplication of Anonymous Web Traffic

This paper solves the problem of identifying whether two cookies map to the same visitor by converting categorical variables like IP addresses, product search keywords and URLs with very high cardinalities to continuous numeric variables using the Jaccard coefficient for each attribute.
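
The per-attribute Jaccard idea in the summary above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attribute names (`ips`, `keywords`, `urls`), the helper names `jaccard` and `pair_features`, and the sample values are all assumptions made for the example. Each cookie is treated as a dictionary of sets, and each high-cardinality categorical attribute is reduced to one continuous score in [0, 1].

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pair_features(c1: Dict[str, Set[str]], c2: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return one Jaccard score per attribute of c1 (hypothetical feature builder)."""
    return {attr: jaccard(values, c2.get(attr, set())) for attr, values in c1.items()}


# Hypothetical cookie profiles; the values are invented for illustration only.
cookie_a = {
    "ips": {"10.0.0.1", "10.0.0.2"},
    "keywords": {"tent", "stove"},
    "urls": {"/camping", "/cart"},
}
cookie_b = {
    "ips": {"10.0.0.2"},
    "keywords": {"tent", "boots", "stove"},
    "urls": {"/shoes"},
}

features = pair_features(cookie_a, cookie_b)
print(features)
```

The resulting numeric vector could then be passed to any classifier that decides whether the two cookies belong to the same visitor; which classifier the paper uses is not stated in the summary, so none is assumed here.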

Cross-Device Search

This paper characterizes multi-device search across four device types, including aspects of search behavior on each device (e.g., topics of interest) and characteristics of device transitions, and proposes models to predict aspects of cross-device search transitions.

Studying User Footprints in Different Online Social Networks

This paper presents the analysis and results from applying automated classifiers to disambiguate profiles belonging to the same user across different social networks, and finds that User ID and Name are the most discriminative features for disambiguating user profiles.

How Unique Is Your Web Browser?

  • P. Eckersley
  • Computer Science
    Privacy Enhancing Technologies
  • 2010
The degree to which modern web browsers are subject to "device fingerprinting" via the version and configuration information that they will transmit to websites upon request is investigated, and what countermeasures may be appropriate to prevent it is discussed.

Linking Users Across Domains with Location Data: Theory and Validation

This paper addresses the reconciliation problem for location-based datasets and introduces a robust method for this general setting, which outperforms naive rules and prior heuristics and can be shown to be robust even when data gets sparse.

People and Cookies: Imperfect Treatment Assignment in Online Experiments

It is shown that the estimated treatment effect in a cookie-level experiment converges to a weighted average of the marginal effects of treating more of a user's cookies, which underestimates the true person-level effect by a factor equal to the number of cookies per person.