Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

@inproceedings{Ahmed2014ScalableHM,
  title={Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising},
  author={Amr Ahmed and Abhimanyu Das and Alex Smola},
  booktitle={Proceedings of the 7th ACM International Conference on Web Search and Data Mining},
  year={2014}
}
Many estimation tasks come in groups and hierarchies of related problems. [...] Implementation is achieved by a distributed subgradient oracle and the successive application of prox-operators pertaining to groups and subgroups of variables. We apply this algorithm to conversion optimization in display advertising. Experimental results on over 1TB of data, for up to 1 billion observations and 1 million attributes, show that the algorithm provides significantly better prediction accuracy while …
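The successive prox-operator scheme mentioned in the abstract can be illustrated with a small sketch. The following is a minimal single-machine illustration in Python/NumPy, not the paper's distributed implementation; the leaves-first group ordering and all function names are assumptions made for exposition. For nested (tree-structured) groups, composing the group-wise shrinkage operators from subgroups up to the root is known to yield the exact prox of the summed penalty.

```python
import numpy as np

def prox_group(w, idx, tau):
    """Prox of tau * ||w[idx]||_2: shrink one group's sub-vector toward zero."""
    g = w[idx]
    norm = np.linalg.norm(g)
    scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
    out = w.copy()
    out[idx] = scale * g
    return out

def prox_hierarchy(w, groups, tau):
    """Apply group shrinkage successively, from subgroups (leaves) up to
    the full group (root); for nested groups this composition equals the
    exact prox of the tree-structured penalty."""
    for idx in groups:  # assumed sorted leaves-first
        w = prox_group(w, idx, tau)
    return w

def proximal_subgradient_step(w, subgrad, groups, lr, tau):
    """One iteration: subgradient step on the loss, then the prox."""
    return prox_hierarchy(w - lr * subgrad, groups, lr * tau)
```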
Predicting User Behavior in Display Advertising via Dynamic Collective Matrix Factorization
This paper aims to predict the conversion response of users by jointly examining their past purchase behavior and their click response behavior, and models the temporal dynamics of post-click conversions in a unified framework.
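As a rough illustration of the collective-factorization idea, a shared user-factor matrix can couple the purchase matrix and the click/conversion matrix. The sketch below omits DCMF's temporal-dynamics component, and all dimensions, rates, and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_ads, n_items, k = 100, 50, 80, 8
C = (rng.random((n_users, n_ads)) < 0.05).astype(float)    # click/conversion responses
P = (rng.random((n_users, n_items)) < 0.05).astype(float)  # past purchase behavior

U = 0.1 * rng.standard_normal((n_users, k))   # user factors, shared across both views
V = 0.1 * rng.standard_normal((n_ads, k))     # ad factors
W = 0.1 * rng.standard_normal((n_items, k))   # item factors

lr, reg = 0.05, 0.01
for _ in range(200):
    Ec, Ep = C - U @ V.T, P - U @ W.T          # residuals of both matrices
    U += lr * ((Ec @ V + Ep @ W) - reg * U)    # user factors get gradients from both views
    V += lr * (Ec.T @ U - reg * V)
    W += lr * (Ep.T @ U - reg * W)
```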
Distributed Multi-Task Relationship Learning
Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance of each task. Traditionally, to perform multi-task learning, one …
Predicting Different Types of Conversions with Multi-Task Learning in Online Advertising
This paper formulates conversion prediction as a multi-task learning problem, so that the prediction models for different types of conversions can be learned together, and proposes the Multi-Task Field-weighted Factorization Machine (MT-FwFM) to solve these tasks jointly.
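A sketch of a field-weighted factorization machine score for one task may help. The exact parameter-sharing scheme of MT-FwFM is not spelled out in this summary, so treating the feature embeddings V as shared across tasks while the field-pair weights, linear weights, and bias are task-specific is an assumption of this sketch.

```python
import numpy as np

def mt_fwfm_score(x, field_of, V, R_task, w_task, b_task):
    """Field-weighted FM for one task: pairwise embedding products
    <V[i], V[j]> are reweighted by a learned field-pair weight
    R_task[f_i, f_j]. V is assumed shared across tasks; R_task,
    w_task, b_task are assumed task-specific."""
    active = np.nonzero(x)[0]
    score = b_task + w_task[active] @ x[active]
    for p in range(len(active)):
        for q in range(p + 1, len(active)):
            i, j = active[p], active[q]
            score += R_task[field_of[i], field_of[j]] * (V[i] @ V[j]) * x[i] * x[j]
    return 1.0 / (1.0 + np.exp(-score))  # predicted conversion probability
```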
Distributed Variance Regularized Multitask Learning
This paper presents a method to scale up MTL methods which penalize the variance of the task weight vectors, building upon the alternating direction method of multipliers to decouple the variance regularizer.
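The variance regularizer itself is simple to write down; here is a minimal NumPy sketch of the penalty and its gradient. The ADMM-based decoupling that makes it distributable is the paper's contribution and is not shown.

```python
import numpy as np

def task_variance_penalty(W):
    """Variance of task weight vectors: the average squared distance of
    each row (one task's weights) from the mean task vector.
    W has shape (num_tasks, num_features)."""
    mean = W.mean(axis=0)
    return ((W - mean) ** 2).sum() / W.shape[0]

def task_variance_grad(W):
    # Gradient of the penalty above; the cross-terms cancel because the
    # centered rows sum to zero, leaving (2/T) * (W - mean).
    return 2.0 * (W - W.mean(axis=0)) / W.shape[0]
```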
Robust Representations for Response Prediction
This chapter proposes a novel matrix factorization approach named dynamic collective matrix factorization (DCMF), which considers the temporal dynamics of post-click conversions and also takes advantage of side information about users, advertisements, and items.
Distributed Multi-Task Relationship Learning
This paper proposes a distributed multi-task learning framework that alternately learns predictive models for each task and the relationships between tasks in the parameter-server paradigm, and proposes a communication-efficient primal-dual distributed optimization algorithm that solves the dual problem by carefully designing local subproblems to make it decomposable.
An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction
An ablation approach is used to systematically study recent methods that combine multitask learning with “entire space modeling”, which trains the CVR model on all logged examples rather than learning a conditional likelihood of conversion given a click; several different approaches are shown to yield similar levels of positive transfer.
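One concrete instance of entire-space modeling is the ESMM-style factorization pCTCVR = pCTR × pCVR, supervised over all impressions so that the CVR tower never needs click-conditioned labels. This is one of the approaches such an ablation would cover, not the paper's only subject; the sketch below uses NumPy and hypothetical inputs.

```python
import numpy as np

def entire_space_losses(p_ctr, p_cvr, clicked, converted):
    """Supervise pCTR on all impressions and pCTCVR = pCTR * pCVR on all
    impressions; pCVR is learned implicitly over the entire space.
    All inputs are arrays indexed by impression."""
    eps = 1e-7
    p_ctcvr = p_ctr * p_cvr

    def bce(p, y):
        return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).mean()

    # conversion label only counts when the impression was clicked
    return bce(p_ctr, clicked) + bce(p_ctcvr, clicked * converted)
```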
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance …
A Survey on Multi-Task Learning
A survey of MTL is given, classifying MTL algorithms into several categories, including the feature learning approach, the low-rank approach, the task clustering approach, the task relation learning approach, and the decomposition approach, and then discussing the characteristics of each approach.
A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
This work proposes a content-based recommendation system that addresses both recommendation quality and system scalability, using a rich feature set to represent users according to their web browsing history and search queries, learned with a deep learning approach.
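The cross-domain idea resembles a two-tower model: each view (user, item) is mapped into a shared semantic space and scored by similarity. A minimal sketch follows, with made-up names and a single linear-plus-tanh layer per tower standing in for the paper's deeper networks.

```python
import numpy as np

def two_tower_score(user_feats, item_feats, W_user, W_item):
    """Map the user view and the item view into one semantic space and
    score the pair by cosine similarity."""
    u = np.tanh(W_user @ user_feats)
    v = np.tanh(W_item @ item_feats)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
```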

References

Showing 1-10 of 24 references.
Web-scale multi-task feature selection for behavioral targeting
This paper formulates a multi-task (or group) feature-selection problem among a set of related tasks sharing a common set of features, namely advertising campaigns, and applies a group-sparse penalty consisting of a combination of an l1 and an l2 penalty, with an associated fast optimization algorithm for distributed parameter estimation.
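For non-overlapping groups, the prox of this combined penalty (sparse-group-lasso style, l1 plus groupwise l2) has a known closed form: elementwise soft-thresholding followed by groupwise shrinkage. A minimal NumPy sketch, with names chosen for the example:

```python
import numpy as np

def prox_sparse_group(w, groups, l1, l2):
    """Prox of l1*||w||_1 + l2*sum_g ||w_g||_2 for non-overlapping
    groups: soft-threshold each coordinate, then shrink each group."""
    w = np.sign(w) * np.maximum(np.abs(w) - l1, 0.0)  # l1 part
    for idx in groups:                                 # groupwise l2 part
        norm = np.linalg.norm(w[idx])
        w[idx] *= max(0.0, 1.0 - l2 / norm) if norm > 0 else 0.0
    return w
```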
An architecture for parallel topic models
This paper describes a high performance sampling architecture for inference of latent topic models on a cluster of workstations and shows that this architecture is entirely general and that it can be extended easily to more sophisticated latent variable models such as n-grams and hierarchies.
Scalable inference in latent variable models
A scalable parallel framework for efficient inference in latent variable models over streaming web-scale data by introducing a novel delta-based aggregation system with a bandwidth-efficient communication protocol, schedule-aware out-of-core storage, and approximate forward sampling to rapidly incorporate new data.
Discovering Structure in Multiple Learning Tasks: The TC Algorithm
The task-clustering algorithm TC clusters learning tasks into classes of mutually related tasks, and outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.
Web-scale user modeling for targeting
This paper presents mechanisms for building web-scale user profiles in a daily incremental fashion, and shows how to reduce the latency through in-memory processing of billions of user records.
Optimization with Sparsity-Inducing Penalties
This monograph covers proximal methods, block-coordinate descent, reweighted l2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments to compare various algorithms from a computational point of view.
Convex multi-task feature learning
It is proved that the method for learning sparse representations shared across multiple tasks is equivalent to solving a convex optimization problem, for which an iterative algorithm that converges to an optimal solution is given.
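That iterative algorithm alternates between a per-task ridge-style solve under a shared feature metric D and a closed-form update of D. A compact NumPy sketch of the alternating scheme follows, with hyperparameters and names assumed; this is an illustration for the square-loss case, not a faithful reimplementation.

```python
import numpy as np

def convex_mtl(Xs, ys, gamma, iters=50, eps=1e-6):
    """Alternating scheme: with D fixed, each task t solves a ridge
    problem with metric D; with the weight matrix W fixed, D has the
    closed form (W W^T)^(1/2) / tr((W W^T)^(1/2))."""
    d = Xs[0].shape[1]
    D = np.eye(d) / d
    for _ in range(iters):
        Dinv = np.linalg.inv(D + eps * np.eye(d))
        W = np.column_stack([
            np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
            for X, y in zip(Xs, ys)
        ])  # shape (d, num_tasks), one column per task
        # matrix square root of the PSD matrix W W^T via eigendecomposition
        vals, vecs = np.linalg.eigh(W @ W.T)
        S = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T
        D = S / np.trace(S)
    return W, D
```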
Feature hashing for large scale multitask learning
This paper provides exponential tail bounds for feature hashing, shows that the interaction between random subspaces is negligible with high probability, and demonstrates the feasibility of this approach with experimental results for a new use case: multitask learning.
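The hashing trick itself fits in a few lines: a signed second hash keeps inner products approximately unbiased, and prefixing a task identifier gives each task its own hashed copies of the features while sharing one parameter vector, which is the multitask use case. The hash choice and names below are illustrative.

```python
import hashlib

def hash_features(pairs, n_bins=2**20, task=""):
    """Map (feature_name, value) pairs into a sparse vector of fixed
    dimension n_bins; the sign bit comes from a second hash."""
    vec = {}
    for name, value in pairs:
        key = (task + "_" + name).encode()
        h = int(hashlib.md5(key).hexdigest(), 16)
        idx = h % n_bins                       # bucket index
        sign = 1.0 if (h >> 64) & 1 else -1.0  # independent sign hash
        vec[idx] = vec.get(idx, 0.0) + sign * value
    return vec

# usage sketch: the same raw features hash differently per task
shared = hash_features([("age", 1.0), ("geo=US", 1.0)])
task_a = hash_features([("age", 1.0), ("geo=US", 1.0)], task="campaign_a")
```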
Learning Gaussian processes from multiple tasks
We consider the problem of multi-task learning, that is, learning multiple related functions. Our approach is based on a hierarchical Bayesian framework that exploits the equivalence between …
Multitask Learning
R. Caruana, Encyclopedia of Machine Learning and Data Mining, 1998
Suggestions for how to get the most out of multitask learning in artificial neural nets are presented, an algorithm for multitask learning with case-based methods like k-nearest neighbor and kernel regression is presented, and algorithms for multitask learning in decision trees are sketched.
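The neural-net variant reduces to a shared trunk with one output head per task, so every task's error signal trains the shared layer. A minimal forward-pass sketch, with names and sizes assumed:

```python
import numpy as np

def multitask_forward(x, W_shared, heads):
    """Caruana-style multitask net: one shared hidden layer whose weights
    receive training signal from every task, plus one linear head per task."""
    h = np.tanh(W_shared @ x)
    return [w_head @ h for w_head in heads]

# usage sketch: 3 tasks sharing a 16-unit hidden layer over 10 inputs
rng = np.random.default_rng(0)
outputs = multitask_forward(rng.standard_normal(10),
                            rng.standard_normal((16, 10)),
                            [rng.standard_normal(16) for _ in range(3)])
```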