DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

@article{Wang2021DCNVI,
  title={DCN V2: Improved Deep \& Cross Network and Practical Lessons for Web-scale Learning to Rank Systems},
  author={Ruoxi Wang and Rakesh Shivanna and Derek Zhiyuan Cheng and Sagar Jain and Dong Lin and Lichan Hong and Ed H. Chi},
  journal={Proceedings of the Web Conference 2021},
  year={2021}
}
Learning effective feature crosses is key to building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. The Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples, DCN showed limited expressiveness in its cross network at learning more predictive feature…
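The paper's central change to DCN is replacing the cross layer's rank-one weight (a single vector) with a full weight matrix. As a rough illustration, here is a minimal NumPy sketch of the DCN-V2 cross layer update x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l; the dimensions, initialization, and layer count below are arbitrary placeholders, not values from the paper:

import numpy as np

def cross_layer_v2(x0, xl, W, b):
    # DCN-V2 cross layer: element-wise product of the input x0 with an
    # affine map of the current layer, plus a residual connection.
    return x0 * (W @ xl + b) + xl

rng = np.random.default_rng(0)
d = 8                                  # illustrative embedding dimension
x0 = rng.normal(size=d)
x = x0
for _ in range(3):                     # a 3-layer cross network (arbitrary)
    W = rng.normal(scale=0.1, size=(d, d))
    x = cross_layer_v2(x0, x, W, np.zeros(d))

The paper also proposes a low-rank variant that factors W into two thin matrices to cut serving cost; that refinement is omitted here.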

Citations

Revisiting Deep Learning Models for Tabular Data
TLDR
An overview of the main families of DL architectures for tabular data is given, and the bar for baselines in tabular DL is raised by identifying two simple and powerful deep architectures, including a ResNet-like architecture that turns out to be a strong baseline often missing from prior work.
User Response Prediction in Online Advertising
TLDR
A taxonomy is proposed to categorize state-of-the-art user response prediction methods, focusing primarily on the current progress of machine learning methods used on different online platforms, and applications of user response prediction, benchmark datasets, and open-source code in the field are reviewed.
Information Retrieval using Machine learning for Ranking: A Review
TLDR
This paper positions some of the most widely used ranking algorithms in the community and provides a survey of the methods used to rank collected documents and of the strategies used to assess them.
PHN: Parallel heterogeneous network with soft gating for CTR prediction
TLDR
A Parallel Heterogeneous Network (PHN) model is proposed, which builds a parallel-structured network from three different interaction-analysis methods and uses soft selection gating over heterogeneous representations with different structures to mitigate the weak-gradient phenomenon.
Enhancing CTR Prediction with Context-Aware Feature Representation Learning
TLDR
This paper proposes a novel module named Feature Refinement Network (FRNet), which learns context-aware feature representations at bit-level for each feature in different contexts and can be applied in many existing methods to boost their performance.
Jury Learning: Integrating Dissenting Voices into Machine Learning Models
TLDR
A deep learning architecture models every annotator in a dataset, samples from the annotators' models to populate a jury, and then runs inference to classify; this enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.
Detecting Arbitrary Order Beneficial Feature Interactions for Recommender Systems
TLDR
HIRS is the first work that directly generates beneficial feature interactions of arbitrary orders and makes recommendation predictions accordingly, and it outperforms state-of-the-art algorithms by up to 5% in terms of recommendation accuracy.
DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction
TLDR
This work proposes DHEN, a deep and hierarchical ensemble architecture that can leverage the strengths of heterogeneous interaction modules and learn a hierarchy of interactions of different orders, along with a novel co-designed training system that further improves DHEN's training efficiency.
MISS: Multi-Interest Self-Supervised Learning Framework for Click-Through Rate Prediction
  • Wei Guo, Can Zhang, Rui Zhang
  • Computer Science
    2022 IEEE 38th International Conference on Data Engineering (ICDE)
  • 2022
TLDR
A novel Multi-Interest Self-Supervised learning (MISS) framework enhances feature embeddings with interest-level self-supervision signals and can be used as a "plug-in" component with existing CTR prediction models to further boost their performance.
CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models
TLDR
This paper presents CTR-BERT, a novel lightweight, cache-friendly factorized model for CTR prediction that consists of twin-structured BERT-like encoders for text with a late-fusion mechanism for text and tabular features, and that significantly outperforms a traditional CTR baseline.
...

References

Showing 1–10 of 58 references
Wide & Deep Learning for Recommender Systems
TLDR
Wide & Deep learning, which jointly trains wide linear models and deep neural networks to combine the benefits of memorization and generalization for recommender systems, is presented and has been open-sourced in TensorFlow.
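As a sketch of the idea (not Google's implementation), the two parts contribute a single joint logit: a linear model over sparse cross-product features plus an MLP over dense embeddings. All names and sizes below are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
x_wide = rng.random(20)                # hand-crafted cross-product features
x_deep = rng.random(16)                # concatenated embeddings
w_wide = rng.normal(scale=0.1, size=20)
W1, b1 = rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)

hidden = np.maximum(x_deep @ W1 + b1, 0.0)        # deep part (ReLU MLP)
logit = x_wide @ w_wide + (hidden @ W2 + b2)[0]   # joint wide + deep logit
p_click = 1.0 / (1.0 + np.exp(-logit))            # both parts trained jointly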
Deep & Cross Network for Ad Click Predictions
TLDR
This paper proposes the Deep & Cross Network (DCN), which keeps the benefits of a DNN model and, beyond that, introduces a novel cross network that is more efficient at learning certain bounded-degree feature interactions.
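For contrast with the V2 layer sketched earlier, the original cross layer uses a single weight vector per layer, x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l. A minimal sketch:

import numpy as np

def cross_layer_v1(x0, xl, w, b):
    # Rank-one interaction: x0 scaled by the scalar xl . w, plus residual.
    # DCN-V2 generalizes the vector w to a full matrix W.
    return x0 * (xl @ w) + b + xl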
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
TLDR
This paper shows that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions, and combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
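The FM half of DeepFM scores all pairwise interactions through shared embeddings, which can be computed in O(nk) rather than O(n^2 k) using the classic square-of-sum identity; a sketch (shapes are illustrative):

import numpy as np

def fm_second_order(V, x):
    # sum_{i<j} <v_i, v_j> x_i x_j via 0.5 * ((sum)^2 - sum of squares).
    # V: (n, k) feature embeddings; x: (n,) feature values.
    s = V.T @ x                     # (k,)  sum_i v_i x_i
    s2 = (V ** 2).T @ (x ** 2)      # (k,)  sum_i v_i^2 x_i^2
    return 0.5 * float(np.sum(s * s - s2))

In DeepFM the same embeddings V also feed the DNN, and the final prediction is sigmoid(y_FM + y_DNN).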
xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
TLDR
A novel Compressed Interaction Network (CIN) generates feature interactions in an explicit fashion at the vector-wise level; combined with a DNN, it forms the eXtreme Deep Factorization Machine (xDeepFM), which learns certain bounded-degree feature interactions explicitly and arbitrary low- and high-order feature interactions implicitly.
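A single CIN layer takes the Hadamard product of every (previous-layer, raw-input) pair of embedding rows and compresses the pairs with learned weights, keeping the embedding dimension intact (hence "vector-wise"). A sketch with made-up shapes:

import numpy as np

def cin_layer(x0, xk, W):
    # x0: (m, D) raw field embeddings; xk: (Hk, D) previous feature maps;
    # W: (H_next, Hk, m) compression weights, one slice per output map.
    z = np.einsum('id,jd->ijd', xk, x0)      # all Hadamard pairs: (Hk, m, D)
    return np.einsum('hij,ijd->hd', W, z)    # compress to (H_next, D)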
Deep Learning Recommendation Model for Personalization and Recommendation Systems
TLDR
A state-of-the-art deep learning recommendation model (DLRM) is developed, its implementation in both the PyTorch and Caffe2 frameworks is provided, and a specialized parallelization scheme is designed that uses model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale out compute in the fully connected layers.
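The interaction step DLRM describes is simple to state: take dot products between all pairs of embedding vectors (plus the bottom-MLP output for dense features) and concatenate them with the dense representation before the top MLP. A single-example sketch, ignoring the parallelism scheme:

import numpy as np

def dlrm_interaction(dense_out, embs):
    # dense_out: (d,) bottom-MLP output; embs: (num_sparse, d) embeddings.
    vecs = np.vstack([dense_out[None, :], embs])
    dots = vecs @ vecs.T                        # pairwise dot products
    iu = np.triu_indices(vecs.shape[0], k=1)    # each unordered pair once
    return np.concatenate([dense_out, dots[iu]])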
Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions
TLDR
The Adaptive Factorization Network (AFN) is proposed, a new model that learns arbitrary-order cross features adaptively from data; its core is a logarithmic transformation layer that converts the power of each feature in a feature combination into a coefficient to be learned.
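The logarithmic layer is worth seeing in code: in log space, learned weights become the exponents of each feature, so a single neuron represents prod_i x_i^{w_i}, a cross feature of learned (possibly fractional) order. A sketch; the abs/clamp step reflects the requirement that inputs to the log be strictly positive (the paper likewise keeps transformed embedding values positive):

import numpy as np

def logarithmic_layer(x, W, eps=1e-7):
    # Each output j equals prod_i x_i ** W[j, i]: the weights are the powers.
    x = np.maximum(np.abs(x), eps)   # log requires strictly positive inputs
    return np.exp(W @ np.log(x))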
SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning
TLDR
Sub-Network Routing (SNR) modularizes the shared low-level hidden layers into multiple layers of sub-networks and controls the connections between sub-networks with learnable latent variables, achieving flexible parameter sharing that improves the accuracy of multi-task models while maintaining their computational efficiency.
Product-Based Neural Networks for User Response Prediction
  • Yanru Qu, Han Cai, Jun Wang
  • Computer Science
    2016 IEEE 16th International Conference on Data Mining (ICDM)
  • 2016
TLDR
A Product-based Neural Network (PNN) is proposed, with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions.
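For PNN's inner-product variant, the product layer is just the set of pairwise dot products between field embeddings. A sketch of that part alone (the full model also passes linear signals and the embeddings themselves to the MLP):

import numpy as np

def inner_product_layer(embs):
    # embs: (num_fields, d). One scalar per unordered field pair.
    G = embs @ embs.T
    iu = np.triu_indices(embs.shape[0], k=1)
    return G[iu]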
Neural Collaborative Filtering vs. Matrix Factorization Revisited
TLDR
It is shown that with proper hyperparameter selection a simple dot product substantially outperforms the proposed learned similarities, that MLPs should be used with care as an embedding combiner, and that a dot product might be a better default choice.
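The comparison at issue is easy to state in code: matrix factorization scores with a dot product, while NCF's learned similarity replaces it with an MLP over concatenated embeddings. The paper's finding is that the dot product, properly tuned, is the stronger and far cheaper default:

import numpy as np

def score_dot(u, v):
    # Matrix factorization: similarity is a plain dot product,
    # compatible with fast nearest-neighbor retrieval at serving time.
    return float(u @ v)

def score_mlp(u, v, layers):
    # NCF-style learned similarity: an MLP over [u; v] (sketch).
    h = np.concatenate([u, v])
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)   # ReLU on hidden layers only
    return float(h[0])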
Are we really making much progress? A worrying analysis of recent neural recommendation approaches
TLDR
A systematic analysis of algorithmic proposals for top-n recommendation tasks presented at top-level research conferences in recent years sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
...