AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System

Authors: Xiang Li, Xiaojiang Zhou, Yao Xiao, Peihao Huang, Dayao Chen, Sheng Chen and Yunsen Xian
Industrial search and recommendation systems mostly follow the classic multi-stage information retrieval paradigm: matching, pre-ranking, ranking, and re-ranking stages. To preserve system efficiency, simple vector-product based models are commonly deployed in the pre-ranking stage. Recent works consider distilling the knowledge of large ranking models into small pre-ranking models for better effectiveness. However, two major challenges in pre-ranking systems remain: (i) without…
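The vector-product pre-ranking model the abstract refers to can be sketched as a two-tower scorer: user and item features pass through separate towers, and online scoring reduces to one dot product per candidate. This is an illustrative minimal sketch with random weights standing in for trained towers, not the paper's implementation:

```python
import numpy as np

def tower(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One dense layer standing in for a full tower (illustrative only)."""
    return np.tanh(features @ weights)

def score(user_emb: np.ndarray, item_embs: np.ndarray) -> np.ndarray:
    """Vector-product scoring: one dot product per candidate item."""
    return item_embs @ user_emb

rng = np.random.default_rng(0)
user_emb = tower(rng.normal(size=8), rng.normal(size=(8, 4)))
# item embeddings can be precomputed offline for the whole candidate set
item_embs = tower(rng.normal(size=(100, 8)), rng.normal(size=(8, 4)))

scores = score(user_emb, item_embs)   # shape (100,)
top10 = np.argsort(-scores)[:10]      # candidates forwarded to the ranking stage
```

Because the item tower is independent of the user, item embeddings are computed offline and only the dot products run at serving time, which is what makes this architecture cheap enough for pre-ranking.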



Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach
A novel pre-ranking approach is proposed which supports complicated models with interaction-focused architectures and achieves a better tradeoff between effectiveness and efficiency via the proposed learnable feature selection method based on feature complexity and variational dropout (FSCD).
Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
A novel way to train ranking models, such as recommender systems, that are both effective and efficient is proposed, and a smaller student model is trained to learn to rank documents/items from both the training data and the supervision of a larger teacher model.
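The ranking-distillation idea above can be sketched as a combined objective: the student is trained on click labels plus an extra term rewarding high student scores on the teacher's top-K documents. The loss form below is an illustrative simplification, not the paper's exact objective:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distill_loss(student_scores, labels, teacher_topk, lam=0.5):
    """Label loss on training data + distillation term on teacher's top-K ids."""
    eps = 1e-9
    p = sigmoid(student_scores)
    # standard binary cross-entropy against observed labels
    data_loss = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    # push the teacher's top-K documents toward the top of the student's list
    distill = -np.mean(np.log(sigmoid(student_scores[teacher_topk]) + eps))
    return data_loss + lam * distill
```

A student that agrees with the teacher's top-K incurs a smaller total loss than one that ranks those documents low, even at equal label loss.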
AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction
This work proposes a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS), which can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence.
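The gating idea behind AutoFIS can be sketched as follows: each pairwise feature interaction gets a learnable architecture parameter alpha, and after training, interactions with small |alpha| are pruned. The gate values and threshold here are hypothetical placeholders for what the search would learn:

```python
import numpy as np
from itertools import combinations

def gated_interactions(embs, alpha):
    """Sum of alpha_ij * <e_i, e_j> over all field pairs."""
    pairs = list(combinations(range(embs.shape[0]), 2))
    vals = np.array([embs[i] @ embs[j] for i, j in pairs])
    return float(alpha @ vals), pairs

def prune(alpha, pairs, thresh=0.1):
    """Keep only the interactions the search marked as important."""
    return [p for a, p in zip(alpha, pairs) if abs(a) >= thresh]

embs = np.ones((3, 2))                 # 3 fields, embedding dim 2
alpha = np.array([0.5, 0.05, 0.3])     # one gate per field pair (illustrative)
total, pairs = gated_interactions(embs, alpha)
kept = prune(alpha, pairs)             # (0, 1) and (1, 2) survive pruning
```

The appeal is that the search costs roughly one training run of the target model: the gates are learned jointly with the model weights, then thresholded.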
CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models
This paper presents CTR-BERT, a novel lightweight cache-friendly factorized model for CTR prediction that consists of twin-structured BERT-like encoders for text with a mechanism for late fusion for text and tabular features and significantly outperforms a traditional CTR baseline.
Learning Tree-based Deep Model for Recommender Systems
A novel tree-based method is proposed which provides logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks; the tree can be jointly learned for better compatibility with users' interest distributions, facilitating both training and prediction.
DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
This work proposes an improved framework DCN-V2, which is simple, can be easily adopted as building blocks, and has delivered significant offline accuracy and online business metrics gains across many web-scale learning to rank systems at Google.
Privileged Features Distillation at Taobao Recommendations
By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, this work achieves significant improvements over their strong baselines.
Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations
This paper showcases how to apply a two-tower neural network framework, which is also known as dual encoder in the natural language community, to improve a large-scale, production app recommendation system and offers a novel negative sampling approach called Mixed Negative Sampling (MNS).
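Mixed Negative Sampling can be sketched as follows: for each user in a batch, the candidate set scored under one softmax mixes the other in-batch items with items drawn uniformly from the corpus, which counteracts the popularity bias of in-batch negatives alone. This is a minimal sketch under simplifying assumptions (embeddings given, uniform corpus draw):

```python
import numpy as np

def mns_logits(user_embs, item_embs, corpus_embs, num_uniform, rng):
    """Per-user logits over in-batch items plus uniformly sampled corpus items."""
    idx = rng.integers(0, corpus_embs.shape[0], size=num_uniform)
    # candidate pool: B in-batch items followed by B' uniform negatives
    candidates = np.concatenate([item_embs, corpus_embs[idx]], axis=0)
    return user_embs @ candidates.T   # shape (B, B + B')

rng = np.random.default_rng(0)
logits = mns_logits(
    user_embs=rng.normal(size=(4, 8)),     # batch of 4 users
    item_embs=rng.normal(size=(4, 8)),     # their positive items
    corpus_embs=rng.normal(size=(50, 8)),  # full item corpus
    num_uniform=16,
    rng=rng,
)
```

Training would apply a sampled softmax over each row, with the diagonal of the in-batch block as the positive.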
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performances using much fewer GPU-hours.
Deep Learning Recommendation Model for Personalization and Recommendation Systems
A state-of-the-art deep learning recommendation model (DLRM) is developed and its implementation in both the PyTorch and Caffe2 frameworks is provided, along with a specialized parallelization scheme that uses model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale out compute for the fully-connected layers.