GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

  • Ping Li and Weijie Zhao
  • Published 7 January 2022
  • Proceedings of the 31st ACM International Conference on Information & Knowledge Management
We propose using the "powered generalized min-max" (pGMM) kernel, hashed (linearized) via "generalized consistent weighted sampling" (GCWS), for training (deep) neural networks (hence the name "GCWSNet"). The pGMM and several related kernels were proposed in 2017. We demonstrate that pGMM hashed by GCWS provides a numerically stable scheme for applying the power transformation to the original data, regardless of the magnitude of p and the data. Our experiments show that GCWSNet often improves the accuracy…
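
As a rough illustration of the pipeline (not the authors' code), the sketch below follows our reading of GCWS: split signed features into positive and negative parts (the GMM transform), then apply Ioffe's consistent weighted sampling with the power p folded into the log domain. Function names and the per-hash seeding are our own choices.

```python
import numpy as np

def gmm_transform(x):
    """GMM transform: split each signed feature into its positive
    and negative parts, giving a nonnegative vector of twice the length."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

def gcws_hash(w, p, seed):
    """One GCWS hash of nonnegative weights w under the pGMM kernel
    (power p), following Ioffe's consistent weighted sampling.  The
    power is applied in the log domain (p * log w), which is the
    numerically stable trick: w**p is never formed explicitly, so
    large or tiny p cannot overflow or underflow."""
    rng = np.random.default_rng(seed)          # shared randomness across inputs
    d = len(w)
    r = rng.gamma(2.0, 1.0, size=d)
    c = rng.gamma(2.0, 1.0, size=d)
    beta = rng.uniform(0.0, 1.0, size=d)
    active = np.flatnonzero(w > 0)             # zero-weight coordinates never win
    logw = p * np.log(w[active])               # log(w**p) without computing w**p
    t = np.floor(logw / r[active] + beta[active])
    y = np.exp(r[active] * (t - beta[active]))
    a = c[active] / (y * np.exp(r[active]))
    k = int(np.argmin(a))
    return int(active[k]), int(t[k])

def gcws_sketch(x, p=1.0, num_hashes=64):
    """K (index, t) hash pairs of a signed vector x; collisions between
    two sketches estimate the pGMM similarity of the underlying vectors."""
    w = gmm_transform(x)
    return [gcws_hash(w, p, seed=k) for k in range(num_hashes)]
```

As we understand the paper, each (index, t) pair would then be encoded (e.g. one-hot, with t truncated to a few bits) and fed to the network as its input representation.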

pGMM Kernel Regression and Comparisons with Boosted Trees

The implementation of Lp-boost provides practitioners an additional option for tuning boosting algorithms to potentially achieve better accuracy in regression applications, and is included in the “Fast ABC-Boost” package.

CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

This work proposes a novel framework, the generative cooperative hashing network (CoopHash), based on energy-based cooperative learning. The proposed method outperforms competing supervised hashing methods, achieving up to 10% relative improvement over the current state of the art, and exhibits significantly better performance in out-of-distribution retrieval.

C-MinHash: Improving Minwise Hashing with Circulant Permutation

This paper proposes Circulant MinHash (C-MinHash) and provides the surprising theoretical result that using only two independent random permutations in a circulant manner leads to uniformly smaller Jaccard estimation variance than the classical MinHash with K independent permutations.
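
A toy sketch of the circulant idea as we read it (two permutations total: one initial re-randomization, the second reused with shift k for the k-th hash). This is illustrative only, not the paper's exact construction or its variance analysis.

```python
import numpy as np

def c_minhash(S, D, K, seed=0):
    """Illustrative C-MinHash over a set S of ids in [0, D): pi
    re-randomizes the space once; sigma is then reused circulantly,
    shifted by k for the k-th hash value."""
    rng = np.random.default_rng(seed)
    pi = rng.permutation(D)
    sigma = rng.permutation(D)
    elems = pi[np.fromiter(S, dtype=int)]     # initial permutation pi
    return [int(sigma[(elems + k) % D].min()) for k in range(K)]

def jaccard_estimate(S1, S2, D, K=256, seed=0):
    """Fraction of colliding hash values estimates Jaccard(S1, S2)."""
    h1, h2 = c_minhash(S1, D, K, seed), c_minhash(S2, D, K, seed)
    return sum(a == b for a, b in zip(h1, h2)) / K
```

Only two D-sized permutations are stored, versus K for classical MinHash, while still producing K hash values per set.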

Package for Fast ABC-Boost

This report presents the open-source package https://github.com/pltrees/abcboost, which implements a series of boosting works developed over many years and comprises mainly three lines of techniques that are already standard implementations in popular boosted-tree platforms.

℘-MinHash Algorithm for Continuous Probability Measures: Theory and Application to Machine Learning

A general ℘-MinHash sampling algorithm is presented, which generates samples following any target distribution and preserves the similarity ℐ℘ between two distributions through hash collisions; a refined early-stopping rule is also proposed under a practical boundedness assumption.

Noisy 𝓁0-Sparse Subspace Clustering on Dimensionality Reduced Data

It is shown that an optimal solution to the optimization problem of noisy ℓ0-SSC achieves the subspace detection property (SDP), a key condition under which data from different subspaces are separated, under deterministic and semi-random models.



GemNN: Gating-enhanced Multi-task Neural Networks with Feature Interaction Learning for CTR Prediction

This paper develops a neural-network-based multi-task learning model to predict CTR in a coarse-to-fine manner, which gradually reduces the ad candidates and allows parameter sharing from upstream to downstream tasks to improve training efficiency.

AIBox: CTR Prediction Model Training on a Single Node

AIBox is presented: a centralized system that trains CTR models with tens-of-terabytes-scale parameters by employing solid-state drives (SSDs) and GPUs, together with a bi-level cache management system over SSDs that stores the 10TB of parameters while providing low-latency access.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
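
The Adam update itself is compact enough to state directly; a minimal sketch (scalar or elementwise on arrays, with the standard default hyperparameters):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m,
    first moment) and its square (v, second raw moment), with bias
    correction; t counts from 1."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the bias-corrected ratio starts near ±1, the very first step moves theta by roughly lr in the direction opposite the gradient; iterating this step on a convex objective such as (theta - 3)² drives theta toward the minimizer.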

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

A distributed hierarchical GPU parameter server for massive-scale deep learning ads systems is presented, which utilizes GPU high-bandwidth memory, CPU main memory, and SSDs as a 3-layer hierarchical storage; the price-performance ratio of the proposed system is 4-9 times better than an MPI-cluster solution.

Agile and Accurate CTR Prediction Model Training for Massive-Scale Online Advertising Systems

This work presents Baidu's industrial-scale practices on applying systems and machine learning techniques to address these issues and increase revenue, and focuses on the strategy of developing GPU-based CTR models combined with quantization techniques to build a compact and agile system that noticeably improves revenue.

Simple and Efficient Weighted Minwise Hashing

This work proposes a simple rejection-type sampling scheme based on a carefully designed red-green map, in which the samples, after rejection, follow exactly the same distribution as weighted minwise sampling; the authors hope it will replace existing implementations in practice.

Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

This paper proposes a strategy named “re-randomization” in the process of densification that could achieve the smallest variance among all densification schemes.

One sketch for all: Theory and Application of Conditional Random Sampling

This study modifies the original CRS and extends it to handle dynamic or streaming data, which reflects real-world situations much better than assuming static data.

Large-scale kernel machines

This volume offers researchers and engineers practical solutions for learning from large scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets, and offers information that can address the relative lack of theoretical grounding for many useful algorithms.

Consistent Sampling Through Extremal Process

Interestingly, compared with CWS, the resulting algorithm involves only counting and needs no sophisticated mathematical operations (as CWS requires); it is therefore not surprising that the proposed ES scheme is noticeably faster than CWS.
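
The extremal-process view can be illustrated with the classic exponential-race construction (a related folklore scheme, not the paper's counting-only algorithm): each index i draws e_i ~ Exp(1) from shared randomness and "fires" at time e_i / w_i; the earliest index wins, with probability w_i / Σ w, and sharing the randomness across inputs makes the sample consistent.

```python
import numpy as np

def exp_race_hash(w, seed):
    """Consistent sampling as an exponential race: index i fires at
    time e_i / w_i with e_i ~ Exp(1) drawn from shared randomness;
    the earliest firing index is the sample.  Zero weights never win."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    e = rng.exponential(1.0, size=len(w))
    pos = w > 0
    clocks = np.where(pos, e / np.where(pos, w, 1.0), np.inf)
    return int(np.argmin(clocks))
```

Note that the winner is invariant to rescaling all weights by a constant, and two inputs hashed with the same seed collide with a probability that grows with their similarity, which is what makes such races usable as similarity sketches.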