Corpus ID: 17411706

Field-Aware Factorization Machines

Chao Ma, Yuze Liao, Y. Wang, Zhen Xiao
Field-aware factorization machines (FFM) have attracted significant interest recently because of their good performance on large-scale sparse data. Current systems for this model, such as LibFFM, however, run only on a single machine, which reaches both its computation and storage limits when the data grow very large. In this work, we introduce F2M, a distributed FFM implementation that offers good performance and scalability. We take a system designer's perspective to demonstrate how to…
2 Citations

AFFM: Auto feature engineering in field-aware factorization machines for predictive analytics
A novel, efficient machine learning approach to user identification and prediction that applies auto feature engineering techniques to field-aware factorization machines and can handle multiple features within the same field.
Repeat Buyer Prediction for E-Commerce
This paper created profiles for users, merchants, brands, categories, items and their interactions via extensive feature engineering for repeat buyer prediction based on the sales data of the "Double 11" shopping event in 2014 at…


Scaling Distributed Machine Learning with the Parameter Server
Views on newly identified challenges are shared, and application scenarios such as micro-blog data analysis and data processing for building next-generation search engines are covered.
Large Scale Distributed Deep Networks
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
MALT: distributed data-parallelism for existing ML applications
MALT is introduced, a machine learning library that integrates with existing machine learning software to provide data-parallel machine learning. It can bring data parallelism to existing ML applications written in C++ and Lua and based on SVM, matrix factorization and neural networks.
Pairwise interaction tensor factorization for personalized tag recommendation
The factorization model PITF (Pairwise Interaction Tensor Factorization) is presented. It is a special case of the TD model with linear runtime for both learning and prediction, and it is shown to substantially outperform TD in runtime and can even achieve better prediction quality.
Spark: Cluster Computing with Working Sets
Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Brook : An Easy and Efficient Framework for Distributed Machine Learning
Brook is a new framework for distributed machine learning that adopts the parameter server paradigm to simplify distributed programming. It provides a novel system component called the parameter agent, which masks the communication details between workers and servers by mapping remote servers to a local in-memory file.
Ad click prediction: a view from the trenches
The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
A reliable effective terascale linear learning system
We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples and millions of…
MLI: An API for Distributed Machine Learning
The initial results show that this interface can be used to build distributed implementations of a wide variety of common machine learning algorithms with minimal complexity and with highly competitive performance and scalability.
Exploiting Bounded Staleness to Speed Up Big Data Analytics
Extensive experiments with ML algorithms for topic modeling, collaborative filtering, and PageRank show that both approaches significantly increase convergence speeds, behaving similarly when there are no stragglers, but SSP outperforms BSP in the presence of stragglers.