Corpus ID: 17411706

Field-Aware Factorization Machines

Chao Ma, Yuze Liao, Yuan Wang, Zhen Xiao
Field-aware factorization machines (FFM) have recently attracted significant interest owing to their strong performance on large-scale sparse data. Existing systems for this model, such as LibFFM, run only on a single machine and reach both computation and storage limits as the data grow very large. In this work, we introduce F2M, a distributed FFM implementation that offers good performance and scalability. We take a system designer's perspective to demonstrate how to… 
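The FFM model behind this work scores each feature pair with latent vectors that depend on the partner feature's field. A minimal sketch of that pairwise score (illustrative only; not the F2M or LibFFM implementation, and the array layout is an assumption):

```python
import numpy as np

def ffm_score(x, fields, W):
    """Field-aware factorization machine pairwise interaction score.

    x      : feature values, shape (n,)
    fields : fields[j] is the field index of feature j
    W      : latent vectors, shape (n_features, n_fields, k);
             W[j, f] is feature j's vector used against field f.
    (A toy dense sketch; real FFM systems iterate only over the
    non-zero features of a sparse instance.)
    """
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            f1, f2 = fields[j1], fields[j2]
            # Feature j1 uses its vector for j2's field, and vice versa.
            s += (W[j1, f2] @ W[j2, f1]) * x[j1] * x[j2]
    return s
```

The field-aware twist is that each feature keeps one latent vector per field rather than a single shared vector, which is what distinguishes FFM from a plain factorization machine.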


AFFM: Auto feature engineering in field-aware factorization machines for predictive analytics

A novel, efficient machine learning approach to user identification and prediction based on field-aware factorization machines with automatic feature engineering, capable of handling multiple features within the same field.

Multi-Graph based Multi-Scenario Recommendation in Large-scale Online Video Services

This paper presents a multi-graph structured multi-scenario recommendation solution, which encapsulates interaction data across scenarios in a multi-graph, obtains representations via graph learning, and outperforms regular methods in increasing the number of outer-scenario videos.

Efficient implementation of incremental proximal-point methods

This work provides efficient algorithms and corresponding implementations of proximal operators in order to make experimentation with incremental proximal optimization algorithms accessible to a larger audience of researchers and practitioners, and to promote further theoretical research into these methods by closing the gap between their description in research papers and their use in practice.

Repeat Buyer Prediction for E-Commerce

This paper created profiles for users, merchants, brands, categories, items, and their interactions via extensive feature engineering for repeat-buyer prediction, based on sales data from the "Double 11" shopping event in 2014 at Tmall.com.

Scaling Distributed Machine Learning with the Parameter Server

Views on newly identified challenges are shared, and application scenarios such as micro-blog data analysis and data processing for building next-generation search engines are covered.

Large Scale Distributed Deep Networks

This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.

MALT: distributed data-parallelism for existing ML applications

MALT is introduced, a machine learning library that integrates with existing machine learning software and provides data-parallel machine learning; it can add data parallelism to existing ML applications written in C++ and Lua and based on SVMs, matrix factorization, and neural networks.

Pairwise interaction tensor factorization for personalized tag recommendation

The factorization model PITF (Pairwise Interaction Tensor Factorization) is presented, a special case of the TD model with linear runtime for both learning and prediction; it is shown to largely outperform TD in runtime and can even achieve better prediction quality.
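PITF's linear runtime comes from keeping only the pairwise user-tag and item-tag interactions instead of a full Tucker core tensor, so a score is just two dot products. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def pitf_score(u_vec, i_vec, t_user_vec, t_item_vec):
    """PITF score for a (user, item, tag) triple.

    Keeps only the user-tag and item-tag pairwise interactions,
    dropping the dense core tensor of the full Tucker decomposition,
    so scoring is linear in the embedding size k.
    """
    return u_vec @ t_user_vec + i_vec @ t_item_vec
```

Here each tag carries two embeddings, one matched against users and one against items, which is what makes the model a special case of TD with a fixed diagonal core.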

Spark: Cluster Computing with Working Sets

Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.

Brook : An Easy and Efficient Framework for Distributed Machine Learning

Brook is a new framework for distributed machine learning that adopts the parameter-server paradigm, simplifying the task of distributed programming; it provides a novel system component called the parameter agent that masks the communication details between workers and servers by mapping remote servers to a local in-memory file.

A reliable effective terascale linear learning system

We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples, and millions of…

MLI: An API for Distributed Machine Learning

The initial results show that this interface can be used to build distributed implementations of a wide variety of common machine learning algorithms with minimal complexity and highly competitive performance and scalability.

Exploiting Bounded Staleness to Speed Up Big Data Analytics

Extensive experiments with ML algorithms for topic modeling, collaborative filtering, and PageRank show that both approaches significantly increase convergence speed; they behave similarly when there are no stragglers, but SSP outperforms BSP in the presence of stragglers.
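The SSP (stale synchronous parallel) model compared here sits between fully synchronous BSP and fully asynchronous execution: a worker may run ahead of the slowest worker by at most a fixed staleness bound. A sketch of that admission check (names are hypothetical; this is the gating rule only, not a full runtime):

```python
def ssp_can_proceed(worker_clock, all_clocks, staleness):
    """Stale Synchronous Parallel gate.

    A worker at iteration `worker_clock` may proceed only if the
    slowest worker (min of `all_clocks`) is within `staleness`
    iterations; otherwise it must block, bounding how stale the
    parameters it reads can be.
    """
    return worker_clock - min(all_clocks) <= staleness
```

Setting `staleness = 0` recovers BSP's barrier at every iteration, which is why the two behave alike without stragglers.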

Hadoop: The Definitive Guide

This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.