Fast Estimation of Multinomial Logit Models: R Package mnlogit

@article{Hasan2014FastEO,
  title={Fast Estimation of Multinomial Logit Models: R Package mnlogit},
  author={Asad Hasan and Wang Zhiyu and Alireza S. Mahani},
  journal={arXiv: Computation},
  year={2014}
}
We present R package mnlogit for training multinomial logistic regression models, particularly those involving a large number of classes and features. Compared to existing software, mnlogit offers speedups of 10x-50x for modestly sized problems and more than 100x for larger problems. Running mnlogit in parallel mode on a multicore machine gives an additional 2x-4x speedup on up to 8 processor cores. Computational efficiency is achieved by drastically speeding up calculation of the log… 

mixl: An open-source R package for estimating complex choice models on large datasets

This paper presents the architecture and performance of mixl, a new R package for the estimation of advanced choice models, details its use, and reports some results using real-world data and models.

Efficient Bayesian Modeling of Binary and Categorical Data in R: The UPG Package

UPG provides several methods for fast production of output tables and summary plots that are easily accessible to a broad range of users and a convenient estimation framework for balanced and imbalanced data settings where sampling efficiency is ensured through marginal data augmentation.

Estimation of Random Utility Models in R: The mlogit Package

mlogit is a package for R which enables the estimation of random utility models with choice situation and/or alternative specific variables. The main extensions of the basic multinomial model

Stochastic Newton Sampler: R Package sns

The R package sns implements Stochastic Newton Sampler (SNS), a Metropolis-Hastings Monte Carlo Markov Chain algorithm where the proposal density function is a multivariate Gaussian based on a local,

Model choice for regression models with a categorical response

  • J. Kalina
  • Economics
    Journal of Applied Mathematics, Statistics and Informatics
  • 2022
The multinomial logit model and the cumulative logit model are two important tools for regression modeling with a categorical response, with numerous applications in various fields.

Parameter estimation of multinomial logistic regression model using least absolute shrinkage and selection operator (LASSO)

The results showed that the LASSO estimates are similar to those from parametric estimation, which is a good sign for variable selection in the model.

Constrained Statistical Inference for Categorical Data

Using real-world data from the Canadian Community Health Survey, the methodology that uses constraints showed significant improvement over methodology that does not, which substantiates the added value of the work presented here.

Stochastic gradient descent methods for estimation with large data sets

The sgd package in R offers the most extensive and robust implementation of stochastic gradient descent methods, which include the wide class of generalized linear models as well as M-estimation for robust regression.

IDCeMPy: Python Package for Inflated Discrete Choice Models

Inflated discrete choice models have been developed to address category inflation in ordered and unordered polytomous outcome variables as failing to do so leads to model misspecification and incorrect inferences.

Multiple multi-sample testing under arbitrary covariance dependency

A procedure is proposed to evaluate the strength of the associations between a nominal (categorical) response variable and multiple features simultaneously, with a sensible trade-off between the expected numbers of true and false rejections.

References

Regularization Paths for Generalized Linear Models via Coordinate Descent.

In comparative timings, the new algorithms are considerably faster than competing methods, can handle large problems, and can also deal efficiently with sparse features.

Multinomial logistic regression algorithm

The lower bound principle (introduced in Böhning and Lindsay 1988, Ann. Inst. Statist. Math., 40, 641–663; Böhning 1989, Biometrika, 76, 375–383) consists of replacing the second derivative matrix

The VGAM Package for Categorical Data Analysis

Classical categorical regression models such as the multinomial logit and proportional odds models are shown to be readily handled by the vector generalized linear and additive model (VGLM/VGAM)

Extended Model Formulas in R : Multiple Parts and Multiple Responses

Model formulas are the standard approach for specifying the variables in statistical models in the S language. Although being eminently useful in an extremely wide class of applications, they have

maxent: An R Package for Low-memory Multinomial Logistic Regression with Support for Semi-automated Text Classification

The focus of this maximum entropy classifier is to minimize memory consumption on very large datasets, particularly sparse document-term matrices represented by the tm text mining package.

Discrete Choice Methods with Simulation

Discrete Choice Methods with Simulation by Kenneth Train has been available in the second edition since 2009 and contains two additional chapters, one on endogenous regressors and one on the expectation–maximization (EM) algorithm.

A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large Scale Optimization

This paper examines the numerical performances of two methods for large-scale optimization: a limited memory quasi-Newton method (L-BFGS), and a discrete truncated-Newton method (TN). Various ways of

Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity

This paper demonstrates that a very simple parameter-free implementation of logistic regression (LR) is sufficiently accurate and fast to compete with state-of-the-art binary classifiers on large real-world datasets and appears to outperform several common LR fitting procedures in the authors' experiments.

Diagnostic Checking in Regression Relationships

A rich variety of diagnostic tests for these situations have been developed in the econometrics community, a collection of which has been implemented in the packages lmtest and strucchange covering the problems mentioned above.

On the limited memory BFGS method for large scale optimization

The numerical tests indicate that the L-BFGS method is faster than the method of Buckley and LeNir and is better able to use additional storage to accelerate convergence; the convergence properties are studied to prove global convergence on uniformly convex problems.