SPlit: An Optimal Method for Data Splitting

@article{Joseph2020SPlitAO,
  title={SPlit: An Optimal Method for Data Splitting},
  author={V. R. Joseph and Akhil Vakayil},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.10945}
}
In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal with categorical variables so that SPlit can be applied to both regression and classification problems…
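
The subsampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sequential_nearest_neighbor` is hypothetical, and the representative points are random placeholders, since computing actual support points is outside the scope of this example. The sketch only shows one plausible reading of "sequential nearest neighbor": each representative point, visited in order, claims the closest data row that has not already been taken, and the claimed rows form the testing set.

```python
import numpy as np


def sequential_nearest_neighbor(data, rep_points):
    """Pick one distinct row of `data` for each row of `rep_points`.

    Hypothetical helper: visits the representative points in order and
    assigns each one to its nearest data row that is still available.
    Requires len(rep_points) <= len(data).
    """
    data = np.asarray(data, dtype=float)
    rep_points = np.asarray(rep_points, dtype=float)
    available = np.ones(len(data), dtype=bool)
    chosen = []
    for point in rep_points:
        # Euclidean distance from this representative point to every row.
        dist = np.linalg.norm(data - point, axis=1)
        # Rows that were already selected can never be picked again.
        dist[~available] = np.inf
        idx = int(np.argmin(dist))
        available[idx] = False
        chosen.append(idx)
    return np.array(chosen)


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))

# Placeholder representative points (assumption): random draws, NOT
# support points. A real run would substitute points computed by SP.
rep_points = rng.normal(size=(20, 2))

test_idx = sequential_nearest_neighbor(data, rep_points)
train_idx = np.setdiff1d(np.arange(len(data)), test_idx)
```

Because selection is without replacement, the testing set always has exactly as many rows as there are representative points, and the remaining rows form the training set.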
