Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples <i>n</i> as well as a large number of features <i>N</i>, while each example has only <i>s</i> << <i>N</i> non-zero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time <i>O</i>(<i>sn</i>) for classification problems and <i>O</i>(<i>sn</i> log(<i>n</i>)) for ordinal regression problems. The algorithm is based on an alternative but equivalent formulation of the SVM optimization problem. Empirically, the Cutting-Plane Algorithm is several orders of magnitude faster than decomposition methods like SVM-Light on large datasets.