On the Difficulty of Designing Good Classifiers

@article{Grigni1996OnTD,
  title={On the Difficulty of Designing Good Classifiers},
  author={Michelangelo Grigni and Vincent Mirelli and Christos H. Papadimitriou},
  journal={SIAM J. Comput.},
  year={1996},
  volume={30},
  pages={318--323}
}
It is a very interesting and well-studied problem, given two point sets W, B ⊆ ℝ^n, to design a linear decision tree that classifies them (that is, no leaf subdivision contains points from both B and W) and is as simple as possible, either in terms of the total number of nodes or in terms of its depth. We show that, unless ZPP = NP, the depth of a classifier cannot be approximated within a factor smaller than 6/5, and that the total number of nodes cannot be approximated within a…
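The objects in the abstract can be made concrete with a small sketch, assuming a standard formalization (all names below are illustrative, not from the paper): a linear decision tree tests sign(w·x + b) at each internal node, and it classifies W and B when every leaf receives points from only one of the two sets.

```python
# Minimal sketch of the classification notion in the abstract (illustrative).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leaf_of(tree, x):
    """Follow hyperplane tests until a leaf (a string label) is reached."""
    while isinstance(tree, tuple):          # internal node: (w, b, left, right)
        w, b, left, right = tree
        tree = left if dot(w, x) + b <= 0 else right
    return tree

def classifies(tree, W, B):
    """True iff no leaf subdivision contains points from both W and B."""
    w_leaves = {leaf_of(tree, x) for x in W}
    b_leaves = {leaf_of(tree, x) for x in B}
    return w_leaves.isdisjoint(b_leaves)

# Depth-1 example in the plane: a single split on the line x = 0.
tree = ((1.0, 0.0), 0.0, "L", "R")
W = [(-2.0, 1.0), (-1.0, -1.0)]             # all strictly left of x = 0
B = [(1.0, 0.5), (2.0, -0.5)]               # all strictly right of x = 0
print(classifies(tree, W, B))               # → True
```

The hardness results in the paper concern finding the shallowest (or smallest) such tree, not checking a given one, which is easy as the sketch shows.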

Separability of Point Sets by k-Level Linear Classification Trees

It is shown that a 2-level tree can be computed, if one exists, in time O(n^2), and that a minimum k-level tree (3 ≤ k ≤ log n) can be computed in time n^{O(log n)}.

Learning Small Trees and Graphs that Generalize

A progressive sampling method based on Rademacher penalization yields reasonable data-dependent sample complexity estimates for learning two-level decision trees, along with a new scheme for deriving generalization error bounds for prunings of induced decision trees.
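The Rademacher penalization idea behind this line of work can be sketched in a few lines (a toy illustration with an invented stump hypothesis class, not the paper's method): draw random signs and measure how well the hypothesis class can correlate with that pure noise; the supremum serves as a data-dependent complexity penalty.

```python
# Toy Rademacher complexity estimate for 1-D decision stumps (illustrative).
import random

def stump_predictions(xs, threshold, sign):
    """A stump predicts `sign` left of the threshold and `-sign` right of it."""
    return [sign if x <= threshold else -sign for x in xs]

def rademacher_penalty(xs, hypotheses, trials=200, seed=0):
    """Average, over random sign vectors, of the best |correlation| achievable."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        best = max(abs(sum(s * p for s, p in zip(sigma, preds))) / n
                   for preds in hypotheses)
        total += best
    return total / trials

xs = [i / 10 for i in range(20)]
# Hypothesis class: all stumps with thresholds at the data points, both signs.
hypotheses = [stump_predictions(xs, t, s) for t in xs for s in (-1, 1)]
penalty = rademacher_penalty(xs, hypotheses)
print(0.0 < penalty < 1.0)   # → True: a nonzero but non-trivial complexity value
```

A richer class would fit noise better and incur a larger penalty; the papers above exploit this to bound generalization error of tree prunings.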

Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees

This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class settings.

Rademacher Penalization over Decision Tree Prunings

This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class settings.

The Application of Rule-Based Methods to Class Prediction Problems in Genomics

This work proposes a method for constructing classifiers from logical combinations of elementary rules, a form of rule-based classification that has been widely discussed in the literature and is particularly useful for class prediction problems in genomics.

Transferencia de aprendizaje mediante bosques de decisión [Transfer Learning via Decision Forests]

Decision forests are a tool that has become popular for solving various computer vision tasks. Their main advantages are their high computational efficiency, …

References

Showing 1-10 of 22 references

Decision Tree Construction in Fixed Dimensions: Being Global is Hard but Local Greed is Good

It is shown that optimal decision tree construction is NP-complete, even for 3-dimensional point sets, and a number of interesting approximation bounds are proved on the use of random sampling for finding optimal splitting hyperplanes in greedy decision tree construction.
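Greedy construction with randomly sampled splitting hyperplanes can be sketched as follows (a minimal toy version with invented names and parameters; the paper's results concern guarantees this sketch does not provide):

```python
# Toy greedy decision tree builder: at each node, sample random hyperplanes
# and keep the one that minimizes the leaf-majority error of the two sides.
import random

def majority(labels):
    return max(set(labels), key=labels.count)

def build(points, labels, rng, depth=0, max_depth=5, samples=50):
    if len(set(labels)) <= 1 or depth == max_depth:
        return majority(labels)                       # leaf: the majority label
    best = None
    for _ in range(samples):                          # randomly sampled hyperplanes
        w = [rng.uniform(-1, 1) for _ in points[0]]
        b = rng.uniform(-1, 1)
        side = [sum(wi * xi for wi, xi in zip(w, p)) + b <= 0 for p in points]
        L = [(p, l) for p, l, s in zip(points, labels, side) if s]
        R = [(p, l) for p, l, s in zip(points, labels, side) if not s]
        if not L or not R:
            continue
        majL, majR = majority([l for _, l in L]), majority([l for _, l in R])
        err = sum(l != majL for _, l in L) + sum(l != majR for _, l in R)
        if best is None or err < best[0]:
            best = (err, w, b, L, R)
    if best is None:
        return majority(labels)
    _, w, b, L, R = best
    return (w, b,
            build([p for p, _ in L], [l for _, l in L], rng, depth + 1, max_depth),
            build([p for p, _ in R], [l for _, l in R], rng, depth + 1, max_depth))

def predict(tree, x):
    while isinstance(tree, tuple):
        w, b, left, right = tree
        tree = left if sum(wi * xi for wi, xi in zip(w, x)) + b <= 0 else right
    return tree

rng = random.Random(0)
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = ["B", "B", "W", "W"]
tree = build(points, labels, rng)
print(sum(predict(tree, p) != l for p, l in zip(points, labels)))  # small (very likely 0) training error
```

This is the "local greed" of the title in caricature; the hardness result says the globally optimal tree is out of reach even in fixed dimension.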

Automatic Capacity Tuning of Very Large VC-Dimension Classifiers

It is shown that even high-order polynomial classifiers in high dimensional spaces can be trained with a small amount of training data and yet generalize better than classifiers with a smaller VC-dimension.

Improved Hardness Results for Approximating the Chromatic Number

A simplified geometric proof is first presented for the result of C. Lund and M. Yannakakis that for some ε > 0 it is NP-hard to approximate the chromatic number of graphs with N vertices within a factor of N^ε, and polynomial lower bounds in terms of χ are also shown.

Induction of Oblique Decision Trees

A randomized technique is presented for partitioning examples using oblique hyperplanes, creating decision trees that are smaller than, but as accurate as, those produced by other methods; it is found to produce surprisingly small trees without losing predictive accuracy.
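Why oblique splits can help is visible in a tiny example (illustrative data and helper names, not from the paper): points labeled by the diagonal rule x + y ≤ 1 are separated by one oblique hyperplane, while no single axis-parallel threshold can do it.

```python
# Contrast a single oblique split with single axis-parallel thresholds.
points = [(0.1, 0.8), (0.8, 0.1), (0.9, 0.9), (0.2, 0.9), (0.9, 0.2)]
labels = [x + y <= 1.0 for x, y in points]     # [True, True, False, False, False]

def pure_split(side):
    """True iff the boolean partition `side` puts each class on its own side."""
    left = {l for l, s in zip(labels, side) if s}
    right = {l for l, s in zip(labels, side) if not s}
    return len(left) <= 1 and len(right) <= 1 and left != right

# One oblique test, sign(x + y - 1), classifies every point correctly.
oblique = [x + y <= 1.0 for x, y in points]
print(pure_split(oblique))                     # → True

# No single threshold on x alone or y alone separates these points.
axis_ok = any(pure_split([p[d] <= t for p in points])
              for d in (0, 1) for t in [v for q in points for v in q])
print(axis_ok)                                 # → False
```

An axis-parallel tree needs several levels for this data; the oblique tree needs one node, which is exactly the size advantage the paper reports empirically.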

Zero knowledge and the chromatic number

  • U. Feige, J. Kilian
  • Mathematics, Computer Science
    Proceedings of Computational Complexity (Formerly Structure in Complexity Theory)
  • 1996
A new technique, inspired by zero-knowledge proof systems, is presented for proving lower bounds on approximating the chromatic number of a graph; the result matches (up to low-order terms) the known gap for approximating the size of the largest independent set.

A training algorithm for optimal margin classifiers

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, …
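The margin criterion itself can be illustrated with a toy sketch (an invented example, not the paper's training algorithm): among candidate separating hyperplanes, prefer the one whose minimum distance to any training point is largest.

```python
# Compare the margins of two separating hyperplanes and pick the larger one.
import math

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    norm = math.hypot(*w)
    return min(abs(w[0] * x + w[1] * y + b) for x, y in points) / norm

neg = [(0.0, 0.0), (0.0, 1.0)]            # class -1
pos = [(2.0, 0.0), (2.0, 1.0)]            # class +1
# Two hyperplanes that both separate the classes: x = 0.5 and x = 1.0.
candidates = {"x = 0.5": ((1.0, 0.0), -0.5), "x = 1.0": ((1.0, 0.0), -1.0)}
margins = {name: margin(w, b, neg + pos) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(best)                               # → x = 1.0, the maximum-margin choice
```

The paper's algorithm finds this maximum-margin hyperplane by optimization rather than by enumerating candidates, but the objective is the one computed here.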

Free bits, PCPs and non-approximability-towards tight results

A proof system for NP is presented using logarithmic randomness and two amortized free bits, showing that Max Clique is hard to approximate within N^{1/3} and the chromatic number within N^{1/5}; a comprehensive study of PCP and FPCP parameters is also initiated.

Proof verification and hardness of approximation problems

The authors improve on their result by showing that NP = PCP(log n, 1), which has the following consequences: (1) MAXSNP-hard problems do not have polynomial-time approximation schemes unless P = NP; and (2) for some ε > 0, the size of the maximal clique in a graph cannot be approximated within a factor of n^ε unless P = NP.

On the hardness of approximating minimization problems

It is proved that there is an ε > 0 such that Graph Coloring cannot be approximated with ratio n^ε unless P = NP, and that related minimization problems are hard to approximate unless NP is contained in DTIME[n^{poly log n}].