Fast Induction of Multiple Decision Trees in Text Categorization from Large Scale, Imbalanced, and Multi-label Data

@inproceedings{Vateekul2009FastIO,
  title={Fast Induction of Multiple Decision Trees in Text Categorization from Large Scale, Imbalanced, and Multi-label Data},
  author={Peerapon Vateekul and Miroslav Kubat},
  booktitle={2009 IEEE International Conference on Data Mining Workshops},
  year={2009},
  pages={320--325}
}
The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a…
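
The two-pronged idea can be illustrated with a minimal sketch. It is not the authors' FDT implementation. The abstract is cut off before it says what each tree is induced from, so the sketch assumes a random subset of the training documents; it also assumes information gain as the pre-selection criterion, one binary label handled at a time, and majority voting across trees. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(docs, labels, feature):
    """Gain from splitting on presence/absence of a word."""
    yes = [y for d, y in zip(docs, labels) if feature in d]
    no = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    remainder = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - remainder

def preselect_features(docs, labels, k):
    """Prong 1 (assumed criterion): keep the k words with the highest gain."""
    vocab = sorted(set().union(*docs))  # sorted so ties break alphabetically
    return sorted(vocab, key=lambda f: info_gain(docs, labels, f), reverse=True)[:k]

def build_tree(docs, labels, features, depth):
    """Grow a small tree; a leaf is a label, a node is (word, yes, no)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: info_gain(docs, labels, f))
    if info_gain(docs, labels, best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    yes = [(d, y) for d, y in zip(docs, labels) if best in d]
    no = [(d, y) for d, y in zip(docs, labels) if best not in d]
    return (best,
            build_tree([d for d, _ in yes], [y for _, y in yes], rest, depth - 1),
            build_tree([d for d, _ in no], [y for _, y in no], rest, depth - 1))

def predict_tree(tree, doc):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        feature, yes, no = tree
        tree = yes if feature in doc else no
    return tree

def train_forest(docs, labels, k, n_trees, sample_frac, seed=0):
    """Prong 2 (assumed): each tree is grown from a random document subset."""
    rng = random.Random(seed)
    features = preselect_features(docs, labels, k)
    size = max(1, int(sample_frac * len(docs)))
    forest = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(docs)), size)
        forest.append(build_tree([docs[i] for i in idx],
                                 [labels[i] for i in idx], features, depth=3))
    return features, forest

def predict_forest(forest, doc):
    """Combine the trees by majority vote (an assumption of this sketch)."""
    return Counter(predict_tree(t, doc) for t in forest).most_common(1)[0][0]

# Hypothetical toy corpus: bags of words, with 1 = "sports" and 0 = other.
docs = [{"goal", "match", "team"}, {"team", "score", "goal"},
        {"election", "vote", "party"}, {"vote", "budget", "tax"},
        {"match", "goal", "fans"}, {"tax", "party", "law"}]
labels = [1, 1, 0, 0, 1, 0]

features, forest = train_forest(docs, labels, k=3, n_trees=5, sample_frac=0.67)
print(features)
print(predict_forest(forest, {"goal", "fans"}))
```

Pre-selecting features once, before any tree is grown, means every split search scans only `k` candidate words rather than the full vocabulary. With tens of thousands of features, that search is where most of the induction cost would otherwise go.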
