Employing Structural and Textual Feature Extraction for Semistructured Document Classification

@article{Khabbaz2012EmployingSA,
  title={Employing Structural and Textual Feature Extraction for Semistructured Document Classification},
  author={M. Khabbaz and K. Kianmehr and R. Alhajj},
  journal={IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)},
  year={2012},
  volume={42},
  pages={1566--1578}
}
This paper addresses XML document classification by considering both the structural and the content-based features of the documents. This approach constructs a more informative set of feature vectors that represent both the structural and the textual aspects of XML documents. For this purpose, we integrate soft clustering of words and feature reduction into the process. To extract structural information, we employ an existing frequent tree-mining algorithm combined with an information gain filter…
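The abstract describes filtering mined structural features by information gain and combining them with textual features. A minimal sketch of that idea follows. It assumes that each structural feature is binary (1 if a document contains a given frequent subtree, else 0) and that the textual features are already available as per-document word-cluster membership weights. The function names, the toy data, and plain concatenation as the combination step are hypothetical choices for illustration, not the authors' implementation. The frequent tree mining and the soft word clustering themselves are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_column, labels):
    """IG of a binary feature: H(C) - sum over v in {0, 1} of P(f=v) * H(C | f=v)."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def select_by_information_gain(matrix, labels, k):
    """Hypothetical helper: indices of the k columns with the highest IG (ties -> lower index)."""
    n_features = len(matrix[0])
    scores = [(information_gain([row[j] for row in matrix], labels), j)
              for j in range(n_features)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def combine(structural, textual):
    """Hypothetical helper: concatenate structural and textual vectors per document."""
    return [s + t for s, t in zip(structural, textual)]


# Toy data (assumed): rows are documents, columns mark presence of mined subtrees.
structural = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = ["xml-a", "xml-a", "xml-b", "xml-b"]
keep = select_by_information_gain(structural, labels, k=1)
reduced = [[row[j] for j in keep] for row in structural]
# Assumed soft word-cluster memberships for each document (two clusters).
textual = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
vectors = combine(reduced, textual)
print(keep, vectors)
```

In this toy case only the first subtree column separates the two classes, so it is the one kept, and each resulting vector carries one structural value followed by two textual weights. Concatenation is the simplest combination; weighting or normalizing the two parts separately is a reasonable alternative.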
21 Citations
XML classification using ensemble learning on extracted features
Document Clustering Approaches using Affinity Propagation

References

49 references in total; partial list:
Transforming XML Trees for Efficient Classification and Clustering
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data
XRules: an effective structural classifier for XML data
A weighted common structure based clustering technique for XML documents
Text categorization using the semi-supervised fuzzy c-means algorithm (M. Benkhalifa, A. Bensaid, A. Mouradi; 18th International Conference of the North American Fuzzy Information Processing Society - NAFIPS (Cat. No.99TH8397), 1999)
The hybrid representation model for web document classification
Survey of Text Mining: Clustering, Classification, and Retrieval
On feature distributional clustering for text categorization
Sequential Pattern Mining for Structure-Based XML Document Classification