Classifying Factored Genres with Part-of-Speech Histograms

@inproceedings{Feldman2009ClassifyingFG,
  title={Classifying Factored Genres with Part-of-Speech Histograms},
  author={Sergey Feldman and Marius A. Marin and Julie Medero and Mari Ostendorf},
  booktitle={HLT-NAACL},
  year={2009}
}
This work addresses the problem of genre classification of text and speech transcripts, with the goal of handling genres not seen in training. Two frameworks employing different statistics on word/POS histograms with a PCA transform are examined: a single model for each genre and a factored representation of genre. The impact of the two frameworks on the classification of training-matched and new genres is discussed. Results show that the factored models allow for a finer-grained representation…
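The abstract mentions a PCA transform applied to statistics of word/POS histograms but gives no details of how it is computed. As a rough illustration only, the sketch below projects per-document feature vectors onto their leading principal components, using power iteration with deflation in plain Python so it has no dependencies. The input rows, the helper names (`mean_center`, `covariance`, `top_components`, `pca_project`) and the iteration settings are hypothetical assumptions, not the authors' implementation; a practical system would more likely call a linear-algebra library.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-column mean from every row; return centered rows and means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy, since deflation modifies it
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]  # start from a random unit vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no remaining variance along v
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: subtract this component so the next pass finds the next one.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps


def pca_project(rows, k):
    """Project rows onto their top-k principal components."""
    centered, means = mean_center(rows)
    comps = top_components(covariance(centered), k)
    projected = [[sum(r[j] * c[j] for j in range(len(r))) for c in comps]
                 for r in centered]
    return projected, comps, means
```

Principal components are only defined up to sign, so any comparison of projected values should be sign-invariant.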

Key Quantitative Results

  • Examining the 100 hand-labeled web documents, we find that adding the higher-order moments improves classifier accuracy from 23% to 55%.
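The result above does not say how the histograms or their higher-order moments are computed. The sketch below is one plausible reading, under stated assumptions: the document is already POS-tagged, histograms are taken over non-overlapping windows of a fixed number of tokens, and each tag's per-window relative frequency is summarized by its mean, variance, skewness, and excess kurtosis. The window scheme, the helper names (`window_histograms`, `moments`, `moment_features`) and the handling of constant features are hypothetical choices for illustration, not the paper's method.

```python
from collections import Counter


def window_histograms(tags, tagset, window):
    """Relative frequency of each tag in consecutive non-overlapping windows.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    hists = []
    for start in range(0, len(tags) - window + 1, window):
        counts = Counter(tags[start:start + window])
        hists.append([counts[t] / window for t in tagset])
    return hists


def moments(values):
    """Mean, variance, skewness, and excess kurtosis (population form)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    if var == 0:
        # Constant feature: skewness and kurtosis are undefined; report 0.0 here.
        return mu, 0.0, 0.0, 0.0
    sd = var ** 0.5
    skew = sum((x - mu) ** 3 for x in values) / (n * sd ** 3)
    kurt = sum((x - mu) ** 4 for x in values) / (n * var ** 2) - 3.0
    return mu, var, skew, kurt


def moment_features(tags, tagset, window, order=4):
    """Concatenate the first `order` moments of each tag's per-window frequency.

    order=1 keeps only the means; order=4 adds variance, skewness, and kurtosis.
    """
    hists = window_histograms(tags, tagset, window)
    feats = []
    for j in range(len(tagset)):
        column = [h[j] for h in hists]
        feats.extend(moments(column)[:order])
    return feats
```

Comparing `order=1` with `order=4` on the same documents is one way to reproduce the kind of ablation described above, where only the means are kept versus the means plus higher-order moments.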