The textcat Package for n-Gram Based Text Categorization in R

@inproceedings{Hornik2013TheTP,
  title={The textcat Package for n-Gram Based Text Categorization in R},
  author={Kurt Hornik},
  year={2013}
}
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram frequencies have been particularly successful. This paper presents the R extension package textcat for n-gram based text categorization which implements both the Cavnar and Trenkle approach as well…
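The abstract names the Cavnar and Trenkle approach without spelling it out. The following is a minimal Python sketch of the general idea, not the textcat implementation: each text is reduced to a ranked profile of its most frequent character n-grams, and a document is assigned to the category whose profile differs least in rank order (the "out-of-place" measure). The function names, the underscore padding, and the defaults `n_max=5` and `top=400` are illustrative assumptions, not values taken from the paper.

```python
from collections import Counter


def ngram_profile(text, n_max=5, top=400):
    """Map the `top` most frequent character n-grams (n = 1..n_max) to their rank.

    Assumption: tokens are lowercased, split on whitespace, and padded with
    one underscore on each side; this is a simplification for illustration.
    """
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram.strip("_"):  # skip n-grams made only of padding
                    counts[gram] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ranked[:top])}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get a fixed penalty."""
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def categorize(text, category_profiles):
    """Return the category whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        category_profiles,
        key=lambda cat: out_of_place(doc_profile, category_profiles[cat]),
    )
```

In practice the category profiles would be built from sizeable training corpora per language; with `profiles = {"en": ngram_profile(english_corpus), "de": ngram_profile(german_corpus)}`, a call such as `categorize(some_text, profiles)` returns the key of the nearest profile.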

