• Corpus ID: 18287481

Text Mining Infrastructure in R

Ingo Feinerer, Kurt Hornik, Dominique Meyer
During the last decade, text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package, which provides a framework for text mining applications within R. We give a survey of text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification, and string kernels.


The study on keywords frequency composite function of public opinion toward Macau's gambling industry: Using the Fruit Fly Optimization Algorithm

  • Shianghau Wu, Yong-dong Shi
  • Economics, Education
    2013 International Conference on Engineering, Management Science and Innovation (ICEMSI)
  • 2013
This study first used the text mining method to analyze the keywords of Chinese news reports related to Macau's gambling industry from June to September 2012. The study got 19 major keywords

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms

Customer product reviews have become one of the important factors in purchase decision making. Customers believe that reviews written by others who have already had an experience with the product

Lost in Translation? Predicting Party Group Affiliation from European Parliament Debates

Ten countries joined the European Union in 2004. This offered a rare opportunity for the existing party groups to substantively increase their share of the seats in the European Parliament by

IoT Protocol Based Data Monitoring System

A large amount of data can be monitored and controlled with the use of Wi-Fi, the internet of things (IoT), cloud computing (CC), and cyber-physical systems (CPS).



Text Clustering with String Kernels in R

A package is presented that provides a general framework for text mining in R using the S4 class system and the kernlab R package, and that explores the use of string kernels for clustering a set of text documents.

Survey of Text Mining: Clustering, Classification, and Retrieval

Survey of Text Mining II offers a broad selection of state-of-the-art algorithms and software for text mining from both academic and industrial perspectives, to generate interest and insight into the state of the field.

Text Mining: Predictive Methods for Analyzing Unstructured Information

This book introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search, as well as new research areas that rely on evolving text-mining techniques.

An e-mail analysis method based on text mining techniques

Kernel Methods for Pattern Analysis

This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

Machine learning in automated text categorization

This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

A Review of Two Text-Mining Packages

In extracting themes from unstructured data, both text mining packages were only marginally helpful, implying that a text mining approach based on analysis units other than terms may be more powerful in extracting themes, an idea touched upon in the conclusion section.

Untangling Text Data Mining

Data mining, information access, and corpus-based computational linguistics are defined, and their relationship to text data mining is discussed. The intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists.

N-gram-based text categorization

An N-gram-based approach to text categorization that is tolerant of textual errors is described, which worked very well for language classification and worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject.

Text Classification using String Kernels

A novel kernel for comparing two text documents is introduced. The kernel is an inner product in the feature space consisting of all subsequences of length k, and it can be efficiently evaluated by a dynamic programming technique.