Corpus ID: 7120895

A Brief Survey of Text Mining

@article{Hotho2005ABS,
  title={A Brief Survey of Text Mining},
  author={Andreas Hotho and A. N{\"u}rnberger and Gerhard Paass},
  journal={LDV Forum},
  year={2005},
  volume={20},
  pages={19--62}
}
The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Therefore, specific (pre-)processing methods and algorithms are required in order to extract useful patterns. Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. In this article, we discuss text mining as a young and interdisciplinary… 
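The abstract notes that computers see text only as sequences of character strings, so pre-processing has to come before any pattern extraction. One common pre-processing route is the bag-of-words representation, and it can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not the article's own method: the function names `tokenize` and `bag_of_words` and the tiny stop-word list are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use larger curated lists.
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn raw documents into a shared sorted vocabulary and a document-term count matrix."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Union of all terms seen in any document, sorted for a stable column order.
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for absent terms, so each row is a full count vector.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    docs = [
        "Text mining extracts patterns from text.",
        "Data mining finds patterns in structured data.",
    ]
    vocab, matrix = bag_of_words(docs)
    print(vocab)
    for row in matrix:
        print(row)
```

Each row is a document vector over the shared vocabulary; weighting schemes such as tf-idf are typically applied on top of raw counts like these before classification or clustering.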

Citations

Framework for Knowledge Discovery from Journal Articles Using Text Mining Techniques
TLDR
This study discusses text mining as a young interdisciplinary field at the intersection of related areas such as information access (otherwise known as information retrieval), computational linguistics, data mining, statistics, and natural language processing.
A comparative study of various text mining techniques
TLDR
An in-depth analysis of various text mining techniques, covering how they work, their complexity, and their merits and demerits, is presented in a simple yet effective manner.
Analyzing Different Approaches of Text Mining Techniques and Applications
TLDR
This paper studies the basic concepts of various text mining techniques and presents their applications, benefits, and limitations.
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques
TLDR
Several of the most fundamental text mining tasks and techniques, including text pre-processing, classification, and clustering, are described, and text mining in the biomedical and health care domains is briefly discussed.
Text Mining and Its Applications
TLDR
This paper describes text mining as a field at the intersection of information retrieval, machine learning, statistical analysis, and especially data mining, and presents different approaches to the main analysis tasks: preprocessing, classification, clustering, information extraction, and visualization.
A Systematic study of Text Mining Techniques
TLDR
Text mining involves pre-processing document collections (for example, information extraction, term extraction, and text categorization), storing intermediate representations, and applying techniques that analyse these representations, such as clustering, distribution analysis, association rules, and visualisation of the results.
Document analysis by means of data mining techniques
TLDR
A novel multi-document summarizer, ItemSum (Itemset-based Summarizer), is presented. It is based on an itemset-based model, i.e., a framework composed of frequent itemsets extracted from the document collection, and it significantly outperforms the competitors considered.
Topics Discovery in Text Mining
TLDR
This paper gives an overview of some general techniques for text data mining, based on text retrieval models, that are applicable to any natural-language text.
A Review of Machine Learning Algorithms for Text-Documents Classification
TLDR
This paper reviews the theory and methods of document classification and text mining, focusing mainly on existing techniques for text representation and machine learning.

References

Showing 1-10 of 124 references
Text mining with information extraction
TLDR
An approach is presented that uses rules mined from extracted data to improve the accuracy of information extraction (IE); experimental results demonstrate that patterns discovered in the extracted text can effectively improve the underlying IE method.
Untangling Text Data Mining
TLDR
Data mining, information access, and corpus-based computational linguistics are defined, and their relationship to text data mining is discussed; the intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists.
Text mining: finding nuggets in mountains of textual data
TLDR
This work defines the notion of “text mining”, focuses on the differences between text and data mining, and describes in more detail the unique technologies that are key to successful text mining.
Knowledge Discovery in Textual Databases (KDT)
TLDR
This research combines the KDD and text categorization paradigms and suggests advances to the state of the art in both areas.
Inductive learning algorithms and representations for text categorization
TLDR
The effectiveness of five different automatic learning algorithms for text categorization is compared in terms of learning speed, real-time classification speed, and classification accuracy.
Automatic structuring and retrieval of large text files
TLDR
An alternative approach is introduced which uses the document collections themselves as a basis for the text analysis, together with sophisticated text matching operations carried out at several levels of detail.
Machine learning in automated text categorization
TLDR
This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and examines in detail issues pertaining to three problems: document representation, classifier construction, and classifier evaluation.
Information extraction
TLDR
This article discusses information extraction (IE), a relatively new development that can transform raw text, refining and reducing it to the germ of the original.
Evaluating the Performance of Text Mining Systems on Real-world Press Archives
TLDR
It turns out that, for some features, human annotators perform worse than the text mining systems, which makes a convincing argument for using text mining systems to support the indexing of large document collections.
Ontologies improve text document clustering
TLDR
This work integrates core ontologies as background knowledge into the process of clustering text documents and compares clustering techniques based on pre-categorizations of texts from Reuters newsfeeds and from a smaller domain, an eLearning course about Java.