• Corpus ID: 8729632

1 On Knowledgeable Unsupervised Text Mining

@inproceedings{Hotho20021OK,
  title={1 On Knowledgeable Unsupervised Text Mining},
  author={Andreas Hotho and Alexander Maedche and Steffen Staab and Valentin Zacharias},
  year={2002}
}
Text Mining is about discovering novel, interesting and useful patterns from textual data. In this paper we discuss several means that introduce background knowledge into unsupervised text mining in order to improve the novelty, the interestingness or the usefulness of the detected patterns. Germane to the different proposals is that they strive for higher abstractions that carry more explanatory power and more possibilities for exploring the input texts than is achievable by unknowledgeable… 

THE PECULIARITIES OF THE TEXT DOCUMENT REPRESENTATION, USING ONTOLOGY AND TAGGING-BASED CLUSTERING TECHNIQUE
TLDR
The proposed method addresses the incompact usage of locally applied language in document clustering through a tagging-based document representation, and improves clustering results by using an ontology as the knowledge technology.

References

Text Mining via Information Extraction
TLDR
This paper presents an intermediate approach, one that is called text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document.
An Information Extraction Core System for Real World German Text Processing
TLDR
SMES, an information extraction core system for real-world German text processing, is described. It provides a set of basic, powerful, robust, and efficient natural language components and generic linguistic knowledge sources that can easily be customized for processing different tasks in a flexible manner.
GETESS - Searching the Web Exploiting German Texts
TLDR
An intelligent information agent is designed such that as background knowledge and linguistic coverage increase, its benefits improve, while it guarantees state-of-the-art information and database retrieval capabilities as its bottom line.
Ontology-based text clustering
TLDR
A new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results is proposed.
Foundations of statistical natural language processing
TLDR
This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Architectural elements of language engineering robustness
TLDR
An architectural system that contributes to engineering robustness and low-overhead systems development (GATE, a General Architecture for Text Engineering) is presented and results from the development of a multi-purpose cross-genre Named Entity recognition system are presented.
When Is "Nearest Neighbor" Meaningful?
TLDR
The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
SEAL - Tying Up Information Integration and Web Site Management by Ontologies
TLDR
The SEAL conceptual architecture is described as well as its current implementation in KAON, a conceptual model that exploits ontologies for fulfilling the requirements set forth by community web sites at once.
Scaling Clustering Algorithms to Large Databases
TLDR
A scalable clustering framework applicable to a wide class of iterative clustering that requires at most one scan of the database and is instantiated and numerically justified with the popular K-Means clustering algorithm.
Finding Groups in Data: An Introduction to Cluster Analysis