Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of…
Thesis for the degree of Doctor of Philosophy, to be presented with due permission for public examination and criticism in the Auditorium F1 of the Helsinki… Abstract: Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured…
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very lightweight pre-processing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European…
Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is…
New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available in the Internet and the World Wide Web. We have developed a method and a tool called the WEBSOM which utilizes the self-organizing map algorithm (SOM) for organizing large collections of text documents onto visual document maps. The…
Formulation of suitable search expressions for information retrieval from large full-text databases may currently require considerable efforts. Changing the scope of the search when, e.g., too many or too few hits have been obtained, requires re-formulation of the search expression. For an alternative scheme we suggest an explorative full-text information…
Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps…
Likey is an unsupervised statistical approach for keyphrase extraction. The method is language-independent and the only language-dependent component is the reference corpus with which the documents to be analyzed are compared. In this study, we have also used another language-dependent component: an English-specific Porter stemmer as a pre-processing step…