Data Set Used
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of…
The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automatically. In the report, we introduce the WEBSOM method for this…
Thesis for the degree of Doctor of Philosophy to be presented with due permission for public examination and criticism in the Auditorium F1 of the Helsinki… Abstract: Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured…
Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is…
Formulation of suitable search expressions for information retrieval from large full-text databases may currently require considerable efforts. Changing the scope of the search when, e.g., too many or too few hits have been obtained, requires re-formulation of the search expression. For an alternative scheme we suggest an explorative full-text information…
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very lightweight pre-processing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European…
We study how independent component analysis can be used to automatically create syntactic and semantic features by analyzing words in their contexts.