Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of ...
Thesis for the degree of Doctor of Philosophy to be presented with due permission for public examination and criticism in the Auditorium F1 of the Helsinki ... Abstract: Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured ...
Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is ...
The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automatically. In the report, we introduce the WEBSOM method for this ...
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very lightweight pre-processing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European ...
We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), widely used in statistical text analysis, in general, and specifically in latent semantic analysis ...