This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as their feature vectors. The main goal in our work has been to scale up the SOM…
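The abstract only says the feature vectors are "statistical representations" of each document's vocabulary. One minimal sketch of such a representation, assuming plain lowercased whitespace tokenization and raw word counts normalized to unit length (both choices are illustrative assumptions, not the system's actual preprocessing), is a word histogram per document:

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct lowercased word to a column index (sorted for determinism)."""
    words = sorted({word for doc in docs for word in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}

def document_vector(doc, vocab):
    """L2-normalized word-count histogram of one document over the vocabulary."""
    counts = Counter(doc.lower().split())
    vec = [0.0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:
            vec[vocab[word]] = float(n)
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec

docs = ["the map organizes documents", "documents on the map", "neural networks learn"]
vocab = build_vocabulary(docs)
vectors = [document_vector(d, vocab) for d in docs]
```

Because the vectors have unit length, their dot product is the cosine similarity: the first two documents share three words and score 0.75, while the third shares none and scores 0.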
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of…
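The recall/precision trade-off of Boolean keyword search can be seen on a toy collection. The documents and the relevance judgments below are hypothetical, chosen only to illustrate the effect: a broad AND query retrieves everything relevant plus noise, while a narrow one is precise but misses relevant documents.

```python
def boolean_and(docs, terms):
    """Ids of documents containing every query term (a Boolean AND query)."""
    return {i for i, doc in enumerate(docs)
            if all(t in doc.lower().split() for t in terms)}

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection and relevance judgments for a query about "text maps".
docs = [
    "self organizing map of text",
    "text retrieval with keywords",
    "map of the city",
    "neural map of text documents",
]
relevant = {0, 3}

broad = precision_recall(boolean_and(docs, ["map"]), relevant)
narrow = precision_recall(boolean_and(docs, ["map", "text", "documents"]), relevant)
```

Here the broad query gives precision 2/3 and recall 1.0, and the narrow query gives precision 1.0 and recall 0.5.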
Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization…
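The abstract's own definition is truncated, so the following is only one plausible way to make visualization quality measurable, assumed here for illustration: treat each point's k nearest neighbors in the original space as the "relevant" items and its nearest neighbors in the display as the "retrieved" items, then average precision and recall over all points. The function names are hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])

def neighbor_precision_recall(high, low, k_relevant, k_retrieved):
    """Mean precision and recall when display neighbors 'retrieve' original neighbors."""
    n = len(high)
    precision = recall = 0.0
    for i in range(n):
        relevant = knn(high, i, k_relevant)    # neighbors in the original space
        retrieved = knn(low, i, k_retrieved)   # neighbors shown in the visualization
        hits = len(relevant & retrieved)
        precision += hits / k_retrieved
        recall += hits / k_relevant
    return precision / n, recall / n

high = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 0, 0)]
faithful = [(0,), (1,), (3,), (7,)]     # preserves every neighbor relation
scrambled = [(0,), (7,), (1,), (3,)]    # breaks most of them
```

Shrinking `k_retrieved` below `k_relevant` typically trades recall for precision, which is the kind of compromise a visualization method has to make.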
When the data vectors are high-dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms that repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, as in the WEBSOM…
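The snippet stops before naming the reduction method. One computationally cheap option, assumed here and not necessarily the one the paper proposes, is a random projection: multiply each vector by a fixed random matrix whose columns have unit length, so that distances are approximately preserved in far fewer dimensions. Gaussian entries and the function names below are illustrative choices.

```python
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """d_out x d_in matrix whose columns are random Gaussian vectors scaled to unit length."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d_in):
        col = [rng.gauss(0.0, 1.0) for _ in range(d_out)]
        norm = sum(c * c for c in col) ** 0.5
        columns.append([c / norm for c in col])
    # Transpose the list of columns into a row-major matrix R[row][col].
    return [[columns[c][r] for c in range(d_in)] for r in range(d_out)]

def project(R, x):
    """Compute R @ x with plain lists."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

R = random_projection_matrix(d_in=50, d_out=10)
```

Since the projection is linear, a sparse word histogram only touches the columns of its nonzero words, which keeps the cost low for very high-dimensional text vectors.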
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on…
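For readers unfamiliar with the SOM, a minimal sketch of its standard online training rule follows: for each sample, find the best-matching unit, then move it and its map neighbors toward the sample, weighted by a neighborhood function. The 1-D chain of units, the Gaussian neighborhood, and the linearly decaying learning rate and radius are assumptions for illustration, not the schedule used in the cited work.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))

def train_som(data, n_units, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training on a 1-D chain of units (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    if sigma0 is None:
        sigma0 = n_units / 2
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly toward 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood radius shrinks, floored at 0.5
        x = rng.choice(data)
        c = best_matching_unit(weights, x)
        for u in range(n_units):
            # Gaussian neighborhood on the chain: units near the winner c move most
            h = math.exp(-((u - c) ** 2) / (2.0 * sigma ** 2))
            weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights

data = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [0.95, 1.0]]
weights = train_som(data, n_units=5)
```

After training, the two clusters map to different units along the chain, which is what lets the map illustrate structure in the data.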
The Adaptive-Subspace SOM (ASSOM) is a modular neural-network architecture, the modules of which learn to identify input patterns subject to some simple transformations. The learning process is unsupervised, competitive, and related to that of the traditional SOM (Self-Organizing Map). Each neural module becomes adaptively specific to some restricted class…
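The competition in an ASSOM differs from the plain SOM in that each module represents a linear subspace rather than a single prototype vector. As commonly described, the winner is the module onto whose subspace the input has the largest orthogonal projection. This sketch covers only that winner-selection step, with bases orthonormalized by Gram-Schmidt; the ASSOM learning rule itself is not shown, and the function names are hypothetical.

```python
def orthonormalize(basis):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as `basis`."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = sum(a * a for a in w) ** 0.5
        if norm > 1e-12:          # skip vectors that are linearly dependent
            ortho.append([a / norm for a in w])
    return ortho

def projection_energy(basis, x):
    """Squared norm of the projection of x onto span(basis); basis must be orthonormal."""
    return sum(sum(b_i * x_i for b_i, x_i in zip(b, x)) ** 2 for b in basis)

def winning_module(modules, x):
    """Index of the module whose subspace captures the most of x."""
    return max(range(len(modules)), key=lambda m: projection_energy(modules[m], x))

modules = [
    orthonormalize([[1, 0, 0], [0, 1, 0]]),   # module 0: the xy-plane
    orthonormalize([[0, 0, 1], [0, 1, 1]]),   # module 1: the yz-plane
]
```

Because the winner depends only on the subspace, module 0 wins for both [3, 4, 0] and [4, 3, 0]: the response is unchanged by transformations that keep the input inside the module's subspace.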
In a visualization task, every nonlinear projection method needs to make a compromise between trustworthiness and continuity. In a trustworthy projection the visualized proximities hold in the original data as well, whereas a continuous projection visualizes all proximities of the original data. We show experimentally that one of the multidimensional…
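Trustworthiness and continuity are usually computed from neighbor ranks. In the commonly used form (the same one implemented by scikit-learn's `trustworthiness`), for N points and neighborhood size k, trustworthiness is 1 - 2/(N k (2N - 3k - 1)) times the sum, over every point i and every j that is among i's k display neighbors but not among its k original neighbors, of (r(i, j) - k), where r(i, j) is the rank of j among i's neighbors in the original space. Continuity swaps the roles of the two spaces. The sketch below assumes Euclidean distances, no distance ties, and k < N/2.

```python
import math

def rank_table(points):
    """ranks[i][j] = rank of j among i's neighbors by distance (nearest = 1)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks

def trustworthiness(high, low, k):
    """Penalize display neighbors that are not true neighbors in the original space."""
    n = len(high)
    r_high, r_low = rank_table(high), rank_table(low)
    penalty = 0
    for i in range(n):
        for j in range(n):
            if j != i and r_low[i][j] <= k and r_high[i][j] > k:
                penalty += r_high[i][j] - k
    return 1 - 2 * penalty / (n * k * (2 * n - 3 * k - 1))

def continuity(high, low, k):
    """Penalize original neighbors that are missing from the display neighborhood."""
    return trustworthiness(low, high, k)

high = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 0, 0)]
scrambled = [(0,), (7,), (1,), (3,)]
```

For this toy scrambled layout with k = 1, trustworthiness is 0.5 and continuity is 0.25, showing that the two measures penalize different kinds of errors.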
Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and…
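As a reminder of what the classical (non-Bayesian) method computes: the squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, built from the within-set and between-set covariance matrices. To stay dependency-free, this sketch is restricted to two variables per set, so the largest eigenvalue of the 2x2 matrix has a closed form; real analyses would use a linear-algebra library. All function names are hypothetical.

```python
def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def cov_block(A, B):
    """2x2 covariance matrix between the columns of A and the columns of B."""
    return [[cov(a, b) for b in B] for a in A]

def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose2(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def first_canonical_correlation(X, Y):
    """X, Y: two columns (variables) each, with the same number of samples."""
    Sxx, Syy, Sxy = cov_block(X, X), cov_block(Y, Y), cov_block(X, Y)
    M = matmul2(matmul2(inv2(Sxx), Sxy), matmul2(inv2(Syy), transpose2(Sxy)))
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # Largest eigenvalue of a 2x2 matrix with real eigenvalues = squared correlation.
    lam = tr / 2 + max(tr * tr / 4 - det, 0.0) ** 0.5
    return min(max(lam, 0.0), 1.0) ** 0.5

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
c = [5.0, 3.0, 1.0, 2.0, 6.0, 4.0]
r = first_canonical_correlation([a, b], [a, c])
```

Both sets contain the variable `a`, so some linear combination of each set matches perfectly and the first canonical correlation is 1 (up to rounding).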