Zoltán Szamonek

When data is scarce or the alphabet is large, smoothing the probability estimates becomes unavoidable when estimating n-gram models. In this paper we propose a method that implements a form of smoothing by exploiting similarity information about the alphabet elements. The idea is to view the log-conditional probability function as a smooth function defined …
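The abstract is cut off before the method is fully described, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a bigram model, a hypothetical nonnegative similarity matrix `sim` over alphabet symbols, and a hypothetical helper `smooth_bigram`. It starts from an add-alpha (Laplace) estimate so every logarithm is finite, averages the log-conditional probabilities of similar context symbols using the similarity weights, and renormalizes each row into a distribution.

```python
import math


def smooth_bigram(counts, sim, alpha=1.0):
    """Hypothetical similarity-based smoothing of a bigram model.

    counts[a][b]: number of times symbol b followed symbol a.
    sim[a][a2]:   assumed nonnegative similarity between context symbols.
    Returns P with P[a][b] approximating p(b | a); every row sums to 1.
    """
    n = len(counts)
    # Step 1: add-alpha estimate so that every log-probability is finite.
    log_p = []
    for a in range(n):
        total = sum(counts[a]) + alpha * n
        log_p.append([math.log((counts[a][b] + alpha) / total) for b in range(n)])

    smoothed = []
    for a in range(n):
        w_sum = sum(sim[a])
        if w_sum <= 0:
            raise ValueError(f"symbol {a} has no positive similarity weights")
        # Step 2: similarity-weighted average of log p(. | a2) over contexts a2.
        row = [sum(sim[a][a2] * log_p[a2][b] for a2 in range(n)) / w_sum
               for b in range(n)]
        # Step 3: renormalize in probability space (log-sum-exp for stability).
        m = max(row)
        z = m + math.log(sum(math.exp(v - m) for v in row))
        smoothed.append([math.exp(v - z) for v in row])
    return smoothed


# Toy 3-symbol alphabet: symbols 0 and 1 are assumed similar, symbol 2 is not.
counts = [[5, 0, 1], [4, 1, 0], [0, 0, 6]]
sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = smooth_bigram(counts, sim)

# With an identity similarity matrix the result is the plain Laplace estimate.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Q = smooth_bigram(counts, identity)
```

With an identity similarity matrix the sketch reduces to plain Laplace smoothing; off-diagonal similarity lets a sparsely observed context borrow statistical strength from similar ones (here, the unseen bigram (0, 1) receives more mass than under Laplace smoothing alone).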
In this paper we consider sequence clustering problems and propose an algorithm for estimating the number of clusters based on the X-means algorithm. The sequences are modeled using mixtures of Hidden Markov Models. We analyze the proposed algorithm through experiments on synthetic data. This algorithm proved to be both computationally …