In this paper, we propose a new application of a Bayesian language model based on the Pitman-Yor process for information retrieval. This model is a generalization of the Dirichlet distribution. The Pitman-Yor process produces a power-law distribution, which is one of the statistical properties of word frequency in natural language. Our experiments on Robust04 ...
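The power-law behaviour mentioned in this abstract can be illustrated with the Chinese restaurant construction of the Pitman-Yor process. The following is a minimal sketch, not the retrieval model from the paper: the function name `pitman_yor_crp` and the parameter names `discount` and `strength` are chosen here for illustration. A new customer joins existing table k with probability proportional to (n_k - discount) and opens a new table with probability proportional to (strength + discount * K), where K is the current number of tables.

```python
import random


def pitman_yor_crp(n_customers, discount, strength, seed=0):
    """Seat customers one by one and return the list of table sizes.

    Assumes 0 <= discount < 1 and strength > 0 (illustrative parameter names).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        # Total weight is strength + n: the new-table weight plus the
        # weights (size - discount) of all existing tables.
        r = rng.random() * (strength + n)
        new_table_weight = strength + discount * len(tables)
        if r < new_table_weight:
            tables.append(1)
            continue
        r -= new_table_weight
        for k, size in enumerate(tables):
            r -= size - discount
            if r < 0:
                tables[k] += 1
                break
        else:
            # Guard against floating-point round-off at the upper boundary.
            tables[-1] += 1
    return tables


if __name__ == "__main__":
    with_discount = pitman_yor_crp(2000, discount=0.8, strength=1.0, seed=1)
    without_discount = pitman_yor_crp(2000, discount=0.0, strength=1.0, seed=1)
    print("tables with discount 0.8:", len(with_discount))
    print("tables with discount 0.0:", len(without_discount))
```

With `discount` set to 0 the seating rule reduces to the Dirichlet-process case, in which the number of tables grows only logarithmically. A positive discount keeps creating new, small tables, which yields the heavy tail associated with word frequencies.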
A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core ...
In this paper, we discuss an essential component for classifying opinionative and factual sentences in an opinion question answering system. We propose a language model-based approach with a Bayes classifier. This classification model is used to filter sentence retrieval outputs in order to answer opinionative questions. We used the Subjectivity dataset for our ...
In this paper we propose a term clustering approach to improve the performance of sentence retrieval in Question Answering (QA) systems. As the search in question answering is conducted over smaller segments of data than in a document retrieval task, the problems of data sparsity and exact matching become more critical. In this paper we propose Language ...
For the slot filling task of TAC KBP 2010, we developed a system with a simple pipeline architecture whose main components are a two-stage retrieval module and a relation extraction module. We use word-cluster features in the system as a method of achieving generalization by exploiting raw text. In the relation extraction module we use distant supervision in ...
In this paper, we propose two different language modeling approaches, namely skip trigram and across sentence boundary, to capture long-range dependencies. The skip trigram model is able to cover more predecessor words of the present word than the normal trigram while requiring the same memory space. The across sentence boundary model uses the ...
In this paper, a new method for automatic word clustering is presented. We used this method to build n-gram language models for Persian continuous speech recognition (CSR) systems. In this method, each word is specified by a feature vector that represents the statistics of the parts of speech (POS) of that word. The feature vectors are clustered by k-means ...
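The clustering idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the tagset, the exact statistics, or the distance measure, so the sketch uses relative POS-tag frequencies per word and plain k-means with squared Euclidean distance. The function names `pos_feature_vectors` and `kmeans` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def pos_feature_vectors(tagged_corpus, tagset):
    """Map each word to its relative POS-tag frequencies, ordered as in tagset."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    vectors = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        vectors[word] = [tag_counts[t] / total for t in tagset]
    return vectors


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


if __name__ == "__main__":
    # Toy tagged corpus (invented for illustration): N = noun, V = verb.
    corpus = (
        [("cat", "N")] * 9 + [("cat", "V")]
        + [("dog", "N")] * 10
        + [("run", "V")] * 10
        + [("eat", "V")] * 9 + [("eat", "N")]
    )
    vectors = pos_feature_vectors(corpus, ["N", "V"])
    words = sorted(vectors)
    labels = kmeans([vectors[w] for w in words], k=2)
    print(dict(zip(words, labels)))
```

The resulting word classes could then replace individual words when estimating a class-based n-gram model, which reduces sparsity for rare words.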
Persian is an Indo-European language that has borrowed its script from Arabic, a member of the Semitic language family. Since the Persian and Arabic scripts are so similar, problems arise when we want to process an electronic text. In this paper, some of the common problems encountered in practice when developing a corpus for Persian are discussed. The sources ...
The stochastic shortest path problem is a stochastic version of the classical shortest path problem whereby, for each node of a graph, a probability distribution over the set of successor nodes must be chosen so as to reach a certain destination node with minimum expected cost. In this paper, we propose a new algorithm based on Particle Swarm Optimization (PSO) for solving Stochastic Shortest ...