In this paper, we propose a new application of a Bayesian language model based on the Pitman-Yor process for information retrieval. This model is a generalization of the Dirichlet distribution. The Pitman-Yor process produces a power-law distribution, which is one of the statistical properties of word frequency in natural language. Our experiments on Robust04 ...
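The power-law behaviour mentioned in this abstract can be illustrated with a minimal sketch of the Chinese restaurant construction of a Pitman-Yor process. This is not the retrieval model from the paper; the function name `pitman_yor_crp` and the parameter names `discount` and `concentration` are chosen here for illustration only. With `discount` set to 0 the sampler reduces to the Dirichlet-process case, and a larger discount yields many more small clusters, which is the heavy-tailed pattern seen in word frequencies.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Sample table sizes from a Pitman-Yor Chinese restaurant process.

    Assumes 0 <= discount < 1 and concentration > 0 (illustrative sketch).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        k = len(tables)
        # Probability of opening a new table: (theta + d * K) / (theta + n)
        p_new = (concentration + discount * k) / (concentration + n)
        if rng.random() < p_new:
            tables.append(1)
        else:
            # Existing table k is chosen with weight (count_k - d)
            weights = [count - discount for count in tables]
            idx = rng.choices(range(k), weights=weights)[0]
            tables[idx] += 1
    return tables


if __name__ == "__main__":
    dirichlet_like = pitman_yor_crp(2000, discount=0.0, concentration=1.0)
    pitman_yor = pitman_yor_crp(2000, discount=0.8, concentration=1.0)
    print("tables with discount 0.0:", len(dirichlet_like))
    print("tables with discount 0.8:", len(pitman_yor))
```

If table sizes are read as word frequencies, the number of distinct "words" grows roughly logarithmically with discount 0 and polynomially with a positive discount.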
A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core ...
In this paper, we discuss an essential component for classifying opinionative and factual sentences in an opinion question answering system. We propose a language model-based approach with a Bayes classifier. This classification model is used to filter sentence retrieval outputs in order to answer opinionative questions. We used the Subjectivity dataset for our ...
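As a rough illustration of a language model-based Bayes classifier, the sketch below trains one unigram model per class and picks the class with the highest posterior score. It is an assumption-laden toy, not the paper's system: the add-one smoothing, the tiny training sentences, and the names `train_unigram`, `log_prob`, and `classify` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (word counts, total token count) for one class."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return counts, sum(counts.values())


def log_prob(sentence, model, vocab_size):
    """Log-likelihood of a sentence under an add-one smoothed unigram model."""
    counts, total = model
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in sentence.lower().split()
    )


def classify(sentence, models, priors, vocab_size):
    """Bayes decision rule: argmax over classes of log prior + log likelihood."""
    scores = {
        label: math.log(priors[label]) + log_prob(sentence, model, vocab_size)
        for label, model in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
opinion = ["i love this great movie", "what a terrible boring plot"]
fact = ["the movie was released in 2004", "the film runs for two hours"]

models = {"opinion": train_unigram(opinion), "fact": train_unigram(fact)}
priors = {"opinion": 0.5, "fact": 0.5}
vocab_size = len(set(models["opinion"][0]) | set(models["fact"][0]))

if __name__ == "__main__":
    print(classify("a great movie i love", models, priors, vocab_size))
    print(classify("the film was released in 2004", models, priors, vocab_size))
```

In a filtering setup like the one described, sentences labelled as factual would be dropped from the retrieval output before answering an opinionative question.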
In this paper we propose a term clustering approach to improve the performance of sentence retrieval in Question Answering (QA) systems. As the search in question answering is conducted over smaller segments of data than in a document retrieval task, the problems of data sparsity and exact matching become more critical. In this paper we propose Language ...
For the slot filling task of TAC KBP 2010, we developed a system with a simple pipeline architecture whose main components are a two-stage retrieval module and a relation extraction module. We use word-cluster features in the system as a method of achieving generalization by exploiting raw text. In the relation extraction module we use distant supervision in ...
In this paper, we propose two different language modeling approaches, namely skip trigram and across sentence boundary, to capture long-range dependencies. The skip trigram model is able to cover more predecessor words of the present word than the normal trigram, while requiring the same memory space. The across sentence boundary model uses the ...
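To make the skip trigram idea concrete, here is a minimal sketch that extracts ordinary trigrams and one possible form of skip trigrams from a token list. The abstract does not spell out which position is skipped, so the choice below (context words at positions i-3 and i-1, skipping i-2) is an assumption, and the function names are hypothetical. Each tuple still holds three words, which is why the storage per entry matches a normal trigram while the context reaches one word further back.

```python
def trigrams(tokens):
    """Standard trigrams: (w[i-2], w[i-1], w[i])."""
    return [
        (tokens[i - 2], tokens[i - 1], tokens[i])
        for i in range(2, len(tokens))
    ]


def skip_trigrams(tokens):
    """Skip trigrams under the assumed definition: (w[i-3], w[i-1], w[i])."""
    return [
        (tokens[i - 3], tokens[i - 1], tokens[i])
        for i in range(3, len(tokens))
    ]


if __name__ == "__main__":
    tokens = "a b c d e".split()
    print(trigrams(tokens))
    print(skip_trigrams(tokens))
```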
Persian is an Indo-European language that has borrowed its script from Arabic, a member of the Semitic language family. Since the Persian and Arabic scripts are so similar, problems arise when we want to process an electronic text. In this paper, some of the common problems faced experimentally in developing a corpus for Persian are discussed. The sources ...
The prediction task in natural language processing means guessing the missing letter, word, phrase, or sentence that is likely to follow in a given segment of text. Since the 1980s, many systems using different methods have been developed for different languages. In this paper an overview of the existing prediction methods that have been used for more than two decades ...