Community Question Answering (CQA) websites provide a rapidly growing source of information in many areas. This rapid growth, while offering new opportunities, also poses new challenges. In most CQA implementations, little effort is made to direct new questions to the right group of experts. This means that experts are not provided with questions …
We present a generative model for simultaneously clustering documents and terms. Our model is a four-level hierarchical Bayesian model in which each document is modeled as a random mixture of document topics, where each topic is a distribution over some segments of the text. Each of these segments in the document can be modeled as a mixture of word topics …
This paper presents a statistical model for discovering topical clusters of words in unstructured text. The model uses a hierarchical Bayesian structure and can also identify segments of text that are topically coherent. The model assigns each segment to a particular topic and thus categorizes the corresponding document to potentially …
Increasingly large text datasets and the high dimensionality associated with natural language create a great challenge in text mining. In this research, a systematic study is conducted in which three different document representation methods for text are used, together with three Dimension Reduction Techniques (DRT), in the context of the text clustering …
Stack Overflow: answer topics are influenced by question topics; answer topics are more technical and specific; answers may contain additional topics that are correlated with question topics. Abstract: Community Question Answering (CQA) services contain large archives of previously asked questions and their answers. We present a statistical topic model for …
This paper discusses the UNL Enconversion of Tamil sentences. The rich morphology of Tamil enables the Enconversion process to be based on morpho-semantic features of the words and their preceding and succeeding context. The use of morphological suffixes that indicate case relations, POS tags, and word-level semantics allows the rule-based Enconversion to be …
Increasingly large text datasets and the high dimensionality associated with natural language pose a great challenge for text mining. In this research, a systematic study is conducted of the application of three Dimension Reduction Techniques (DRT) to three different document representation methods in the context of the text clustering problem using several …