In TREC-10, we participated in the web track (ad hoc task only) and the QA track (main task only). In the QA track, our QA system (SiteQ) has a general architecture with three processing steps: question processing, passage selection, and answer processing. The key technique is LSPs (Lexico-Semantic Patterns), which are composed of linguistic entries and …
A complete framework for enumerating and classifying the types of multidatabase system (MDBS) structural and representational discrepancies is developed. The framework is structured according to a relational database schema and is both practical and complete. It was used to build the UniSQL/M commercial multidatabase system. This MDBS was built over …
To develop large-scale, complex software efficiently, system engineers must correctly understand users' requirements. In large-scale projects, most requirements are collected from many stakeholders located in different regions, and they are generally written in natural language. Therefore, the initially collected requirements …
A wide range of supervised learning algorithms has been applied to text categorization. However, supervised learning approaches have some problems. One is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large …
Automatic text categorization is the problem of assigning text documents to predefined categories. To classify text documents, we must extract useful features. In previous research, a text document is commonly represented by the term frequency and the inverse document frequency of each feature. Since there is a difference between important …
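The TF-IDF representation mentioned above, which the abstract describes as the common baseline, can be sketched as follows. This is a minimal illustration, not the paper's proposed weighting (the abstract is truncated before that is described); the function name `tf_idf_vectors` and the unsmoothed formula tf(t, d) · log(N / df(t)) are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times inverse document frequency.

    Assumed variant: tf(t, d) * log(N / df(t)), with no smoothing or
    length normalization; real systems differ in both.
    """
    n_docs = len(docs)
    # df(t): number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
vecs = tf_idf_vectors(docs)
```

A term that occurs in every document (here "sat") receives weight 0, which is why plain TF-IDF alone cannot distinguish features that are important for a category from those that are merely frequent.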
This paper proposes an effective method for text summarization that extracts salient sentences using contextual information and statistical approaches. The proposed method combines two consecutive sentences into a bi-gram pseudo sentence so that contextual information is applied to statistical sentence-extraction techniques. Salient bi-gram pseudo sentences …
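The bi-gram pseudo-sentence idea can be sketched as below. This is a hypothetical illustration only: the overlapping window of adjacent sentences and the average document-term-frequency score are assumptions, since the truncated abstract does not state how pairs are formed or which statistical scoring the paper uses.

```python
from collections import Counter


def bigram_pseudo_sentences(sentences: list[str]) -> list[str]:
    """Join each pair of adjacent sentences (overlapping window, assumed)."""
    return [f"{a} {b}" for a, b in zip(sentences, sentences[1:])]


def top_pseudo_sentences(sentences: list[str], k: int) -> list[str]:
    """Rank pseudo sentences by a simple term-frequency score (hypothetical)."""
    pseudo = bigram_pseudo_sentences(sentences)
    # Document-level term frequencies serve as a crude salience signal;
    # no stemming or punctuation stripping is done here.
    doc_tf = Counter(w.lower() for s in sentences for w in s.split())

    def score(p: str) -> float:
        words = [w.lower() for w in p.split()]
        return sum(doc_tf[w] for w in words) / max(len(words), 1)

    return sorted(pseudo, key=score, reverse=True)[:k]
```

Because each pseudo sentence spans two neighbors, a sentence that is weak on its own can still be selected when its context is salient, which is the motivation the abstract gives for the bi-gram unit.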
In this paper, we present a new method of representing the surface syntactic structure of a sentence. Trees have usually been used in linguistics and natural language processing to represent the syntactic structure of a sentence. A tree structure shows only one possible syntactic parse of a sentence, but to choose the correct parse, we need to examine …
This paper proposes a new approach to text categorization based on a feature projection technique. In our approach, training data are represented as the projections of training documents onto each feature. Voting for a classification is performed on the basis of individual feature projections. The final classification of test documents is determined by …
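Voting over per-feature projections can be sketched as follows. This is a simplified, hypothetical version: treating a feature's projection as the category histogram of training documents containing it, and letting each feature cast one vote split in proportion to that histogram, are assumptions, because the abstract is truncated before the paper's weighting and final decision rule.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_projections(docs: list[tuple[set[str], str]]) -> dict[str, Counter]:
    """Project training documents onto each feature (term).

    For every term, count how many training documents of each category
    contain it; this per-feature category histogram is the projection.
    """
    proj: dict[str, Counter] = defaultdict(Counter)
    for terms, label in docs:
        for t in terms:
            proj[t][label] += 1
    return proj


def classify(proj: dict[str, Counter], terms: set[str]) -> Optional[str]:
    """Each known feature casts one normalized vote; the highest total wins."""
    votes: Counter = Counter()
    for t in terms:
        hist = proj.get(t)  # .get does not insert keys into the defaultdict
        if not hist:
            continue  # unseen term: no projection, so no vote
        total = sum(hist.values())
        for label, count in hist.items():
            votes[label] += count / total
    return votes.most_common(1)[0][0] if votes else None


train = [
    ({"goal", "match"}, "sports"),
    ({"match", "team"}, "sports"),
    ({"vote", "election"}, "politics"),
    ({"vote", "team"}, "politics"),
]
proj = train_projections(train)
```

Here `classify(proj, {"goal", "team"})` gives "sports": "goal" votes fully for sports, while the ambiguous "team" splits its vote evenly. Since each feature votes independently, training reduces to counting, which keeps the method cheap on large vocabularies.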
We propose a statistical dialogue analysis model that uses a maximum entropy model to determine discourse structures as well as speech acts. The model can automatically acquire probabilistic discourse knowledge from a discourse-tagged corpus to resolve ambiguities. We propose the idea of tagging discourse segment boundaries to represent the structural information …