Learn More
s In TREC-10, we participated in the web track (only ad-hoc task) and the QA track (only main task). In the QA track, our QA system (SiteQ) has general architecture with three processing steps: question processing, passage selection and answer processing. The key technique is LSP's (Lexico-Semantic Patterns) that are composed of linguistic entries and(More)
A complete framework for enumerating and classifying the types of multidatabase system (MDBS) structural and representational discrepancies is developed. The framework is structured according to a relational database schema and is both practical and complete. It was used to build the UniSQL/M commercial multidatabase system. This MDBS was built over(More)
In order to efficiently develop large-scale and complicated software, it is important for system engineers to correctly understand users' requirements. Most requirements in large-scale projects are collected from various stakeholders located in various regions, and they are generally written in natural language. Therefore, the initial collected requirements(More)
A wide range of supervised learning algorithms has been applied to Text Categorization. However, the supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large(More)
Automatic text categorization is a problem of assigning text documents to pre-defined categories. In order to classify text documents, we must extract useful features. In previous researches, a text document is commonly represented by the term frequency and the inverted document frequency of each feature. Since there is a difference between important(More)
In this paper, we present a new method of representing the Surface syntactic structure of a sentence. Trees have usually been used in linguistics and natural language processing to represent syntactic structures of a sentence. A tree structure shows only one possible syntactic parse of a sentence, but in order to choose a correct parse, we need to examine(More)
This paper proposes an effective method to extract salient sentences using contextual information and statistical approaches for Text Summarization. The proposed method combines two consecutive sentences into a bi-gram pseudo sentence so that contextual information is applied to statistical sentence-extraction techniques. Salient bi-gram pseudo sentences(More)
This paper proposes a new approach for text categorization, based on a feature projection technique. In our approach, training data are represented as the projections of training documents on each feature. The voting for a classification is processed on the basis of individual feature projections. The final classification of test documents is determined by(More)
We propose a statistical dialogue analysis model to determine discourse structures as well as speech acts using maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from a discourse tagged corpus to resolve ambiguities. We propose the idea of tagging discourse segment boundaries to represent the structural information(More)