Samuel W. K. Chan

Learn More
—We describe a comprehensive framework for text understanding , based on the representation of context. It is designed to serve as a representation of semantics for the full range of in-terpretive and inferential needs of general natural language processing. Its most distinctive feature is its uniform representation of the various simple and independent(More)
Natural language understanding involves the simultaneous consideration of a large number of different sources of information. Traditional methods employed in language analysis have focused on developing powerful formalisms to represent syntactic or semantic structures along with rules for transforming language into these formalisms. However, they make use(More)
Discourse markers foreshadow the message thrust of texts and saliently guide their rhetorical structure which are important for content filtering and text abstraction. This paper reports on efforts to automatically identify and classify discourse markers in Chinese texts using heuristic-based and corpus-based data-mining methods, as an integral part of(More)
While the breath of vocabulary used in long documents may mislead the traditional keyword-based retrieval systems, the demands for techniques in nontextual Web classification and retrieval from a large document collection are mounting. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate(More)
This paper proposes a novel approach to automatic text segmentation without a full semantic understanding. In order to analyse the linguistic bonds and determine the degree of coherence that a text may exhibit, the tremendous diversity of textual relations in a discourse network is represented. A corpus of mutual linguistic knowledge that captures the(More)
As unlexicalized parsing lacks word token information, it is important to investigate novel parsing features to improve the accuracy. This paper studies a set of tree topological (TT) features. They quantitatively describe the tree shape dominated by each non-terminal node. The features are useful in capturing linguistic notions such as grammatical weight(More)