Dong-Hong Ji

Learn More
This paper proposes a tree kernel with contextsensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a dynamic context-sensitive tree span for relation extraction by extending the widely-used Shortest Path-enclosed(More)
Shortage of manually labeled data is an obstacle to supervised relation extraction methods. In this paper we investigate a graph based semi-supervised learning algorithm, a label propagation (LP) algorithm, for relation extraction. It represents labeled and unlabeled examples and their distances as the nodes and the weights of edges of a graph, and tries to(More)
This paper describes the implementation of our three systems at SemEval-2007, for task 2 (word sense discrimination), task 5 (Chinese word sense disambiguation), and the first subtask in task 17 (English word sense disambiguation). For task 2, we applied a cluster validation method to estimate the number of senses of a target word in untagged data, and then(More)
Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation methods. In this paper we investigate a label propagation based semisupervised learning algorithm for WSD, which combines labeled and unlabeled data in learning process to fully realize a global consistency assumption: similar examples should have similar labels.(More)
This paper presents an overview of the IR4QA (Information Retrieval for Question Answering) Task of the NTCIR-7 ACLIA (Advanced Cross-lingual Information Access) Task Cluster. IR4QA evaluates traditional ranked retrieval of documents using wellstudied metrics such as Average Precision, but the retrieval task is embedded in the context of cross-lingual(More)
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of(More)
This paper proposes a novel document re-ranking approach in information retrieval, which is done by a label propagation-based semi-supervised learning algorithm to utilize the intrinsic structure underlying in the large document data. Since no labeled relevant or irrelevant documents are generally available in IR, our approach tries to extract some pseudo(More)
The work described in this paper was originally motivated by the need to map verbs associated with FrameNet 2.0 frames to appropriate WordNet 2.0 senses. As the work evolved, it became apparent that the developed method was applicable for a number of other tasks, including assignment of WordNet senses to word lists used in attitude and opinion analysis, and(More)
Automatic summarization is an important research issue in natural language processing. This paper presents a special summarization method to generate single-document summary with maximum topic completeness and minimum redundancy. It initially implements the semantic-class-based vector representations of various kinds of linguistic units in a document by(More)