Shih-Hung Wu

The integration of bibliographical information on scholarly publications available on the Internet is an important task in the academic community. Accurate reference metadata extraction from such publications is essential for the integration of metadata from heterogeneous reference sources. In this paper, we propose a hierarchical template-based reference…
The main drawback of previous Chinese character error detection systems is their high false alarm rate. To solve this problem, we propose a system that combines a statistical method and template matching to detect Chinese character errors. Error types include pronunciation-related errors and form-related errors. Possible errors of a character can be collected to…
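As a rough illustration of the idea of pairing confusable characters with a statistical check, the sketch below flags a character when a pronunciation- or form-similar alternative fits its context much better. This is not the authors' system: the confusion sets, the bigram counts, the scoring function, and the `ratio` threshold are all hypothetical stand-ins chosen for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical confusion sets: characters easily confused with the key
# character because they share a pronunciation or have a similar shape.
CONFUSION_SETS: Dict[str, Set[str]] = {
    "再": {"在"},        # pronunciation-related (both read zài)
    "在": {"再"},
    "已": {"己", "巳"},  # form-related (visually similar shapes)
}

# Hypothetical bigram counts standing in for a corpus-derived language model.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("我", "在"): 900, ("在", "家"): 800,
    ("我", "再"): 50, ("再", "家"): 1,
}


def context_score(chars: List[str], i: int, candidate: str) -> int:
    """Sum of bigram counts when `candidate` is placed at position i."""
    score = 0
    if i > 0:
        score += BIGRAM_COUNTS.get((chars[i - 1], candidate), 0)
    if i + 1 < len(chars):
        score += BIGRAM_COUNTS.get((candidate, chars[i + 1]), 0)
    return score


def detect_errors(sentence: str, ratio: float = 10.0) -> List[Tuple[int, str, str]]:
    """Return (position, original, suggestion) triples where a confusable
    character scores at least `ratio` times higher than the original."""
    chars = list(sentence)
    flagged = []
    for i, ch in enumerate(chars):
        original = context_score(chars, i, ch)
        for cand in CONFUSION_SETS.get(ch, set()):
            # max(original, 1) avoids a zero threshold for unseen contexts.
            if context_score(chars, i, cand) >= ratio * max(original, 1):
                flagged.append((i, ch, cand))
    return flagged


print(detect_errors("我再家"))  # → [(1, '再', '在')]
print(detect_errors("我在家"))  # → []
```

Raising `ratio` makes the detector more conservative, which in this toy setting is one simple lever for trading recall against the false alarm rate the abstract identifies as the main problem.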
As new high-throughput technologies have created an explosion of biomedical literature, there is a pressing need for automatic information extraction from this body of literature. To this end, biomedical named entity recognition (NER) from natural language text is indispensable. Current NER approaches include dictionary-based, rule-based, and machine…
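To make the first of these approaches concrete, here is a minimal sketch of dictionary-based NER using greedy longest match over whitespace tokens. The dictionary entries and entity type labels are hypothetical, and real systems typically add normalization (case, hyphenation, plural forms) because exact lookup misses name variants.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: token sequences mapped to entity types.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("IL-2",): "protein",
    ("IL-2", "receptor"): "protein",
    ("T", "cells"): "cell_type",
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def dictionary_ner(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match tagging; returns (start, end_exclusive, type) spans."""
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so "IL-2 receptor" wins over "IL-2".
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in DICTIONARY:
                match = (i, i + n, DICTIONARY[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched entity
        else:
            i += 1
    return spans


tokens = "IL-2 receptor expression in activated T cells".split()
print(dictionary_ner(tokens))  # → [(0, 2, 'protein'), (5, 7, 'cell_type')]
```

The longest-match rule is one common way to resolve overlapping dictionary entries; it is shown here only as an example of the dictionary-based family, not as the method of the paper.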
Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Two annotated corpora, GENIA and GENETAG, are the most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks…
Computer-assisted instruction systems have been widely applied to help students solve math word problems. Most such systems follow an instructor-initiated instruction strategy and provide pre-designed problems for learners. When learners are asked to solve a word problem, the system tells them what to do. However,…
We analyze a collection of 3208 reported errors in Chinese words. Among these errors, 7.2% involved rarely used characters, and 98.4% were assigned to common classifications of their causes by human subjects. In particular, 80% of the errors observed in the writings of middle school students were related to pronunciation and 30% were related to the…