Lars Yencken

Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, …
State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon …
As human beings, our mental processes for recognising linguistic symbols generate perceptual neighbourhoods around such symbols where confusion errors occur. Such neighbourhoods also provide us with conscious mental associations between symbols. This paper formalises orthographic models for similarity of Japanese kanji, and provides a proof-of-concept …
Finding an unknown Japanese word in a dictionary is a difficult and slow task when one or more of the word’s characters is unknown. For advanced learners, unknown characters evoke the form and meaning of visually similar characters they are familiar with. We propose a range of character distance metrics to allow learners to leverage known characters to …
Learning a foreign language is a long, error-prone process, and much of a learner’s time is effectively spent studying vocabulary. Many errors occur because words are only partly known, and this makes their mental storage and retrieval problematic. This paper describes how an intelligent interface may take advantage of the access structure of the mental …
Finding an unknown Japanese word in a dictionary is a difficult and slow task when one or more of the word’s characters is unknown. For advanced learners, unknown characters evoke the form and meaning of visually similar characters they are familiar with. We propose a range of distance metrics for characters to allow learners to leverage known characters to …
In this paper we explore the results of a large-scale online game called 'the Great Language Game', in which people listen to an audio speech sample and make a forced-choice guess about the identity of the language from 2 or more alternatives. The data include 15 million guesses from 400 audio recordings of 78 languages. We investigate which languages are …