Greg Kondrak

We present a generative model for conversational dialogues, the actor-topic model (ACTM), which extends the author-topic model (Rosen-Zvi et al., 2004) to identify the actors of a given conversation in literary narratives. ACTM thus assigns each instance of quoted speech to an appropriate character. We model dialogues in a literary text, which take place...
A vast amount of biomedical research literature is available online in the MEDLINE database. This gives biomedical scientists instant access to the literature and references they need. But finding a manageable subset of the literature that is relevant to their current research is hard because: (1) the number of these articles is growing very fast, ...
The field of molecular biology is growing at an astounding rate and research findings are being deposited into public databases, such as Swiss-Prot. Many of the over 200,000 protein entries in Swiss-Prot 49.1 lack annotations such as subcellular localization or function, but the vast majority have references to journal abstracts describing related research.
This paper presents a fully automated linguistic approach to measuring distance between phonemes across languages. In this approach, a phoneme is represented by a feature matrix where feature categories are fixed, hierarchically related and binary-valued; feature categorization explicitly addresses allophonic variation and feature values are weighted based...
Several approaches have been proposed for the automatic acquisition of multiword expressions from corpora. However, there is no agreement about which of them presents the best cost-benefit ratio, as they have been evaluated on distinct datasets and/or languages. To address this issue, we investigate these techniques analysing the following dimensions: ...