Charles Jochim

Patent retrieval is a branch of Information Retrieval (IR) aiming to support patent professionals in retrieving patents that satisfy their information needs. Often, patent granting bodies require patents to be partially translated into one or more major foreign languages, so that language boundaries do not hinder their accessibility. This multilinguality of …
More intellectual property information is generated now than ever before. The accumulation of intellectual property data, further complicated by this continued increase in production, makes it imperative to develop better methods for archiving and, more importantly, for accessing this information. Information retrieval (IR) is a standard technique used for …
Citations are a valuable resource for characterizing scientific publications that has already been used in applications such as summarization and information retrieval. These applications could be even better served by expanding citation information. We aim to achieve this by extracting and classifying citation information from the text, so that subsequent …
Methods and techniques for the automatic processing and content analysis of large volumes of text documents have gained enormously in importance in recent years. While, on the one hand, the availability of and access to digitized text documents have risen to a previously unimagined degree, capturing the semantic content proves …
In many fields of NLP, supervised machine learning methods reach the best performance results. Apart from creating new classification models, there are two possibilities to improve classification performance: (i) improve the comprehensiveness of feature representations of linguistic instances, and (ii) improve the quality of the training gold standard. …
This paper investigates how to extract probability statements from academic medical papers. In previous work we have explored traditional classification methods which led to numerous false negatives. This current work focuses on constraining classification output obtained from a Conditional Random Field (CRF) model to allow for domain knowledge constraints. …
Stance classification is a core component in on-demand argument construction pipelines. Previous work on claim stance classification relied on background knowledge such as manually composed sentiment lexicons. We show that both accuracy and coverage can be significantly improved through automatic expansion of the initial lexicon. We also developed a set of …
Building on the use of local contexts, or frames, for human category acquisition, we explore the treatment of contexts as categories. This allows us to examine and evaluate the categorical properties that local unsupervised methods can distinguish and their relationship to corpus POS tags. From there, we use lexical information to combine contexts in a way …