Corpus ID: 13909378

Flexible UIMA Components for Information Retrieval Research

Christof Müller, Torsten Zesch, M. Müller, Delphine Bernhard, Kateryna Ignatova, Iryna Gurevych, M. Mühlhäuser
In this paper, we present a suite of flexible UIMA-based components for information retrieval research which have been successfully used (and re-used) in several projects in different application domains. Implementing the whole system as UIMA components is beneficial for configuration management, component reuse, implementation costs, analysis and visualization. 
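The component-reuse idea behind UIMA can be illustrated with a minimal sketch (plain Python, not the actual UIMA API; the dict standing in for the CAS and all function names are hypothetical illustrations):

```python
# Minimal sketch of UIMA-style component pipelines: each component
# reads and enriches a shared analysis structure (here a plain dict
# standing in for the UIMA CAS). All names are illustrative only.

def tokenizer(cas):
    cas["tokens"] = cas["text"].split()
    return cas

def lowercaser(cas):
    cas["tokens"] = [t.lower() for t in cas["tokens"]]
    return cas

def run_pipeline(text, components):
    cas = {"text": text}
    for component in components:
        cas = component(cas)
    return cas

result = run_pipeline("Flexible UIMA Components", [tokenizer, lowercaser])
print(result["tokens"])  # ['flexible', 'uima', 'components']
```

Because every component consumes and produces the same shared structure, components can be swapped or reordered by configuration alone, which is the reuse benefit the paper highlights.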
An architecture to support intelligent user interfaces for Wikis by means of Natural Language Processing
An architecture is presented that integrates a set of Natural Language Processing (NLP) techniques with a wiki platform, providing support for adding, organizing, and finding content in the wiki, as well as an intelligent interface that offers suggestions.
Information Extraction with the Darmstadt Knowledge Processing Software Repository (Extended Abstract)
The DKPro repository consists of several main parts created to serve the purposes of different NLP application areas, including a highly flexible, scalable and easy-to-use toolkit that allows rapid creation of complex NLP pipelines for semantic information processing on demand.
Between Platform and APIs: Kachako API for Developers
Reusing the existing NLP platform Kachako, an API-oriented NLP system is created that loosely couples rich high-end functions, including annotation visualizations, statistical evaluations, annotation searching, etc.
Understanding the Information Needs of Web Archive Users
A complete characterization of web archive users must respond to three questions: why, what and how do users search? This study focuses on the first two: what are the user intents and which topics do they search for?
An Approach For Evaluation Of Semantic Performance Of Search Engines: Google, Yahoo, Msn And Hakia
A comparison of web document retrieval performance and a calculation of relative precision for the two data sets show the maximum relative precision for the Hakia search engine, followed by Yahoo and Google (which exchange places), with the lowest relative precision shown by Msn.
Combining Answers from heterogeneous Web Documents for Question Answering
The design and implementation of a question answering system that generates a summarized answer for open-domain natural language queries and aims to increase the quality of existing systems by using heterogeneous documents from Wikipedia, Yahoo! Answers and Frequently Asked Questions is described.
Information extraction for the geospatial domain
New approaches were implemented as prototypes and evaluated for toponym recognition; the results showed that machine-learning-based classifiers perform well at resolving the geo/non-geo ambiguity.
DKPro-UGD: A Flexible Data-Cleansing Approach to Processing User-Generated Discourse
The five-stage data-cleansing approach proposed here offers maximum flexibility in identifying problematic artifacts, deciding how to deal with them, and analysing the cleansed data. It also provides reusable UIMA-based components for the actual data cleansing and for mapping annotations created on the clean data back to the original representation.
Terminology Evolution Module for Web Archives in the LiWA Context∗
More and more national libraries and institutes are archiving the web as a part of the cultural heritage. As with all long-term archives, these archives contain text and language that evolves over time.
An Empirical Evaluation on Semantic Search Performance of Keyword-Based and Semantic Search Engines: Google, Yahoo, Msn and Hakia
It was found that semantic search performance was high for both the keyword-based search engines and the semantic search engine, whereas Google turned out to be the best search engine in terms of normalized recall ratio.
What to be? - Electronic Career Guidance Based on Semantic Relatedness
A study is presented that investigates the use of semantic information in a novel NLP application, Electronic Career Guidance (ECG), in German, and evaluates the performance of semantic relatedness (SR) measures intrinsically on the tasks of computing SR and solving Reader's Digest Word Power questions.
Using the Structure of a Conceptual Network in Computing Semantic Relatedness
The method relies solely on the structure of a conceptual network and eliminates the need for performing additional corpus analysis and can be easily applied to compute semantic relatedness based on alternative conceptual networks, e.g. in the domain of life sciences.
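A purely structural relatedness measure of this kind can be sketched as shortest-path distance in a concept graph (this is an illustration, not the paper's exact measure; the toy network and the 1/(1 + distance) scoring are assumptions):

```python
from collections import deque

# Illustrative sketch: semantic relatedness computed from the structure
# of a conceptual network alone, with no corpus statistics. The toy
# network below and the scoring function are invented for illustration.
NETWORK = {
    "car":      ["vehicle"],
    "bicycle":  ["vehicle"],
    "vehicle":  ["car", "bicycle", "artifact"],
    "artifact": ["vehicle"],
}

def shortest_path_length(graph, start, goal):
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # concepts are not connected

def relatedness(graph, a, b):
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

print(relatedness(NETWORK, "car", "bicycle"))  # 1/(1+2) = 0.333...
```

Because only graph structure is used, the same function works unchanged on an alternative conceptual network, e.g. one from the life-sciences domain, which is the portability the abstract emphasizes.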
Retrieval Models and Q and A Learning with FAQ Files
The issue of paraphrase recognition has been receiving attention in question-answering research as a way to fill the gap between words in a question and those in an answer. The primary focus is on finding an FAQ question which is similar to the user query/question, that is, a Q-to-Q match.
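A baseline Q-to-Q match can be sketched as token overlap between the user question and each stored FAQ question (a minimal sketch; a real system would add stemming, stopword removal, and paraphrase resources, and the FAQ entries here are invented):

```python
# Sketch of a Q-to-Q match: score a user question against stored FAQ
# questions by Jaccard similarity over their token sets, and return
# the best-scoring FAQ question.
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

faq = [
    "how do I reset my password",
    "how can I change my email address",
]
query = "how do I change my password"
best = max(faq, key=lambda q: jaccard(query, q))
print(best)  # "how do I reset my password"
```

The example also shows why plain word overlap is not enough: "change my password" matches the "reset my password" entry on surface tokens, which is exactly the gap that paraphrase recognition is meant to close.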
Retrieving answers from frequently asked questions pages on the web
We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages.
Learning Question Paraphrases for QA from Encarta Logs
A method is proposed that exploits Encarta logs to automatically identify question paraphrases and extract templates, which can evidently outperform the unsupervised method.
Probabilistic part-of-speech tagging using decision trees
In this paper, a new probabilistic tagging method is presented which avoids problems that Markov Model based taggers face when they have to estimate transition probabilities from sparse data.
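The core idea can be sketched as follows: instead of estimating P(tag | previous tags) directly from sparse counts, a decision tree asks questions about the context and stores a tag distribution at each leaf (a toy, hand-built tree for illustration; a real tagger induces the tree and its probabilities from training data):

```python
# Toy sketch of decision-tree transition probabilities: the "tree" is
# a chain of context tests, each leaf holding a tag distribution.
# The tests and probabilities below are invented for illustration.
def transition_probs(prev2, prev1):
    if prev1 == "DET":                      # after a determiner
        return {"NOUN": 0.7, "ADJ": 0.3}
    if prev1 == "ADJ" and prev2 == "DET":   # determiner, then adjective
        return {"NOUN": 0.9, "ADJ": 0.1}
    return {"VERB": 0.4, "NOUN": 0.3, "DET": 0.3}

probs = transition_probs("VERB", "DET")
print(max(probs, key=probs.get))  # NOUN
```

Because contexts that were never seen together still fall into some leaf, the tree yields a usable estimate even where a Markov model's raw trigram counts would be zero.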