Thumbs up? Sentiment Classification using Machine Learning Techniques
TLDR
This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
SystemML: Declarative machine learning on MapReduce
TLDR
This paper proposes SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment, and describes and empirically evaluates a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation.
SystemT: An Algebraic Approach to Declarative Information Extraction
TLDR
SystemT is a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. It uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules.
Regular Expression Learning for Information Extraction
TLDR
It is shown that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. It is also shown how the accuracy of CRF can be improved by using features extracted by ReLIE.
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
TLDR
This paper presents a two-stage method to enable the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual parallel data, and shows that this method outperforms existing methods.
SystemT: a system for declarative information extraction
TLDR
The extraction algebra is described, and the effectiveness of the optimization techniques in providing orders-of-magnitude reductions in the running time of complex extraction tasks is demonstrated.
OLAP over uncertain and imprecise data
TLDR
This is the first paper to handle both imprecision and uncertainty in an OLAP setting. It identifies three natural query properties and uses them to shed light on alternative query semantics.
Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks
TLDR
A high-level language, NERL, is tuned to the needs of NER tasks and simplifies the process of building, understanding, and customizing complex rule-based named-entity annotators. The customized annotators are shown to match or outperform the best published results achieved with machine learning techniques.
An Algebraic Approach to Rule-Based Information Extraction
TLDR
This work proposes an algebraic approach to rule-based IE that addresses scalability issues through query optimization, presents the operators of this algebra, and proposes several optimization strategies motivated by the text-specific characteristics of the operators.
Exploiting clustering and phrases for context-based information retrieval
TLDR
It is argued that the focused relevance feedback provided by contexts, at a level of abstraction higher than individual documents and lower than the database as a whole, provides a natural way for users to refine vague information needs and helps to blur the distinction between searching and browsing.