• Publications
Text mining for product attribute extraction
TLDR
We describe our work on extracting attribute and value pairs from textual product descriptions.
  • 185
  • 12
MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules
TLDR
This paper describes the run-time transfer-based machine translation system as well as two of the pre-run-time modules: elicitation of data from the minority language and automated learning of transfer rules.
  • 54
  • 5
Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions
TLDR
We describe an approach to extract attribute-value pairs from product descriptions that requires very little user supervision and show experimental results on a web catalog of sporting goods.
  • 72
  • 4
Enhancing foreign language tutors - In search of the golden speaker
TLDR
We use the Fluency system [Proceedings of Speech Technology in Language and Learning, 1998, p. 77] to answer the question of what voice a language learner should imitate when working on pronunciation.
  • 45
  • 3
Automatic Rule Learning for Resource-Limited MT
TLDR
This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers and a seeded version-space learning algorithm formulates and refines transfer rules.
  • 36
  • 3
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System
TLDR
We describe the rapid development of a preliminary Hebrew-to-English Machine Translation system under a transfer-based framework specifically designed for rapid MT prototyping for languages with limited linguistic resources.
  • 33
  • 2
Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web
TLDR
We describe an approach to extract attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs.
  • 11
  • 1
Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages
NICE is a machine translation project for low-density languages. We are building a tool that will elicit a controlled corpus from a bilingual speaker who is not an expert in linguistics. The corpus…
  • 31
  • 1
A trainable transfer-based MT approach for languages with limited resources
TLDR
We describe the general principles underlying our approach and present results from an experiment in which we developed a basic Hindi-to-English MT system over the course of two months, using extremely limited resources.
  • 31
  • 1
Maximizing Privacy under Data Distortion Constraints in Noise Perturbation Methods
TLDR
This paper introduces "guessing anonymity," a definition of privacy for noise perturbation methods that captures the difficulty of linking identity to a sanitized record using publicly available information.
  • 13
  • 1