Martin Scaiano

In internet advertising, negative key phrases are used to prevent an advertisement from being displayed to a non-target audience. We describe a method for automatically identifying negative key phrases. We use Wikipedia as our sense inventory and as an annotated corpus from which we create context vectors and determine negative phrases, which correlate …
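The abstract's core idea can be sketched with bag-of-words context vectors: a candidate phrase is flagged as negative when its contexts are closer to an off-target sense than to the advertiser's intended sense. This is a minimal illustration, not the paper's actual method; the sense vectors, candidate phrases, and thresholding here are hypothetical (the paper derives its vectors from Wikipedia sense articles).

```python
from collections import Counter
from math import sqrt

def context_vector(texts):
    # Bag-of-words context vector built from example contexts
    # (in the paper these would come from Wikipedia sense articles).
    return Counter(w for t in texts for w in t.lower().split())

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def negative_phrases(candidates, target_vec, offtarget_vec):
    # Flag candidate phrases whose contexts are more similar to the
    # off-target sense than to the advertiser's target sense.
    flagged = []
    for phrase, contexts in candidates.items():
        v = context_vector(contexts)
        if cosine(v, offtarget_vec) > cosine(v, target_vec):
            flagged.append(phrase)
    return flagged
```

For example, for an ad about "jaguar" the car, a phrase like "habitat" drawn from animal-sense contexts would be flagged, while "horsepower" would not.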
We submitted runs from two different systems for the update summary task at TAC 2009. The first system refined its use of Roget's Thesaurus, moving beyond 2008's semantic relatedness to compute an entropy-based uniqueness measure, with improved results in summary construction. The other system, our first use of deeper semantic knowledge, represents …
To improve information retrieval from films, we attempt to segment movies into scenes using the subtitles. Film subtitles differ significantly in nature from other texts; we describe some of the challenges of working with movie subtitles. We test several modifications to the TextTiling algorithm to achieve effective segmentation.
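The basic TextTiling idea the abstract builds on can be sketched as follows: score each gap between sentences by the lexical cohesion (cosine similarity of word counts) of the windows on either side, then place boundaries at sufficiently deep "valleys". This is a simplified sketch of Hearst's algorithm under assumed window and threshold parameters, not the paper's subtitle-specific modifications.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a if t in b)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def texttiling_boundaries(sentences, window=2, threshold=0.1):
    # Score every gap between sentence i-1 and sentence i by the
    # similarity of the word counts in the windows before and after it.
    gaps = []
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in s.lower().split())
        right = Counter(w for s in sentences[i:i + window] for w in s.lower().split())
        gaps.append((i, cosine(left, right)))
    # A gap is a boundary when it sits in a deep valley: both neighbouring
    # peaks are noticeably higher than the gap's own similarity score.
    boundaries = []
    for k, (i, sim) in enumerate(gaps):
        left_peak = max((s for _, s in gaps[:k]), default=sim)
        right_peak = max((s for _, s in gaps[k + 1:]), default=sim)
        if (left_peak - sim) + (right_peak - sim) >= threshold:
            boundaries.append(i)
    return boundaries
```

On a toy sequence whose topic shifts halfway through, the deepest similarity valley falls at the shift, which is where a scene boundary would be placed.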
The development of systems that extract a frame representation of text can lead to deeper semantics being used in natural language processing. We present the development of our system for extracting frames from text. Our system is trained on the FrameNet data and tested on the SemEval 2007: Task 19 Frame Extraction Task data. We use machine learning for …
OBJECTIVES It has become regular practice to de-identify unstructured medical text for use in research using automatic methods, the goal of which is to remove patient identifying information to minimize re-identification risk. The metrics commonly used to determine if these systems are performing well do not accurately reflect the risk of a patient being …