Text classification and named entities for new event detection

Giridhar Kumaran and James Allan
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
New Event Detection is a challenging task that still offers scope for great improvement after years of effort. In this paper we show how performance on New Event Detection (NED) can be improved by the use of text classification techniques as well as by using named entities in a new way. We explore modifications to the document representation in a vector space-based NED system. We also show that addressing named entities preferentially is useful only in certain situations. A combination of all… 

Classification Models for New Event Detection

This paper explores the application of machine learning classification techniques to new event detection, introduces the concept of triangulation with illustrative examples, and develops new features that build on this concept and on the named entities present in a document.

An Improved New Event Detection Model

The model weights terms by part of speech and generates document theme terms from the document's named entities to detect new events, which can improve performance compared with the traditional model.

Using Names and Topics for New Event Detection

Two stories are compared using three cosine similarities, based on names, topics, and the full text; the work suggests treating NED as a binary classification problem with these comparison scores serving as features.
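
The three-way comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes stories are already tokenized, that the named entities are supplied as a plain set of tokens (the hypothetical `names` argument, which a real system would obtain from a named entity recognizer), and that "topic" terms are simply all remaining tokens. The function names `cosine` and `comparison_features` are illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def comparison_features(story_a, story_b, names):
    """Return (name_sim, topic_sim, full_sim) for two tokenized stories.

    Assumption: `names` is a set of tokens treated as named entities, and
    every other token counts as a topic term.
    """
    full_a, full_b = Counter(story_a), Counter(story_b)
    names_a = Counter(t for t in story_a if t in names)
    names_b = Counter(t for t in story_b if t in names)
    topic_a = Counter(t for t in story_a if t not in names)
    topic_b = Counter(t for t in story_b if t not in names)
    return (
        cosine(names_a, names_b),
        cosine(topic_a, topic_b),
        cosine(full_a, full_b),
    )


story_a = ["clinton", "visits", "paris"]
story_b = ["clinton", "visits", "london"]
features = comparison_features(story_a, story_b, {"clinton", "paris", "london"})
```

Under these assumptions, the resulting tuple could be passed as a feature vector to any binary classifier that decides whether the second story reports a new event relative to the first.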

New event detection based on indexing-tree and named entity

A new NED model is proposed that speeds up the NED task by using a news indexing-tree dynamically and uses statistics on training data to learn a named entity reweighting model for each class of stories.

Combining named entities and tags for novel sentence detection

This research applies named entity recognition (NER) and part-of-speech (POS) tagging to sentence-level novelty detection and proposes a mixed method that uses both techniques.

Detecting New and Emerging Events from Textual Sources

This work argues that, to succeed at this task, an NED method must extract and represent the type of event and its participants as well as the temporal and spatial properties of the event.

Named Entities as New Features for Czech Document Classification

The main goal of this work is to propose new features based on named entities (NEs) for this task, but it is shown that these features do not significantly improve the score over the baseline word-based features.

A New Event Detection Model Based on Term Reweighting

Experimental results on two Linguistic Data Consortium (LDC) data sets, TDT2 and TDT3, show that both proposed approaches can effectively improve performance on the NED task compared to the baseline method and existing methods.

A Model for Anticipatory Event Detection

The Anticipatory Event Detection (AED) problem is introduced: given some user-preferred event transition in a topic, detect the occurrence of the transition in the stream of news covering the topic.

Named entity patterns across news domains

A better understanding of NE patterns is achieved by identifying the distribution of NEs across news domains, and a prototype event tracking system based on NE patterns is designed.



A System for new event detection

A new method and system for the New Event Detection task, based on an incremental TF-IDF model, is presented; the task is to mark, in one or multiple streams of news stories, all stories on a previously unseen (new) event.
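
As a rough illustration of what an incremental TF-IDF model can look like, the sketch below updates document frequencies as each story arrives and flags a story as new when its best cosine similarity to any earlier story falls below a threshold. This is a simplified sketch, not the system described above: the IDF formula `log(1 + N / df)`, the threshold value, and the names `IncrementalTfidf` and `detect_new_events` are all assumptions made for the example.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Keeps corpus statistics that grow as stories arrive."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def update(self, tokens):
        # Each story contributes at most 1 to a term's document frequency.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def vector(self, tokens):
        # Assumed weighting: tf * log(1 + N / df), using current statistics.
        counts = Counter(tokens)
        return {
            term: tf * math.log(1.0 + self.num_docs / self.doc_freq[term])
            for term, tf in counts.items()
            if self.doc_freq[term] > 0
        }


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_new_events(stream, threshold=0.2):
    """Return one boolean per story: True if it looks like a new event."""
    model = IncrementalTfidf()
    seen = []   # token lists of earlier stories
    flags = []
    for tokens in stream:
        model.update(tokens)
        current = model.vector(tokens)
        # Earlier stories are re-weighted with the latest statistics.
        best = max(
            (cosine(current, model.vector(old)) for old in seen),
            default=0.0,
        )
        flags.append(best < threshold)
        seen.append(tokens)
    return flags


stream = [
    ["quake", "hits", "city"],
    ["quake", "hits", "city", "again"],
    ["election", "results", "announced"],
]
flags = detect_new_events(stream)
```

Re-weighting every earlier story on each arrival keeps the sketch short but costs quadratic time; a practical system would cache vectors or limit the comparison window.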

Topic-conditioned novelty detection

This paper proposes a new approach which addresses this problem in two stages: using a supervised learning algorithm to classify the on-line document stream into pre-defined broad topic categories, and performing topic-conditioned novelty detection for documents in each topic.

On-Line New Event Detection using Single Pass Clustering

An evaluation methodology is developed based on a combination of techniques that allows us to infer the expected performance of the approach in the field, and to suggest avenues for future research that may lead to better performance.

Topic detection and tracking: event-based information organization

This collection of technical papers from leading researchers in the field not only provides several chapters devoted to the research program and its evaluation paradigm, but also presents the most current research results and describes some of the remaining open challenges.

BoosTexter: A Boosting-based System for Text Categorization

This work describes in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks, and presents results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks.

An Algorithm that Learns What's in a Name

IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities, is evaluated; it is competitive with approaches based on handcrafted rules on mixed-case text and superior on text where case information is not available.

The INQUERY Retrieval System

A retrieval system (INQUERY) that is based on a probabilistic retrieval model and provides support for sophisticated indexing and complex query formulation is described.

Viewing morphology as an inference process

The role of morphological analysis in word sense disambiguation, and in identifying lexical semantic relationships in a machine-readable dictionary, is described.

First story detection in TDT is hard
