Making Travel Smarter: Extracting Travel Information From Email Itineraries Using Named Entity Recognition

Divyansh Kaushik, Shashank Gupta, Chakradhar Raju, Reuben Dias, Sanjib Ghosh
This research addresses the problem of extracting information from travel itineraries and discusses the challenges faced in the process. Business-to-customer emails such as booking confirmations and e-tickets are usually machine-generated by filling slots in predefined templates, which improves their presentation but also makes their structure more complex. Extracting the relevant information from these emails would let users track their journeys and important…


Annotating Needles in the Haystack without Looking: Product Information Extraction from Emails
This paper introduces a system that extracts structured information automatically without requiring human review of any personal content, and proposes a hybrid approach that trains a CRF model on the labels predicted by binary classifiers (weak learners).
Named Entity Recognition using an HMM-based Chunk Tagger
A Hidden Markov Model (HMM) and an HMM-based chunk tagger are proposed, from which a named entity (NE) recognition system is built to recognize and classify names, times and numerical quantities, and with which the NER problem can be addressed effectively.
Open Information Extraction from the Web
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Adding Semantics to Email Clustering
A novel unsupervised approach is put forward that treats GSPs as pseudo class labels and conducts email clustering in a supervised manner, although no human labeling is involved; this is expected to improve clustering performance.
Contextual search and name disambiguation in email using graphs
This paper provides a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph, and shows that reranking schemes based on the graph-walk similarity measures often outperform baseline methods and that further improvements can be obtained by use of appropriate learning methods.
Motivating Intelligent E-mail in Business: An Investigation into Current Trends for E-mail Processing and Communication Research
This paper surveys the current state of the art in email processing and communication research, focusing on the current and potential roles played by email in information management, and commercial and research efforts to integrate a semantic-based approach to email.
Locating Complex Named Entities in Web Text
This paper investigates a novel approach to the first step in Web NER, locating complex named entities in Web text, and shows that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus.
The Enron Corpus: A New Dataset for Email Classification Research
The Enron corpus is introduced as a new test bed for email folder prediction, and the baseline results of a state-of-the-art classifier (Support Vector Machines) are provided under various conditions.
Inferring Ongoing Activities of Workstation Users by Clustering Email
A variety of unsupervised methods for clustering emails by user activity are described, along with the use of information extractors and pretrained classifiers to infer additional information about each discovered cluster.
Automatically classifying emails into activities
Several algorithms for automatically recognizing emails as part of an ongoing activity are presented, and it is shown that a combined approach that votes over the predictions of the individual methods performs better than each individual method alone.