A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry

Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin Peterson
With recent developments in digitisation, an increasing number of documents are available online. Several information extraction tools can extract information from digitised documents. However, identifying precise answers to a given query is often challenging, especially if the data source where the relevant information resides is unknown. The situation becomes more complex when the data source is available in multiple formats such as PDF, table and…


Table extraction for answer retrieval
To retrieve answers, the approach creates a cell document for each table cell, containing the cell and its metadata (headers, titles); a retrieval model then ranks the cells of the extracted tables using a language-modeling approach.
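The cell-document idea described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the table layout (a dict with `title`, `headers`, `rows`), the function names, whitespace tokenization, and the choice of query likelihood with Dirichlet smoothing (parameter `mu`) are all assumptions made for the example; the summary only says that a language-modeling approach is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def build_cell_documents(table):
    """Create one 'cell document' per table cell.

    `table` is a hypothetical structure:
    {"title": str, "headers": [str, ...], "rows": [[value, ...], ...]}
    Each document holds the cell text plus its metadata (column header, title).
    """
    docs = []
    for r, row in enumerate(table["rows"]):
        for c, cell in enumerate(row):
            text = " ".join([str(cell), table["headers"][c], table["title"]])
            docs.append({"row": r, "col": c, "cell": cell,
                         "tokens": tokenize(text)})
    return docs


def rank_cells(query, docs, mu=10.0):
    """Rank cell documents by Dirichlet-smoothed query log-likelihood."""
    collection = Counter(t for d in docs for t in d["tokens"])
    total = sum(collection.values())
    scored = []
    for d in docs:
        tf = Counter(d["tokens"])
        dlen = len(d["tokens"])
        score = 0.0
        for term in tokenize(query):
            p_coll = collection[term] / total
            if p_coll == 0:
                continue  # term absent from every cell document: ignore it
            score += math.log((tf[term] + mu * p_coll) / (dlen + mu))
        scored.append((score, d))
    # Sort on the score only, so the dicts themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Calling `rank_cells(query, build_cell_documents(table))` returns `(score, cell_document)` pairs, best match first. Row labels are deliberately left out of the metadata here because the summary mentions only headers and titles.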
Information Extraction: Techniques and Challenges
This volume takes a broad view of information extraction as any method for filtering information from large volumes of text. This includes the retrieval of documents from collections and the tagging of
Extraction and exploration of spatio-temporal information in documents
This paper shows how co-occurrences of temporal and geographic information extracted from documents are determined and how spatio-temporal document profiles are computed, providing the basis for a variety of document search and exploration tasks, such as visualizing sequences of events on a map.
Mining modern repositories with elasticsearch
This paper reflects upon its own experience with Elasticsearch and highlights its strengths and weaknesses for performing modern mining software repositories research.
Data Mining with Big Data
S. R, S. R. Computer Science. 2017 11th International Conference on Intelligent Systems and Control (ISCO), 2017.
This paper proposes a framework for data mining with big data, building on a strong body of work in data integration, mapping, and transformations, to achieve automated, error-free difference resolution.
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling
By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
The CoNLL-2003 shared task on language-independent named entity recognition is described, and a general overview of the participating systems and their performance is presented.
Big Data: A Revolution That Will Transform How We Live, Work, and Think
Since Aristotle, we have fought to understand the causes behind everything. But this ideology is fading. The world of big data can crunch…
Information Retrieval and Text Mining Technologies for Chemistry.
This Review provides a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting information demands of chemical information contained in scientific literature, patents, technical reports, or the web.