• Corpus ID: 212633947

Automatic Machine Learning Derived from Scholarly Big Data

Asnat Greenstein-Messica, Roman Vainshtein, Gilad Katz, Bracha Shapira, Lior Rokach
One of the challenging aspects of applying machine learning is the need to identify the algorithms that will perform best for a given dataset. This process can be difficult, time consuming and often requires a great deal of domain knowledge. We present Sommelier, an expert system for recommending the machine learning algorithms that should be applied on a previously unseen dataset. Sommelier is based on word embedding representations of the domain knowledge extracted from a large corpus of… 

Detecting Target Text Related to Algorithmic Efficiency in Scholarly Big Data Using Recurrent Convolutional Neural Network Model

A set of algorithms is proposed that uses the Recurrent Convolutional Neural Network (RCNN) model to extract information about the performance of the algorithm(s) presented or discussed in a research article.

Improving pseudo-code detection in ubiquitous scholarly data using ensemble machine learning

  • Suppawong Tuarob
  • Computer Science
    2016 International Computer Science and Engineering Conference (ICSEC)
  • 2016
An investigation of ensemble learning techniques as a possible enhancement to the previously proposed classification methodology shows that Random Forest is by far the most effective ensemble learning method, improving classification performance by 13% over the best base classifier.

Efficient and Robust Automated Machine Learning

This work introduces a robust new AutoML system based on scikit-learn, which improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization.

Automatic Knowledge Base Construction from Scholarly Documents

A fully automatic, unsupervised system for scientific information extraction is proposed that does not build on an existing knowledge base and avoids manually tagged training data. The constructed taxonomy, obtained by applying the approach to 10k documents, contains over 15k entities.

A Data Mining Ontology for Algorithm Selection and Meta-Mining

The immediate goal is to build a data mining ontology formalizing the key components that together compose an algorithm's inductive bias, so that a meta-learner could infer algorithm selection guidelines by correlating an algorithm's intrinsic bias with empirical evidence of its performance.

AlgorithmSeer: A System for Extracting and Searching for Algorithms in Scholarly Big Data

A novel set of scalable techniques used by AlgorithmSeer to identify and extract algorithm representations in a heterogeneous pool of scholarly documents is proposed, along with hybrid machine learning approaches for discovering algorithm representations.

Recurrent Convolutional Neural Networks for Text Classification

A recurrent convolutional neural network is introduced for text classification without human-designed features to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks.

Wikipedia-based query performance prediction

This work proposes a corpus-independent approach to pre-retrieval prediction which relies on information extracted from Wikipedia and presents Wikipedia-based features that can attest to the effectiveness of retrieval performed in response to a query regardless of the corpus upon which search is performed.

Discovering Relations between Noun Categories

This work proposes an approach to automatically discovering relevant relations, given a large text corpus plus an initial ontology defining hundreds of noun categories, and concludes this is a useful approach to semi-automatic extension of the ontology for large-scale information extraction systems such as NELL.

Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms

This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately and shows classification performance often much better than using standard selection and hyperparameter optimization methods.
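
The idea of selecting an algorithm and its hyperparameters simultaneously can be illustrated with a small sketch. The version below is only a toy under stated assumptions: the dataset, the two "algorithms" (`threshold_predict`, `knn_predict`), and the `SEARCH_SPACE` layout are hypothetical names invented for illustration, and plain random search stands in for the more sophisticated optimizer a real system would use. It is not the method of any paper listed here.

```python
import random

# Toy 1-D dataset: the label is 1 when x > 0.5 (an assumption for illustration).
def make_data(n, rng):
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]

# Two hypothetical "algorithms", each with its own hyperparameter.
def threshold_predict(train, x, threshold):
    # Ignores the training data; classifies by a fixed cut-off.
    return int(x > threshold)

def knn_predict(train, x, k):
    # Majority vote among the k nearest training points (k is odd, so no ties).
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

# Joint search space: the algorithm itself is a top-level choice, and each
# algorithm carries a sampler for its own conditional hyperparameters.
SEARCH_SPACE = {
    "threshold": (threshold_predict, lambda rng: {"threshold": rng.uniform(0.0, 1.0)}),
    "knn": (knn_predict, lambda rng: {"k": rng.choice([1, 3, 5, 7])}),
}

def accuracy(predict, params, train, valid):
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def random_search(train, valid, n_trials, rng):
    # Sample (algorithm, hyperparameters) pairs jointly and keep the best one.
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        predict, sample = SEARCH_SPACE[name]
        params = sample(rng)
        score = accuracy(predict, params, train, valid)
        if best is None or score > best[0]:
            best = (score, name, params)
    return best

rng = random.Random(0)
train, valid = make_data(200, rng), make_data(100, rng)
best_score, best_name, best_params = random_search(train, valid, 30, rng)
print(best_name, best_params, round(best_score, 3))
```

Treating the algorithm name as one more sampled variable is what makes the search joint: a single loop compares configurations across algorithms instead of tuning each algorithm separately and comparing afterwards.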