Corpus ID: 13463143

Retrieval of Experiments by Efficient Estimation of Marginal Likelihood

Sohan Seth, John Shawe-Taylor, Samuel Kaski
We study the task of retrieving relevant experiments given a query experiment. By experiment, we mean a collection of measurements from a set of "covariates" and the associated "outcomes". While similar experiments can be retrieved by comparing available "annotations", this approach ignores the valuable information available in the measurements themselves. To incorporate this information in the retrieval task, we suggest employing a retrieval metric that utilizes probabilistic models learned…
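The abstract is cut off before it defines the retrieval metric, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each experiment has a single covariate and a single outcome, models the outcome as y = w·x + noise with a zero-mean Gaussian prior on the slope w and a known noise variance, and scores a candidate experiment by the log marginal likelihood of the query's outcomes under the posterior learned from that candidate. Because this conjugate model gives the marginal likelihood in closed form, the sketch sidesteps the estimation problem named in the title. All names (`fit_posterior`, `log_marginal_likelihood`, `rank_experiments`, `NOISE_VAR`, `PRIOR_VAR`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: an experiment is (covariates x, outcomes y).
Experiment = Tuple[Sequence[float], Sequence[float]]

NOISE_VAR = 0.25  # assumed known observation-noise variance
PRIOR_VAR = 1.0   # assumed prior variance of the slope w (prior mean is 0)


def fit_posterior(x, y, noise_var=NOISE_VAR, prior_var=PRIOR_VAR):
    """Conjugate Gaussian posterior N(mean, var) over w in y = w * x + noise."""
    precision = 1.0 / prior_var
    weighted_sum = 0.0
    for xi, yi in zip(x, y):
        precision += xi * xi / noise_var
        weighted_sum += xi * yi / noise_var
    var = 1.0 / precision
    return weighted_sum * var, var


def log_marginal_likelihood(x, y, mean, var, noise_var=NOISE_VAR):
    """log p(y | x) with w ~ N(mean, var), via the chain rule of one-step predictives."""
    total = 0.0
    for xi, yi in zip(x, y):
        pred_mean = mean * xi
        pred_var = noise_var + var * xi * xi
        total -= 0.5 * (math.log(2.0 * math.pi * pred_var)
                        + (yi - pred_mean) ** 2 / pred_var)
        # Condition on this point before predicting the next (scalar Kalman update).
        gain = var * xi / pred_var
        mean += gain * (yi - pred_mean)
        var -= gain * xi * var
    return total


def rank_experiments(query: Experiment, collection: List[Experiment]) -> List[int]:
    """Indices into collection, most relevant first (highest query marginal likelihood)."""
    xq, yq = query
    scores = []
    for x, y in collection:
        mean, var = fit_posterior(x, y)
        scores.append(log_marginal_likelihood(xq, yq, mean, var))
    return sorted(range(len(collection)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
    # Three toy experiments with true slopes -1.0, 2.0 and 0.5.
    collection = [(xs, [s * x for x in xs]) for s in (-1.0, 2.0, 0.5)]
    xq = [0.2, 0.7, 1.3, -0.8]
    query = (xq, [2.0 * x for x in xq])  # query generated with slope 2.0
    print(rank_experiments(query, collection))  # [1, 2, 0]
```

Scoring with the posterior predictive, rather than comparing fitted slopes directly, means a candidate learned from few or noisy measurements is penalized through its wider predictive variance, which is one reason a marginal-likelihood-style metric can be attractive here.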

Retrieval of Experiments with Sequential Dirichlet Process Mixtures in Model Space
This work formulates retrieval as a "supermodelling" problem: sequentially learning a model of the set of posterior distributions, each represented as a set of MCMC samples. It suggests using a sequential Dirichlet process mixture (DPM) based on Particle Learning for this purpose.
Multi-Task Learning for Classification with Dirichlet Process Priors
Experimental results on two real-life MTL problems indicate that the proposed algorithms automatically identify subgroups of related tasks whose training data appear to be drawn from similar distributions, and that they are more accurate than simpler approaches such as single-task learning, pooling data across all tasks, and simplified approximations to the DP.
A comparison of statistical significance tests for information retrieval evaluation
It is found that there is little practical difference between the randomization, bootstrap, and t tests, and that the Wilcoxon and sign tests should be discontinued for measuring the significance of a difference between means.
Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on probability measures, which learns from a collection of probability distributions constructed to meaningfully represent the training data, and proposes a flexible SVM (Flex-SVM) that places a different kernel function on each training example.
Multi-Task Feature Learning
The method builds on the well-known 1-norm regularization problem, using a new regularizer that controls the number of learned features shared across all tasks, and an iterative algorithm is developed to solve it.
Learning to rank using gradient descent
RankNet is introduced: it uses a neural network, trained by gradient descent, to model the underlying ranking function. Test results are presented on toy data and on data from a commercial internet search engine.
Manual curation is not sufficient for annotation of genomic databases
Well-understood patterns of change in the found/fixed graph are observed in two large, publicly available knowledge bases. These patterns suggest that current manual curation processes will take far too long to complete the annotations of even the most important model organisms and that, at their current rate of production, they will never suffice to annotate all currently available proteomes.
Effects of relevant contextual features in the performance of a restaurant recommender system
Results show that feature selection techniques can be applied successfully to identify relevant contextual data, and that they are important for modeling contextual user profiles with meaningful information, reducing dimensionality, and analyzing users' decision criteria.
A Scalable Topic-Based Open Source Search Engine
This paper outlines a scalable system for site-based or topic-specific search and demonstrates the system under development on a small collection of 250,000 EU and UN web pages.
Modeling sample variables with an Experimental Factor Ontology
The application of reference ontologies to data is a key problem, and this work presents guidelines on how community ontologies can be presented in an application ontology in a data-driven way.
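The significance-testing entry above compares tests for a difference in mean effectiveness between two retrieval systems evaluated on the same topics. A minimal sketch of a paired randomization test follows, assuming per-topic scores as input and Monte Carlo sampling of sign flips in place of full enumeration; the function name `paired_randomization_test` and its defaults are illustrative, not taken from that paper.

```python
import random
from typing import Sequence


def paired_randomization_test(a: Sequence[float], b: Sequence[float],
                              trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo two-sided p-value for mean(a) - mean(b) on paired per-topic scores."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and paired (same length)")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two system labels are exchangeable within
        # each topic, so each per-topic difference keeps or flips its sign at random.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(permuted) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


if __name__ == "__main__":
    system_a = [0.42, 0.55, 0.31, 0.68, 0.47, 0.52]
    system_b = [0.40, 0.50, 0.33, 0.60, 0.45, 0.49]
    print(paired_randomization_test(system_a, system_b, trials=20_000))
```

Sampling sign flips rather than enumerating all 2^n assignments keeps the cost linear in `trials`, and a fixed `seed` makes the p-value reproducible.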