On the ERM Principle with Networked Data

Yuanhong Wang, Yuyi Wang, Xingwu Liu, Juhua Pu
Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results on this problem only deal with the…

Generalization Bounds for Knowledge Graph Embedding (Trained by Maximum Likelihood)

The results provide an explanation for why knowledge graph embedding methods work, much as classical learning theory results provide explanations for classical learning from i.i.d. data.



Learning from Networked Examples

This work shows that the classic approach of ignoring the dependence among examples that share objects can harm the accuracy of statistics, and then considers alternatives, which lead to novel concentration inequalities.

Ranking and empirical minimization of U-statistics

This paper formulates the ranking problem in a rigorous statistical framework, establishes in particular a tail inequality for degenerate U-processes, and applies it to show that fast rates of convergence may be achieved under specific noise assumptions, just as in classification.

Risk bounds for statistical learning

A general theorem is proposed that provides upper bounds for the risk of an empirical risk minimizer (ERM) when the classification rules belong to some VC-class under margin conditions, and the optimality of these bounds in a minimax sense is discussed.

On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability

This paper focuses on the graph reconstruction problem, where the prediction rule is obtained by minimizing the average error over all n(n-1)/2 possible pairs of the n nodes of a training graph, and derives learning rates of order O(log n / n) for this problem, significantly improving upon the slow rates of order O(1/√n) established in the seminal work of Biau & Bleakley (2006).

On Ranking and Generalization Bounds

  • W. Rejchel
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2012
This paper considers ranking estimators that minimize the empirical convex risk and proves generalization bounds for the excess risk of such estimators with rates that are faster than 1/√n.

Classification in Networked Data: a Toolkit and a Univariate Case Study

The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data.

Optimal aggregation of classifiers in statistical learning

The main result of the paper concerns optimal aggregation of classifiers: a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor.

Combining Two And Three-Way Embeddings Models for Link Prediction in Knowledge Bases

This paper proposes TATEC, a happy medium obtained by complementing a high-capacity model with a simpler one, both pre-trained separately and then combined, and shows that this approach outperforms existing methods on different types of relationships by achieving state-of-the-art results on four benchmarks of the literature.

Statistical inference on graphs

The problem of graph inference, or graph reconstruction, is to predict the presence or absence of edges between a set of given points known to form the vertices of a graph. In the setting studied, the graph is random, with a probability distribution that possibly depends on the size of the graph.

Learning to rank for information retrieval

Three major approaches to learning to rank are introduced: the pointwise, pairwise, and listwise approaches. The relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.