On the ERM Principle with Networked Data
@article{Wang2017OnTE,
  title   = {On the ERM Principle with Networked Data},
  author  = {Yuanhong Wang and Yuyi Wang and Xingwu Liu and Juhua Pu},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1711.04297}
}
Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results of this problem only deal with the…
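One way to see the dependence problem the abstract describes: treat each training example as an edge over shared objects, so two examples that share an object are statistically dependent and the i.i.d. assumption fails. A classical (and wasteful) workaround is to keep only a subset of examples that share no objects, i.e. a matching in the example graph. The sketch below is a hypothetical illustration of that baseline, not the paper's own method:

```python
# Each networked example is a pair (u, v) of objects plus a target value y.
# Examples that share an object are dependent; one crude remedy is to keep
# only pairwise vertex-disjoint examples, restoring independence at the
# cost of discarding data.

def greedy_independent_examples(examples):
    """Greedily select examples so that no two selected examples share an object."""
    used = set()
    selected = []
    for u, v, y in examples:
        if u not in used and v not in used:
            selected.append((u, v, y))
            used.update((u, v))
    return selected

examples = [("a", "b", 1), ("b", "c", 0), ("c", "d", 1), ("a", "d", 0)]
print(greedy_independent_examples(examples))  # [('a', 'b', 1), ('c', 'd', 1)]
```

Here half the examples are thrown away, which is exactly the kind of loss the paper's sharper concentration results aim to avoid.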
One Citation
Generalization Bounds for Knowledge Graph Embedding (Trained by Maximum Likelihood)
- Computer Science
- 2019
The results provide an explanation for why knowledge graph embedding methods work, as much as classical learning theory results provide explanations for classical learning from i.i.d. data.
References
Showing 1–10 of 51 references
Learning from Networked Examples
- Computer Science, ALT
- 2017
This work shows that the classic approach of ignoring this problem potentially can have a harmful effect on the accuracy of statistics, and then considers alternatives, which lead to novel concentration inequalities.
Ranking and empirical minimization of U-statistics
- Computer Science
- 2006
This paper forms the ranking problem in a rigorous statistical framework, establishes in particular a tail inequality for degenerate U-processes, and applies it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification.
Risk bounds for statistical learning
- Computer Science
- 2007
A general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM) when the classification rules belong to some VC-class under margin conditions is proposed, and the optimality of these bounds in a minimax sense is discussed.
On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability
- Computer Science, NIPS
- 2016
This paper focuses on the graph reconstruction problem, where the prediction rule is obtained by minimizing the average error over all n(n-1)/2 possible pairs of the n nodes of a training graph, and derives learning rates of order O(log n / n) for this problem, significantly improving upon the slow rates of order O(1/√n) established in the seminal work of Biau & Bleakley (2006).
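The empirical objective this summary describes — averaging a prediction error over all n(n-1)/2 node pairs — can be sketched as follows (hypothetical names; `g` is any rule scoring whether a pair of nodes is an edge):

```python
from itertools import combinations

def graph_reconstruction_risk(nodes, edges, g):
    """Average 0-1 error of edge predictor g over all n(n-1)/2 node pairs."""
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(nodes, 2))
    errors = sum(g(u, v) != (frozenset((u, v)) in edge_set) for u, v in pairs)
    return errors / len(pairs)

nodes = [1, 2, 3, 4]
edges = [(1, 2), (3, 4)]
g = lambda u, v: abs(u - v) == 1  # toy rule: consecutive integers are edges
print(graph_reconstruction_risk(nodes, edges, g))  # 1/6: only the pair (2, 3) is misclassified
```

Note that each node appears in n-1 of the pairs, so the pair-level error terms are not independent — the same overlap structure discussed in the main paper.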
On Ranking and Generalization Bounds
- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2012
This paper considers ranking estimators that minimize the empirical convex risk and proves generalization bounds for the excess risk of such estimators with rates that are faster than 1/√n.
Classification in Networked Data: a Toolkit and a Univariate Case Study
- Computer Science, J. Mach. Learn. Res.
- 2007
The results demonstrate that very simple network-classification models perform quite well---well enough that they should be used regularly as baseline classifiers for studies of learning with networked data.
Optimal aggregation of classifiers in statistical learning
- Computer Science, Mathematics
- 2003
The main result of the paper concerns optimal aggregation of classifiers: a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor.
Combining Two And Three-Way Embeddings Models for Link Prediction in Knowledge Bases
- Computer Science, J. Artif. Intell. Res.
- 2016
This paper proposes TATEC, a happy medium obtained by complementing a high-capacity model with a simpler one, both pre-trained separately and then combined, and shows that this approach outperforms existing methods on different types of relationships by achieving state-of-the-art results on four benchmarks of the literature.
Statistical inference on graphs
- Computer Science, Mathematics
- 2006
The problem of graph inference, or graph reconstruction, is to predict the presence or absence of edges between a set of given points known to form the vertices of a graph; the graph is shown to be random, with a probability distribution that possibly depends on the size of the graph.
Learning to rank for information retrieval
- Computer Science, SIGIR
- 2009
Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches, the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures are analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
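The pointwise/pairwise distinction mentioned above can be illustrated with a minimal sketch (hypothetical function names; a listwise loss would instead score the whole ranked list at once):

```python
from itertools import combinations

def pointwise_loss(scores, labels):
    """Pointwise view: squared error of each score against its relevance label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels):
    """Pairwise view: fraction of document pairs whose scores disagree with label order."""
    pairs = [(i, j) for i, j in combinations(range(len(scores)), 2)
             if labels[i] != labels[j]]
    bad = sum((labels[i] > labels[j]) != (scores[i] > scores[j]) for i, j in pairs)
    return bad / len(pairs)

scores = [0.9, 0.2, 0.7]
labels = [1, 0, 1]
print(pairwise_loss(scores, labels))  # 0.0: both relevant docs outrank the irrelevant one
```

Note that the pairwise loss, like the U-statistics discussed in the main paper, averages over pairs that share documents — another instance of networked examples.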