Excess Risk Bounds for Multi-Task Learning


The idea that it should be easier to learn several tasks if they are related in some way is quite intuitive and has been borne out in many practical settings. There has been considerable interest in obtaining theoretical results that explain this phenomenon (e.g. [3, 4]). Maurer [4] considers the case where the "relatedness" of the tasks is captured by requiring that all tasks share a common "preprocessor". A separate linear classifier is learned for each task, and all of these classifiers operate on the "preprocessed" input. Maurer obtains dimension-free, data-dependent bounds in this setting: he bounds the average error over tasks in terms of the margins of the classifiers and a complexity term involving the Hilbert–Schmidt norm of the selected preprocessor and the Frobenius norm of the Gram matrix for all tasks.

We work in the same setting as Maurer. However, we introduce a loss function to measure the performance of the selected classifiers. Our aim is to obtain bounds on the difference between the average risk per task of the classifiers learned from the data and the least possible value of the average risk per task.

Suppose we have m binary classification tasks with a common input space X, the unit ball {x : ‖x‖ ≤ 1} in some Hilbert space H. Since we deal with binary classification, the output space is Y = {+1, −1}. Let v denote a tuple of classifiers (v_1, …, v_m) with v_l ∈ H for all l ∈ {1, …, m}. Let A be a set of symmetric Hilbert–Schmidt operators with ‖T‖_HS ≤ t for all T ∈ A. Denote the input distribution for task l by P^l and let
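The setup above can be summarized compactly. The following LaTeX sketch restates the quantities the text defines; the loss ℓ, the prediction form ⟨v_l, Tx⟩, and the excess-risk expression are written here as plausible assumptions consistent with the description (the excerpt is truncated before the risk is formally defined), not as the paper's exact notation.

```latex
% Setting: m binary classification tasks over a shared Hilbert space H.
%   Input space (unit ball) and output space:
\[
  \mathcal{X} = \{\, x \in H : \|x\| \le 1 \,\}, \qquad
  \mathcal{Y} = \{+1, -1\}.
\]
% Classifiers v = (v_1, \dots, v_m), v_l \in H, and a shared preprocessor
% T drawn from a set A of symmetric Hilbert--Schmidt operators:
\[
  \mathcal{A} \subseteq \{\, T : T \text{ symmetric},\ \|T\|_{HS} \le t \,\}.
\]
% Assumed form of the average risk per task, with \ell a loss function
% and P^l the distribution for task l (the excerpt names P^l only as the
% input distribution; treating it as a joint distribution over (x, y) is
% an assumption made for illustration):
\[
  R(\mathbf{v}, T)
  \;=\;
  \frac{1}{m} \sum_{l=1}^{m}
  \mathbb{E}_{(x, y) \sim P^l}
  \bigl[\, \ell\bigl(y \,\langle v_l, T x \rangle\bigr) \,\bigr].
\]
% The excess risk bounded by the paper is then the gap between the risk
% of the learned pair (\hat{\mathbf{v}}, \hat{T}) and the best achievable:
\[
  R(\hat{\mathbf{v}}, \hat{T})
  \;-\;
  \inf_{\mathbf{v},\, T \in \mathcal{A}} R(\mathbf{v}, T).
\]
```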

Cite this paper

@inproceedings{Tewari2006ExcessRB,
  title  = {Excess Risk Bounds for Multi-Task Learning},
  author = {Ambuj Tewari},
  year   = {2006}
}