I consider the problem of learning generative probabilistic models (e.g., Bayesian networks) for classification and regression. Because the generative models here serve as target-predicting functions, the learning problem can be treated differently from traditional density estimation. Unlike likelihood-maximizing generative learning, which fits a model to the data as a whole, discriminative learning is an alternative estimation method that optimizes objectives much more closely related to the prediction task (e.g., the conditional likelihood of the target variables given the input attributes). The contribution of this work is three-fold. First, for the family of general generative models, I provide a unifying parametric gradient-based optimization method for discriminative learning. Second, going beyond classification with discrete targets, I apply the method to continuous multivariate state domains, which yields dynamical systems learned discriminatively. This is a very appealing approach to structured state prediction problems such as motion tracking, because discriminative models for discrete domains (e.g., Conditional Random Fields or Maximum Entropy Markov Models) can be problematic to extend to continuous targets. On the CMU motion capture data, I evaluate the generalization performance of the proposed methods on the problem of 3D human pose tracking from monocular videos. Despite the improved prediction performance of discriminative learning, parametric gradient-based optimization may have certain drawbacks, such as computational overhead and sensitivity to the choice of the initial model. Third, I address these issues by introducing a novel recursive method for discriminative learning.
The proposed method estimates a mixture of generative models, where the component added at each stage is selected greedily by the criterion of maximizing the conditional likelihood of the new mixture. The approach is highly efficient because it reduces to generative learning of the base generative models on weighted data. Moreover, it is less sensitive to the choice of the initial model because it enhances the mixture model recursively. The improved classification performance of the proposed method is demonstrated in an extensive set of evaluations on time-series sequence data, including human motion classification problems.
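
As a rough illustration of the criteria discussed above, and not necessarily the exact formulation used in this work, the generative objective, the discriminative objective, and a generic greedy mixture step can be written as follows. The notation is assumed for this sketch: $x$ denotes the input attributes, $y$ the target, $\theta$ the model parameters, $\{(x_i, y_i)\}_{i=1}^{n}$ the training data, $F_{k}$ the mixture after $k$ stages, $f$ a candidate base generative model, and $\alpha \in [0, 1]$ a mixing weight. For continuous targets the sums over $y'$ become integrals.

```latex
% Generative learning: fit the joint density of inputs and targets.
\hat{\theta}_{\mathrm{gen}}
  = \arg\max_{\theta} \sum_{i=1}^{n} \log P(x_i, y_i \mid \theta)

% Discriminative learning: fit the conditional density of targets given inputs,
% where the conditional is induced by the same generative model.
\hat{\theta}_{\mathrm{disc}}
  = \arg\max_{\theta} \sum_{i=1}^{n} \log P(y_i \mid x_i, \theta),
\qquad
P(y \mid x, \theta) = \frac{P(x, y \mid \theta)}{\sum_{y'} P(x, y' \mid \theta)}

% Greedy mixture step (generic form, assumed here): add one component per stage,
% chosen to maximize the conditional likelihood of the new mixture.
F_{k}(x, y) = (1 - \alpha)\, F_{k-1}(x, y) + \alpha\, f(x, y),
\qquad
(f_{k}, \alpha_{k})
  = \arg\max_{f,\, \alpha} \sum_{i=1}^{n}
    \log \frac{F_{k}(x_i, y_i)}{\sum_{y'} F_{k}(x_i, y')}
```

The two objectives share the same model family and differ only in which likelihood is maximized, which is why a generative model can be trained for prediction without changing its structure.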