In this paper, we propose a novel method that learns the intrinsic object structure for robust visual tracking. The basic assumption is that the parameterized object state lies on a low-dimensional manifold that can be learned from training data. Based on this assumption, we first derive a dimensionality reduction and density estimation algorithm for unsupervised learning of the intrinsic object representation; the resulting non-rigid part of the object state is reduced to as few as 2 dimensions. Second, a dynamical model is derived and trained on this intrinsic representation. Third, the learned intrinsic object structure is integrated into a particle-filter style tracker. We show that this intrinsic object representation has some interesting properties, and that the dynamical model derived from it makes the particle-filter style tracker more robust and reliable. Experiments show that the learned tracker performs much better than existing trackers on complex non-rigid motions such as fish twisting with self-occlusion and large inter-frame lip motion. The proposed method also has the potential to solve other types of tracking problems.