A method for automatic transcription of polyphonic music is proposed in this work that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting the use of spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also utilized in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model is superior to a non-temporally constrained model and also outperforms various state-of-the-art transcription systems for the same experiment.