This paper addresses two problems. First, we emphasize the need for generic programming methods for the real-time (RT) implementation of complex low-level image processing algorithms on parallel DSPs featuring multiprocessing, instruction-level parallelism (ILP), and multidimensional DMA. Second, we show the need for an RT implementation of a motion detection algorithm on hardware platforms suitable for low-cost embedded systems. To tackle these issues, we show how a DMA-based synchronous data flow (SDF) methodology, which is dynamic and generic with respect to the processing configuration (processing chain, image size, and number of processors involved), can be used to implement a Markov random field (MRF) based motion detection algorithm on an advanced parallel DSP architecture, the TMS320C80. This case study shows the adequacy of our approach and demonstrates a speedup factor of 4 over previously published implementations of the targeted algorithm. Furthermore, we estimate that RT performance can be achieved for 256 images on an optimal C80-based system.