Computer vision algorithms are used in countless application domains. On the one hand, these algorithms are typically computationally demanding; on the other hand, they are often deployed in embedded systems, which impose stringent constraints on, e.g., size or power. In this work, we present the benefits of mapping compute-intensive imaging algorithms onto programmable massively parallel processor arrays. More specifically, we propose different variants of a combined corner and edge detection algorithm, the Harris Corner Detector (HCD), map these variants onto tightly-coupled processor arrays (TCPAs), and prototype the TCPA architecture, executing the different HCD implementations, in FPGA technology. Because floating-point operations are very costly in FPGAs, we use fixed-point arithmetic in our design and evaluate our implementation in terms of accuracy and performance against two state-of-the-art implementations: (a) the OpenCV library of programming functions for real-time computer vision, using 64-bit floating-point precision, and (b) a 32-bit fixed-point DSP-based embedded system. Accuracy is evaluated by the number of corners detected; by this measure, our approach achieves an average error of less than 1.5% compared with a reference implementation. Our different variants, which trade accuracy for performance, are mapped to the programmable processor elements of a TCPA. The fastest TCPA implementation achieves a 55 times higher frame rate than a state-of-the-art implementation of the HCD on a digital signal processor. Finally, we show how our implementation can be used in the context of a new resource-aware parallel computing paradigm called invasive computing, in which an application can adapt itself at run-time in order to satisfy different quality and throughput requirements.
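
To illustrate what an integer-only Harris corner response can look like, the following minimal Python sketch computes the standard response R = det(M) - k * trace(M)^2 without any floating-point operations. It is not the TCPA implementation described here. The function name `harris_response`, the central-difference gradients, the unweighted 3x3 window, and the approximation k ≈ 10/256 are all assumptions chosen for illustration; the variants, window sizes, and word widths used in this work are not stated in this summary.

```python
# Illustrative sketch only: every parameter below is an assumption,
# not a value taken from the TCPA design.
K_NUM = 10    # assumed numerator:  k ~= 10 / 2**8 ~= 0.039
K_SHIFT = 8   # assumed shift replacing the floating-point multiply by k


def harris_response(img, k_num=K_NUM, k_shift=K_SHIFT):
    """Return the Harris response of a 2D list of integer intensities.

    Uses only integer additions, multiplications, and one right shift.
    Border pixels keep a response of zero.
    """
    h, w = len(img), len(img[0])

    # Central-difference gradients; the one-pixel border stays zero.
    ix = [[0] * w for _ in range(h)]
    iy = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = img[y][x + 1] - img[y][x - 1]
            iy[y][x] = img[y + 1][x] - img[y - 1][x]

    resp = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor entries summed over an unweighted 3x3 window.
            sxx = syy = sxy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # k * trace^2 realized as an integer multiply and a shift.
            resp[y][x] = det - ((k_num * trace * trace) >> k_shift)
    return resp
```

On a synthetic image containing a bright square, this sketch yields a positive response at the corner of the square, a negative response along its straight edges, and zero in flat regions, which is the qualitative behavior expected of a combined corner and edge detector. Python integers do not overflow, so a hardware mapping would additionally have to choose word widths and scaling for the intermediate products.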