Timing analysis in embedded systems has, in the past, focused mainly on the Worst-Case Execution Time (WCET). This was (and still is) important for giving guarantees when a system is used in safety-critical environments. Today, two reasons call for a slightly changed perspective. Firstly, the complex and often unpredictable internal structure of modern system-on-chip architectures prohibits the calculation of realistic upper bounds for the WCET. Secondly, even if a realistic value for the WCET can be computed, the developer still does not know how the code under scrutiny behaves in general, or whether it is useful or necessary to spend time on optimising it. In this contribution, we present a new method and hardware architecture for collecting Execution Time Profiles (ETPs), which give much more insight into the execution time behaviour on modern system-on-chip architectures than was previously available.