Studying the relative behavior of an application’s threads is critical to identifying performance bottlenecks and understanding their root causes. We present context-sensitive parallel (CSP) execution profiles, that capture the relative behavior of threads in terms of the user selected code regions they execute. CSPs can be analyzed to compute execution times spent by the application in interesting behavior states. To capture execution context, code regions of interest can be given static and dynamic names using a versatile set of annotations. The CSP divides the execution time of a multithreaded application into a sequence of time intervals called frames, during which no thread transitions between code regions. By appropriate selection and naming of code regions, the user can obtain a CSP that captures all occurrences of arbitrary behavior states. We provide the user with a powerful query language to facilitate the analysis of CSPs. Our implementation for collection of CSPs of C++ programs has low overhead and high accuracy. Collection of CSPs of full executions of 12 Parsec programs incurred overhead of at most 7% in execution time. The accuracy of CSPs was validated in the context of common performance problems such as load imbalance in pipeline stages and the presence of straggler threads.