As access to supercomputing resources becomes more commonplace, performance analysis tools are gaining importance as a means of narrowing the gap between application performance and the supercomputers' peak performance. These tools allow the analyst to understand the idiosyncrasies of an application in order to improve it. However, they report information only for the regions of the application that are monitored, leaving non-monitored regions of code unknown, which may result in a lack of understanding of important regions of the application. In this paper we describe an automated methodology that reports very detailed application insights and improves the analysis experience of trace-based performance tools. We apply this methodology to three production applications and provide suggestions on how to improve their performance. Our methodology uses computation burst clustering and a mechanism called folding. While clustering automatically detects application structure, folding combines instrumentation and sampling to increase the level of detail of the performance analysis. Folding provides fine-grain performance information from coarse-grain sampling on iterative applications. Folding results closely resemble the performance data gathered from fine-grain sampling, with an absolute mean difference of less than 5%, without the overhead of fine-grain sampling.
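
The idea behind folding can be illustrated with a minimal sketch. It is not the implementation described in this paper: the record layout, the function name `fold`, and the normalization by instance duration and by the instrumented counter total are all assumptions made for illustration. The sketch assumes that instrumentation delimits each instance of a repeated computation region and records its total counter value, while sampling contributes only a few points per instance; projecting all samples onto one synthetic instance yields a denser profile than any single instance provides.

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption, not taken from the paper):
# (start_time, end_time, total_count, samples), where start/end/total_count
# come from instrumentation and samples is a list of
# (timestamp, counter_value_accumulated_since_instance_start).
Sample = Tuple[float, float]
Instance = Tuple[float, float, float, List[Sample]]


def fold(instances: List[Instance]) -> List[Sample]:
    """Project sparse samples from many instances onto one synthetic instance.

    Returns (relative_time, relative_count) points, each normalized by the
    duration and counter total of the instance it came from, sorted by time.
    """
    folded: List[Sample] = []
    for start, end, total_count, samples in instances:
        duration = end - start
        if duration <= 0 or total_count <= 0:
            continue  # skip degenerate instances that cannot be normalized
        for timestamp, count in samples:
            rel_time = (timestamp - start) / duration
            rel_count = count / total_count
            folded.append((rel_time, rel_count))
    folded.sort()  # order points by relative time within the synthetic instance
    return folded
```

With three instances that each carry a single sample at a different relative position, the folded result contains three points spread across the synthetic instance, which is the sense in which coarse-grain sampling over many iterations can approximate fine-grain sampling of one.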