The diverging gap between processor and memory performance has been a well discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an on-going challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes.