ARM's big.LITTLE architecture, coupled with Heterogeneous Multi-Processing (HMP), has enabled energy-efficient solutions in the dark silicon era. System-level techniques activate nonadjacent cores to eliminate chip thermal hotspots. However, doing so increases communication delay because of the longer distances between active cores in the network architecture, which in turn degrades application performance and system energy efficiency. In this paper, we present a novel hierarchical hardware-software collaborative approach to address the performance/temperature conflict in dark silicon many-core systems. Optimizations of interprocessor communication, application performance, chip temperature, and energy consumption are isolated and addressed in different phases. Evaluation results show that, on average, the approach achieves a 22.57% reduction in communication latency, a 23.04% improvement in energy efficiency, and a 6.11°C reduction in chip peak temperature compared with state-of-the-art techniques.
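
The tension between spreading active cores for thermal reasons and keeping them close for communication can be illustrated with a minimal sketch. It assumes a 2D mesh network with dimension-ordered (XY) routing, so the hop count between two tiles equals their Manhattan distance. The topology, the tile coordinates, and the function names `hop_distance` and `average_hops` are all illustrative assumptions, not details taken from this work.

```python
from itertools import combinations


def hop_distance(a, b):
    """Hop count between two (row, col) tiles, assuming XY routing on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_hops(cores):
    """Mean hop count over all unordered pairs of active cores."""
    pairs = list(combinations(cores, 2))
    return sum(hop_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical mappings of four active cores on a mesh (assumed for illustration).
contiguous = [(0, 0), (0, 1), (1, 0), (1, 1)]  # adjacent tiles: likely hotspot
spread = [(0, 0), (0, 2), (2, 0), (2, 2)]      # nonadjacent tiles: dark tiles between

print("contiguous:", average_hops(contiguous))
print("spread:    ", average_hops(spread))
```

In this toy setting the spread mapping doubles the average hop count relative to the contiguous one, which is the kind of communication penalty that a thermally motivated mapping can introduce.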