In this paper, we propose a two-stage thermal-aware task scheduling policy which exploits the application and system architecture characteristics to decouple the mapping of task-graphs for the performance and peak temperature optimization into two stages. At the first stage, the algorithm collects the best mapping of task-graphs exploiting the application and architecture characteristics to minimize the makespan of the task-graphs. At the second stage, a light-weight online algorithm comprised of efficient thermal rank and combined power models is performed to map the task nodes to the real cores for temperature minimization while maintaining the best possible performance achieved in the first stage. Compared to the previous approaches which perform the performance and temperature optimization together, our method can reduce the online mapping algorithm complexity and improve its efficiency. Experiments on real benchmarks show that an average of 6.3°C peak temperature reduction and 6.8% performance improvement can be achieved compared to other existing methods.