Energy infrastructure is so critical an underpinning of modern society that any compromise or sabotage of its secure and reliable operation has an enormous impact on people’s daily lives and the national economy. The massive northeastern power blackout of August 2003 and the most recent Florida blackout revealed serious defects in both the system-level management and the device-level design of the power grid in handling attacks. At the system level, control area operators lack the capability to 1) obtain real-time status information from vastly distributed equipment; 2) respond rapidly enough once events start to unravel; and 3) perform coordinated actions autonomously across the region. At the device level, traditional hardware lacks the capability to 1) provide reliable frequency and voltage control according to system demands and 2) rapidly reconfigure the system to a secure state through switches and power-electronics-based devices. These blackouts were a wake-up call for both industry and academia to consider new techniques and system architecture designs that can help assure the security and reliability of the power grid. In this paper, we present a hardware-in-the-loop reconfigurable system design with embedded intelligence and resilient coordination schemes at both the local and system levels to address these vulnerabilities of the grid.
The new system design consists of five key components: 1) a location-centric hybrid system architecture that facilitates not only distributed processing but also coordination among geographically close devices; 2) the insertion of intelligence into power electronic devices at the lower level of the power grid to enable more direct reconfiguration of the physical makeup of the grid; 3) the development of a robust collaboration algorithm among neighboring devices to handle possibly faulty, missing, or incomplete information; 4) the design of distributed algorithms to better understand the local state of the power grid; and 5) the adoption of a control-theoretic real-time adaptation strategy to guarantee the availability of large distributed systems. We provide preliminary evaluation results showing the advantages of each component and suggest a phased implementation plan at the end of the discussion.
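To make the third component more concrete, the following is a minimal sketch of how a device might combine readings from its neighbors when some are missing or faulty. It is an illustration only: it uses a trimmed mean, a standard fault-tolerant averaging technique chosen here as an assumption, and it is not necessarily the collaboration algorithm developed in this paper. The function name `fuse_readings` and the parameter `max_faulty` are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def fuse_readings(
    readings: Sequence[Optional[float]], max_faulty: int
) -> Optional[float]:
    """Fuse neighbor readings while tolerating up to max_faulty bad values.

    Assumption (not from the paper): missing readings arrive as None, and
    at most max_faulty of the readings that did arrive are wrong.
    """
    # Drop missing readings and sort what remains.
    present = sorted(r for r in readings if r is not None)

    # With too few readings, faulty values cannot be trimmed away safely.
    if len(present) <= 2 * max_faulty:
        return None

    # Discard the max_faulty smallest and max_faulty largest values,
    # then average the rest (a trimmed mean).
    trimmed = present[max_faulty:len(present) - max_faulty]
    return mean(trimmed)


# Hypothetical frequency readings in Hz: one neighbor is silent (None)
# and one reports an implausible value (12.0).
estimate = fuse_readings([59.98, 60.02, None, 60.00, 12.0], max_faulty=1)
```

In this example the silent neighbor is ignored, the implausible value 12.0 and the largest value 60.02 are trimmed, and the estimate is the average of the two remaining readings. When too few readings arrive, the function returns `None` so that the caller can fall back to another strategy rather than act on unreliable data.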