Software-based fast failure recovery in load balanced SDN-based datacenter networks
In this paper, we tackle the failure detection problem in OpenFlow networks. We examined several commercial OpenFlow switches and found that their long failure detection times prevent an OpenFlow network from achieving fast failure recovery. To resolve this problem, we propose a software-based scheme for failure detection and fault localization. The design applies to both in-band and out-of-band controlled OpenFlow networks. The OpenFlow controller periodically sends monitoring packets to probe the network status. By provisioning monitoring cycles that together cover every link in a given network, the controller can detect and pinpoint a failed link within a short time. We propose an algorithm, based on a solution to the min-max k-Chinese postman problem, to determine the routing of the monitoring cycles. For an in-band controlled network, a survivable in-band control tree (ICT) is provisioned in addition to the monitoring cycles to protect the in-band control channels. Our experiments indicate that the approach imposes very limited overhead on the OpenFlow controller and enables an OpenFlow network to achieve fast failure recovery even when the network is heavily loaded.
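To make the idea of pinpointing a failed link from monitoring cycles concrete, the following is a minimal Python sketch and not the paper's actual localization procedure, which the abstract does not describe. It assumes a single link failure, represents each monitoring cycle as a closed walk of switch names, and assumes the controller knows which cycles lost their monitoring packets. The names `cycle_links` and `candidate_failed_links`, and the switch and cycle identifiers, are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# An undirected link is modelled as the unordered pair of its endpoints.
Link = FrozenSet[str]


def cycle_links(cycle: List[str]) -> Set[Link]:
    """Return the links traversed by a closed walk of switches.

    The walk implicitly returns from the last switch to the first.
    """
    n = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}


def candidate_failed_links(
    cycles: Dict[str, List[str]], lost: Iterable[str]
) -> Set[Link]:
    """Narrow down the failed link under a single-link-failure assumption.

    cycles maps a (hypothetical) cycle id to its closed walk of switches.
    lost lists the ids of cycles whose monitoring packets did not return.
    """
    lost_ids = set(lost)
    if not lost_ids:
        return set()  # every probe returned, so no failure is suspected

    # The failed link must lie on every cycle that lost its probe ...
    suspects = set.intersection(*(cycle_links(cycles[c]) for c in lost_ids))

    # ... and cannot lie on any cycle whose probe still returns.
    for cycle_id, walk in cycles.items():
        if cycle_id not in lost_ids:
            suspects -= cycle_links(walk)
    return suspects


# Two overlapping monitoring cycles that share the link s2-s3.
example_cycles = {
    "c1": ["s1", "s2", "s3"],
    "c2": ["s2", "s3", "s4"],
}
suspected = candidate_failed_links(example_cycles, ["c1", "c2"])
```

In this example both cycles lose their probes, so the only link common to both, s2-s3, remains as the suspect. If only `c1` fails, two candidates remain (s1-s2 and s3-s1), which illustrates why the routing of the monitoring cycles affects how precisely a failure can be located.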