Providing high-speed data transfer is vital to various data-intensive applications. While there have been remarkable technology advances to provide ultra-high-speed network bandwidth, existing protocols and applications may not be able to fully utilize the bare-metal bandwidth due to their inefficient design. We identify the same problem remains in the field of Remote Direct Memory Access (RDMA) networks. RDMA offloads TCP/IP protocols to hardware devices. However, its benefits have not been fully exploited due to the lack of efficient software and application protocols, in particular in wide-area networks. In this paper, we address the design choices to develop such protocols. We describe a protocol implemented as part of a communication middleware. The protocol has its flow control, connection management, and task synchronization. It maximizes the parallelism of RDMA operations. We demonstrate its performance benefit on various local and wide-area testbeds, including the DOE ANI testbed with RoCE links and InfiniBand links.