Sudipta Sengupta

To be agile and cost-effective, data centers should allow dynamic resource allocation across large server pools. In particular, the data center network should enable any server to be assigned to any service. To meet these goals, we present VL2, a practical network architecture that scales to support huge data centers with uniform high capacity between …
Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6,000-server production cluster and reveal impairments that lead to high application latencies, rooted in …
We explore the nature of traffic in data centers designed to support the mining of massive data sets. We instrument the servers to collect socket-level logs with negligible performance impact. In a 1,500-server operational cluster, we thus amass roughly a petabyte of measurements over two months, from which we obtain and report detailed views of traffic …
COPE, a recent approach for improving the throughput of unicast traffic in wireless multi-hop networks, exploits the broadcast nature of the wireless medium through opportunistic network coding. In this paper, we analyze the throughput improvements obtained by COPE-type network coding in wireless networks from a theoretical perspective. We make two key …
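The throughput gain from COPE-type coding is easiest to see in the textbook two-flow relay example: instead of forwarding two packets separately, the relay broadcasts their XOR once, and each endpoint decodes using the packet it already holds, turning four transmissions into three. The sketch below is illustrative only; packet contents and names are assumptions, not taken from the paper.

```python
# Minimal sketch of COPE-style opportunistic network coding (illustrative only).
# Topology: A <-> R <-> B. A sends p_a to B and B sends p_b to A, both via relay R.
# Without coding, R forwards twice; with coding, R broadcasts p_a XOR p_b once.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

# Packets originated by the two endpoints (hypothetical payloads).
p_a = b"packet from A"
p_b = b"packet from B"
# Pad to equal length so the XOR is well defined.
n = max(len(p_a), len(p_b))
p_a, p_b = p_a.ljust(n, b"\0"), p_b.ljust(n, b"\0")

# The relay broadcasts a single coded packet instead of two native ones.
coded = xor_bytes(p_a, p_b)

# Each endpoint decodes using the packet it already knows (its own transmission).
decoded_at_b = xor_bytes(coded, p_b)   # B recovers p_a
decoded_at_a = xor_bytes(coded, p_a)   # A recovers p_b
assert decoded_at_b == p_a and decoded_at_a == p_b

# Four transmissions (A->R, B->R, R->A, R->B) become three: the classic 4/3 gain.
```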
Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help to achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput …
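To make the indexing bottleneck concrete, the sketch below shows the basic inline-deduplication loop: the stream is chunked, each chunk is hashed, and a chunk is stored only if its hash is not already in the index. A plain in-memory dict stands in for the index here, whereas the abstract's point is that a disk-based index for exactly this lookup limits backup throughput. The chunk size, fixed-size chunking, and function names are assumptions for illustration.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024   # fixed-size chunking for simplicity; real systems often chunk by content

chunk_index: dict[str, int] = {}   # chunk hash -> position in the chunk store
chunk_store: list[bytes] = []      # stand-in for the on-disk chunk store

def write_stream(data: bytes) -> int:
    """Inline-deduplicate a backup stream; returns the number of bytes actually stored."""
    stored = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in chunk_index:       # every chunk costs one index lookup
            chunk_index[h] = len(chunk_store)
            chunk_store.append(chunk)
            stored += len(chunk)
        # duplicate chunks would be recorded by reference only (omitted here)
    return stored

if __name__ == "__main__":
    payload = os.urandom(256 * 1024)
    print(write_stream(payload))   # first backup: all chunks are new and get stored
    print(write_stream(payload))   # second backup: 0 new bytes, only index lookups
```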
Cloud data centers host diverse applications, mixing in the same network a plethora of workflows that require small predictable latency with others requiring large sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6,000-server production cluster and reveal network impairments, such as …
We present FlashStore, a high-throughput persistent key-value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes over time, space is made for the current working set by destaging recently …
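The sketch below illustrates the flash-as-cache layout the abstract describes: a compact RAM index maps each key to an offset in an append-only log on flash, so a lookup in the working set costs one flash read. It is a simplified illustration under assumed names, using an ordinary file in place of flash, and it omits FlashStore's compact key signatures, hard-disk tier, and destaging logic.

```python
import os
import struct
import tempfile

class TinyFlashKV:
    """Toy key-value store: a RAM index points to records in an append-only 'flash' log."""

    def __init__(self, path: str):
        self.log = open(path, "a+b")        # ordinary file standing in for the flash device
        self.index: dict[bytes, int] = {}   # key -> log offset, kept in RAM

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        # record layout: key length, value length, key bytes, value bytes
        self.log.write(struct.pack("II", len(key), len(value)) + key + value)
        self.log.flush()
        self.index[key] = offset            # any older record for this key becomes garbage

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None                     # miss: a real system would fall back to hard disk
        self.log.seek(offset)               # exactly one 'flash' read per key lookup
        klen, vlen = struct.unpack("II", self.log.read(8))
        return self.log.read(klen + vlen)[klen:]

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    kv = TinyFlashKV(path)
    kv.put(b"user:42", b"hello")
    print(kv.get(b"user:42"))   # b'hello'
```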
Traditionally, the channelization structure in IEEE 802.11-based wireless LANs has been fixed: each access point (AP) is assigned one channel and all channels are equally wide. In contrast, it has recently been shown that even on commodity hardware, the channel width can be adapted dynamically purely in software. Leveraging this capability, we study the use …
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key-indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B-tree, …
We present a large-scale study of primary data deduplication and use the findings to drive the design of a new primary data deduplication system implemented in the Windows Server 2012 operating system. File data was analyzed from 15 globally distributed file servers hosting data for over 2,000 users in a large multinational corporation. The findings are used …