The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can ...
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and ...
Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they require significantly high network and disk usage during recovery of missing data. In this paper, we first present a study on the impact of ...
Erasure codes such as Reed-Solomon (RS) codes are being extensively deployed in data centers since they offer significantly higher reliability than data replication methods at much lower storage overheads. These codes however mandate much higher resources with respect to network bandwidth and disk IO during reconstruction of data that is missing or ...
Big Data has long fascinated computer science enthusiasts around the world, and it has gained even more prominence recently with the continuing explosion of data from sources such as social media and the push by tech giants for deeper analysis of their data. This paper discusses two of the comparison ...
We describe an environment for the distributed solution of iterative grid-based applications. The environment is built using the MESSENGERS mobile agent system. The main advantage of paradigm-oriented distributed computing is that the user only needs to specify application-specific sequential code, while the underlying infrastructure takes care of the ...
We describe an environment for distributed computing that uses the concept of well-known paradigms. The main advantage of paradigm-oriented distributed computing is that the user only needs to specify application-specific sequential code, while the underlying infrastructure takes care of the parallelization and distribution. The main features of the ...
We describe the implementation underlying an environment for distributed computing that uses the concept of well-known paradigms. The main advantage of paradigm-oriented distributed computing is that the user only needs to specify application-specific sequential code, while the underlying infrastructure takes care of the parallelization and distribution. ...
We introduce a technique for lowering the communication cost in a certain type of distributed application, in which processors perform computation in each time step and must obtain boundary data from their neighbors before they can perform the next time step. A typical example of such an application is solving differential equations using the finite difference ...