Skip to search formSkip to main contentSkip to account menu

Apache Hadoop

Known as: HDFS, Hadoop YARN, Hadoop Distributed Filesystem 
Apache Hadoop (pronunciation: /həˈduːp/) is an open-source software framework for distributed storage and distributed processing of very large data… 
Wikipedia (opens in a new tab)

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
2013
2013
The paradigm of processing huge datasets has been shifted from centralized architecture to distributed architecture. As the… 
Highly Cited
2012
Highly Cited
2012
Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store… 
Highly Cited
2012
Highly Cited
2012
Software testing represents one of the most explored fields of application of Search-Based techniques and a range of testing… 
Highly Cited
2011
Highly Cited
2011
The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been… 
Highly Cited
2011
Highly Cited
2011
A typical MapReduce cluster is shared among different users and multiple applications. A challenging problem in such shared… 
Highly Cited
2010
Highly Cited
2010
The York Extensible Testing Infrastructure (YETI) is an automated random testing tool that allows to test programs written in… 
Highly Cited
2009
Highly Cited
2009
This paper describes CloudWF, a scalable and lightweight computational workflow system for clouds on top of Hadoop. CloudWF can… 
Highly Cited
2001
Highly Cited
2001
We present near-IR (J and Ks) number counts and colors of galaxies detected in deep VLT-ISAAC images centered on the Chandra Deep…