Adam Hughes

Many scientific applications suffer from the lack of a unified approach to support the management and efficient processing of large-scale data. The Twister MapReduce Framework, which not only supports the traditional MapReduce programming model but also extends it by allowing iterations, addresses these problems. This paper describes how Twister is applied...
Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that...
Modern biology is experiencing a rapid increase in data volumes that challenges our analytical skills and existing cyberinfrastructure. Exponential expansion of the Protein Sequence Universe (PSU), the protein sequence space, together with the costs and complexities of manual curation creates a major bottleneck in life sciences research. Existing resources...
Modern pyrosequencing techniques make it possible to study complex bacterial populations, such as 16S rRNA, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resultant large data sets (100,000+ sequences) is of particular interest for the purpose of identifying potential gene...
Biological sequence data can be subjected to a variety of analysis workflows to glean pertinent scientific insight. Recent advances in sequencing techniques have led to a deluge of biosequence data, which necessitates the use of high-performance computing resources in order to carry out analysis in a reasonable period of time. The tasks involved in...
The advent and continued refinement of modern high-throughput sequencing techniques have led to a proliferation of raw biosequence data, as labs routinely generate millions of sequence reads in a matter of days. Analyzing these results is beyond the computational capacity of single-lab resources, necessitating the use of high-performance computing...