Many areas of science are seeing a data deluge coming from new instruments, myriads of sensors and exponential growth in electronic records. We take two examples: one is the analysis of gene sequence data (35339 Alu sequences) and the other is a study of medical information (over 100,000 patient records) in Indianapolis and their relationship to Geographic and(More)
BACKGROUND Discovering the functions of all genes is a central goal of contemporary biomedical research. Despite considerable effort, we are still far from achieving this goal in any metazoan organism. Collectively, the growing body of high-throughput functional genomics data provides evidence of gene function, but remains difficult to interpret. RESULTS(More)
We present our experiences in applying, developing, and evaluating cloud and cloud technologies. First, we present our experience in applying Hadoop and DryadLINQ to a series of data/compute intensive applications and then compare them with a novel MapReduce runtime developed by us, named CGL-MapReduce, and MPI. Preliminary applications are developed for(More)
BACKGROUND Human immunodeficiency virus (HIV) research involves ongoing, repetitious sequencing of the HIV genome and the massive accumulation of associated investigational data. As a result, the storage of annotated DNA and/or protein sequences, as well as information retrieval, have become increasingly difficult tasks, with scientists extracting less(More)
We take two large-scale, data-intensive problems from biology. One is a study of EST (Expressed Sequence Tag) Assembly with half a million mRNA sequences. The other is the analysis of gene sequence data (35339 Alu sequences). These test cases can scale to state-of-the-art problems such as clustering of a million sequences. We look at initial processing(More)