Allison P. Heath

Learn More
The virulence of Mycobacterium tuberculosis depends on the ability of the bacilli to switch between replicative (growth) and non-replicative (dormancy) states in response to host immunity. However, the gene regulatory events associated with transition to dormancy are largely unknown. To address this question, we have assembled the largest M. tuberculosis(More)
an information system called the NCI Genomic Data Commons (GDC) (see figure). The GDC will initially contain raw genomic data as well as diagnostic, histologic, and clinical outcome data from NCI-funded projects such as the Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program. Unlike(More)
Any given Web search engine may provide higher quality results than others for certain queries. Therefore, it is in users' best interest to utilize multiple search engines. In this paper, we propose and evaluate a framework that maximizes users' search effective-ness by directing them to the engine that yields the best results for the current query. In(More)
In this paper we describe the design, and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers(More)
Multiscale methods are becoming increasingly promising as a way to characterize the dynamics of large protein systems on biologically relevant time-scales. The underlying assumption in multiscale simulations is that it is possible to move reliably between different resolutions. We present a method that efficiently generates realistic all-atom protein(More)
BACKGROUND As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. (More)
Hadoop has emerged as an important platform for data intensive computing. The shuffle and sort phases of a MapReduce computation often saturate top of the rack switches, as well as switches that aggregate multiple racks. In addition, MapReduce computations often have "hot spots" in which the computation is lengthened due to inadequate bandwidth to some of(More)
MOTIVATION Finding novel or non-standard metabolic pathways, possibly spanning multiple species, has important applications in fields such as metabolic engineering, metabolic network analysis and metabolic network reconstruction. Traditionally, this has been a manual process, but the large volume of metabolic data now available has created a need for(More)
Systems biology is a broad field that incorporates both computational and experimental approaches to provide a system level understanding of biological function. Initial forays into computational systems biology have focused on a variety of biological networks such as protein-protein interaction, signaling, transcription and metabolic networks. In this(More)