Multiscale methods are becoming increasingly promising as a way to characterize the dynamics of large protein systems on biologically relevant time-scales. The underlying assumption in multiscale simulations is that it is possible to move reliably between different resolutions. We present a method that efficiently generates realistic all-atom protein…
MOTIVATION: Finding novel or non-standard metabolic pathways, possibly spanning multiple species, has important applications in fields such as metabolic engineering, metabolic network analysis and metabolic network reconstruction. Traditionally, this has been a manual process, but the large volume of metabolic data now available has created a need for…
The virulence of Mycobacterium tuberculosis depends on the ability of the bacilli to switch between replicative (growth) and non-replicative (dormancy) states in response to host immunity. However, the gene regulatory events associated with transition to dormancy are largely unknown. To address this question, we have assembled the largest M. tuberculosis…
Any given Web search engine may provide higher quality results than others for certain queries. Therefore, it is in users' best interest to utilize multiple search engines. In this paper, we propose and evaluate a framework that maximizes users' search effectiveness by directing them to the engine that yields the best results for the current query. In…
In this paper we describe the design and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers…
BACKGROUND: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. …
Systems biology is a broad field that incorporates both computational and experimental approaches to provide a system-level understanding of biological function. Initial forays into computational systems biology have focused on a variety of biological networks such as protein–protein interaction, signaling, transcription and metabolic networks. In this…
Hadoop has emerged as an important platform for data-intensive computing. The shuffle and sort phases of a MapReduce computation often saturate top-of-rack switches, as well as switches that aggregate multiple racks. In addition, MapReduce computations often have "hot spots" in which the computation is lengthened due to inadequate bandwidth to some…
This paper presents a graph-based algorithm for identifying complex metabolic pathways in multi-genome scale metabolic data. These complex pathways are called branched pathways because they can arrive at a target compound through combinations of pathways that split compounds into smaller ones, work in parallel with many compounds, and join compounds into…