Dmitriy V. Ryaboy

Learn More
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of(More)
This paper describes the use of Storm at Twitter. Storm is a real-time fault-tolerant and distributed stream data processing system. Storm is currently being used to run various critical computations in Twitter at scale, and in real-time. This paper describes the architecture of Storm and its methods for distributed scale-out and fault-tolerance. This paper(More)
The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for assemblies of different quality. These strategies were applied to the comparison of the(More)
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this paper, we discuss the evolution of our infrastructure and the development of capabilities for data mining on "big data". One important lesson is that successful big data mining in(More)
MapReduce, especially the Hadoop open-source implementation, has recently emerged as a popular framework for large-scale data analytics. Given the explosion of unstructured data begotten by social media and other web-based applications, we take the position that any modern analytics platform must support operations on free-text fields as first-class(More)
Comparative analysis of DNA sequences is becoming one of the major methods for discovery of functionally important genomic intervals. Presented here the VISTA family of computational tools was built to help researchers in this undertaking. These tools allow the researcher to align DNA sequences, quickly visualize conservation levels between them, identify(More)
The EPB41 (protein 4.1) genes epitomize the resourcefulness of the mammalian genome to encode a complex proteome from a small number of genes. By utilizing alternative transcriptional promoters and tissue-specific alternative pre-mRNA splicing, EPB41, EPB41L2, EPB41L3, and EPB41L1 encode a diverse array of structural adapter proteins. Comparative genomic(More)
In recent years, there has been a substantial amount of work on large-scale data analytics using Hadoop-based platforms running on large clusters of commodity machines. A lessexplored topic is how those data, dominated by application logs, are collected and structured to begin with. In this paper, we present Twitter’s production logging infrastructure and(More)
  • 1