Learn More
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveal de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but(More)
The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction(More)
To explore the genetic contribution to autistic spectrum disorders (ASDs), we have studied genomic copy-number variation in a large cohort of families with a single affected child and at least one unaffected sibling. We confirm a major contribution from de novo deletions and duplications but also find evidence of a role for inherited "ultrarare"(More)
Information on molecular networks, such as networks of interacting proteins, comes from diverse sources that contain remarkable differences in distribution and quantity of errors. Here, we introduce a probabilistic model useful for predicting protein interactions from heterogeneous data sources. The model describes stochastic generation of protein-protein(More)
BACKGROUND Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as(More)
Knowledge on interactions between molecules in living cells is indispensable for theoretical analysis and practical applications in modern genomics and molecular biology. Building such networks relies on the assumption that the correct molecular interactions are known or can be identified by reading a few research articles. However, this assumption does not(More)
Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually(More)
INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free(More)