Learn More
Single-cell RNA-seq can yield valuable insights about the variability within a population of seemingly homogeneous cells. We developed a quantitative statistical method to distinguish true biological variability from the high levels of technical noise in single-cell experiments. Our approach quantifies the statistical significance of observed cell-to-cell(More)
Genetically identical populations of cells grown in the same environmental condition show substantial variability in gene expression profiles. Although single-cell RNA-seq provides an opportunity to explore this phenomenon, statistical methods need to be developed to interpret the variability of gene expression counts. We develop a statistical framework for(More)
The N-terminal transit peptides of nuclear-encoded plastid proteins are necessary and sufficient for their import into plastids, but the information encoded by these transit peptides remains elusive, as they have a high sequence diversity and lack consensus sequences or common sequence motifs. Here, we investigated the sequence information contained in(More)
Predicting the destination of a protein in a cell is important for annotating the function of the protein. Recent advances have allowed us to develop more accurate methods for predicting the subcellular localization of proteins. One of the most important factors for improving the accuracy of these methods is related to the introduction of new useful(More)
Single-cell RNA-sequencing (scRNA-seq) facilitates identification of new cell types and gene regulatory networks as well as dissection of the kinetics of gene expression and patterns of allele-specific expression. However, to facilitate such analyses, separating biological variability from the high level of technical noise that affects scRNA-seq protocols(More)
Tree-dependent component analysis (TCA) is a generalization of independent component analysis (ICA), the goal of which is to model the multivariate data by a linear transformation of latent variables , while latent variables fit by a tree-structured graphical model. In contrast to ICA, TCA allows dependent structure of latent variables and also consider(More)
Accurate prediction of transcription factor binding sites (TFBSs) is a prerequisite for identifying cis-regulatory modules that underlie transcriptional regulatory circuits encoded in the genome. Here, we present a computational framework for detecting TFBSs, when multiple position weight matrices (PWMs) for a transcription factor are available. Grouping(More)
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand,(More)
Single-cell RNA sequencing (scRNA-seq) has broad applications across biomedical research. One of the key challenges is to ensure that only single, live cells are included in downstream analysis, as the inclusion of compromised cells inevitably affects data interpretation. Here, we present a generic approach for processing scRNA-seq data and detecting low(More)
As a normative study of the Korean version of the California Verbal Learning Test (K-CVLT), the present study examined the K-CVLT performances of 357 neurologically intact Koreans (181 males and 176 females) who were selected by stratified sampling reflecting the recent Korean census data. The factor analysis of the K-CVLT showed that the K-CVLT was a valid(More)