An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data

@article{Carty2017AnIM,
  title={An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data},
  author={Mark A. Carty and Lee Zamparo and Merve Sahin and Alvaro J. Gonz{\'a}lez and Raphael A. Pelossof and Olivier Elemento and Christina S. Leslie},
  journal={Nature Communications},
  year={2017},
  volume={8}
}
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant… 
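As a rough illustration of the hurdle negative binomial idea named in the abstract, the Python sketch below evaluates a hurdle distribution for a single bin pair: zeros get their own probability, and positive counts follow a zero-truncated negative binomial. The function names, the parameter names (`mu`, `alpha`, `pi_zero`) and the mean/dispersion parameterization are assumptions made for this example. Per the abstract, HiC-DC obtains such quantities by regression on covariates such as distance, GC content and mappability; that fitting step is not shown, and this is not the authors' implementation.

```python
import math


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion alpha (variance = mu + alpha * mu**2). Assumed parameterization."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def hurdle_nb_pmf(k, mu, alpha, pi_zero):
    """Hurdle model: a zero occurs with probability pi_zero; positive
    counts follow a negative binomial truncated at zero."""
    if k == 0:
        return pi_zero
    p_nb_zero = math.exp(nb_log_pmf(0, mu, alpha))
    truncated = math.exp(nb_log_pmf(k, mu, alpha)) / (1.0 - p_nb_zero)
    return (1.0 - pi_zero) * truncated


def hurdle_nb_upper_tail(k, mu, alpha, pi_zero):
    """P(Y >= k) under the hurdle model: one possible way to turn an
    observed count into a P value (an assumption of this sketch)."""
    if k <= 0:
        return 1.0
    below = sum(hurdle_nb_pmf(j, mu, alpha, pi_zero) for j in range(k))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Hypothetical values for one bin pair, chosen only for illustration.
    for count in (1, 5, 10, 20):
        print(count, hurdle_nb_upper_tail(count, mu=2.0, alpha=0.5, pi_zero=0.3))
```

Separating the zero probability from the count component is what lets a hurdle model describe zero inflation independently of the overdispersion captured by `alpha`.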
MaxHiC: robust estimation of chromatin interaction frequency in Hi-C and capture Hi-C experiments
TLDR
MaxHiC is a robust machine learning-based tool for identifying significant interacting regions from both Hi-C and capture Hi-C data; it significantly outperforms existing tools in terms of enrichment of interactions between known regulatory regions as well as biologically relevant interactions.
An Integrative Approach for Fine-Mapping Chromatin Interactions
TLDR
π-SCNN provides an approach for analyzing important aspects of genome architecture and regulation at a higher resolution than previously possible and recovers original Hi-C peaks after extending them to be coarser, supporting their biological significance.
Detecting local changes in chromatin architecture with false discovery control
TLDR
The improved power and precision of the sliding window statistics approach and its ability to reveal biologically meaningful changes in chromatin architecture are illustrated through two data analyses concerning the loss of architectural and chromatin remodeling proteins.
Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2
TLDR
The FitHiC2 protocol is described, which eliminates indirect/bystander interactions, leading to significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites.
Identification of significant chromatin contacts from HiChIP data by FitHiChIP
TLDR
FitHiChIP is a computational method for loop calling from HiChIP/PLAC-seq data, which jointly models the non-uniform coverage and genomic distance scaling of contact counts to compute statistical significance estimates and develops a technique to filter putative bystander loops that can be explained by stronger adjacent loops.
Estimating DNA-DNA interaction frequency from Hi-C data at restriction-fragment resolution
TLDR
The Hi-C Interaction Frequency Inference algorithms are presented, a family of computational approaches that takes advantage of dependencies between neighboring restriction fragments to estimate restriction-fragment resolution interaction frequency matrices from Hi-C data and reveals a new role for active regulatory regions in structuring topologically associating domains (TADs) and subTADs.
Deciphering hierarchical organization of topologically associated domains through change-point testing
TLDR
The method and theoretical result of the GLR test provide a general framework for significance testing of similar experimental chromatin interaction data that may not fully follow negative binomial distributions but rather more general mixture distributions.
VSS-Hi-C: Variance-stabilized signals for chromatin 3D contacts
TLDR
This work proposes an approach called VSS-Hi-C that normalizes Hi-C data to produce variance-stabilized signals and outperforms the other transformation approaches in stabilizing the variance of the data and improves downstream analysis such as identifying chromosomal subcompartments.
Seeing the forest through the trees: prioritising potentially functional interactions from Hi-C
TLDR
This review collates and examines the downstream analysis of Hi-C data with particular focus on methods that prioritise potentially functional interactions, including structural-based discovery methods, e.g. A/B compartments and topologically associated domains, detection of statistically significant chromatin interactions, and the use of epigenomic data integration to narrow down useful interaction information.
Seeing the forest through the trees: Identifying functional interactions from Hi-C
TLDR
This review collates and examines the downstream analysis of Hi-C data with particular focus on methods that identify significant functional interactions, and classifies three groups of approaches: structurally-associated domain discovery methods, e.g. topologically-associated domains and compartments; detection of statistically significant interactions via background models; and the use of epigenomic data integration to identify functional interactions.

References

SHOWING 1-10 OF 23 REFERENCES
Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts.
TLDR
Fit-Hi-C is described, a method that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets and shows that insulators and heterochromatin regions are hubs for high-confidence contacts, while promoters and strong enhancers are involved in fewer contacts.
Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture
TLDR
Analysis of corrected human lymphoblast contact maps provides genome-wide evidence for interchromosomal aggregation of active chromatin marks, including DNase-hypersensitive sites and transcriptionally active foci.
A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data
TLDR
A hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix is proposed and demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis.
CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data
TLDR
A background model and algorithms for normalisation and multiple testing that are specifically adapted to CHi-C experiments are presented and validate CHiCAGO by showing that promoter-interacting regions detected with this method are enriched for regulatory features and disease-associated SNPs.
FourCSeq: analysis of 4C sequencing data
TLDR
The overall trend of decreasing interaction frequency with genomic distance is modeled by fitting a smooth monotonically decreasing function to suitably transformed count data and high z-scores are interpreted as peaks providing evidence for specific interactions.
Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions
TLDR
It is found that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.
Mapping of long-range associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulation
TLDR
The study suggests the presence of a global genome organization in fission yeast that is functionally similar to the recently proposed mammalian transcription factory.
FourCSeq: analysis of 4C sequencing data
TLDR
The overall trend of decreasing interaction frequency with genomic distance is modeled by fitting a smooth monotonically decreasing function to suitably transformed count data, and z-scores are calculated from the residuals, with high z-scores being interpreted as peaks providing evidence for specific interactions.