Preface
Table of Contents
List of Tables
List of Figures
List of Supplementary Materials
List of Abbreviations
Acknowledgements
Dedication

Chapter 1: Introduction
1.1 Overview of regulation of gene expression
1.2 Transcription factor binding
1.2.1 Computational transcription factor motif discovery and binding site prediction
1.3 Chromatin accessibility
1.4 Epigenetic regulation of gene expression
1.5 X-Chromosome Inactivation – a special case of epigenetic regulation by RNA
1.6 Transcriptional levels and genomic annotation of regulatory sequences
1.7 Chromatin interaction, looping and topologically associating domains
1.8 Public data and tool resources used
1.9 Variation of gene regulation between individuals and the association with diseases
1.10 Thesis overview and objectives

Chapter 2: Identification of regulatory variants using genomic and epigenomic data sets
2.1 Synopsis
2.2 Background
2.3 Methods
2.3.1 UCSC GWAS SNPs and the corresponding LD80 SNPs
2.3.2 High-throughput sequencing data from ENCODE
2.3.3 Annotation of regulatory sequences
2.3.4 Genomic functional categories
2.3.5 Enrichment tests of SNPs in regulatory sequences
2.3.6 Differential TF binding affinity analysis using PWMs
2.3.7 Regulatory potential index
2.3.8 Case study of a lung cancer meta-analysis
2.3.9 Topological domains and chromatin interactions from Hi-C datasets
2.4 Results
2.4.1 Cancer susceptibility SNPs frequently occur in non-coding regions
2.4.2 Delineating potential regulatory sequences of the genome in different cell types
2.4.3 Cancer susceptibility SNPs are enriched in regulatory sequences
2.4.4 Consequence of the SNPs on TF binding affinity scores
2.4.5 Prioritizing functional SNPs using regulatory potential, TF binding affinity and binding evidence
2.4.6 A case study: functional interpretation of a borderline lung cancer relevant SNP
2.4.7 Inferring potential targets of a SNP using topological domains
2.5 Discussion

Chapter 3: Differential genomic and epigenomic analyses between sexes on chrX
3.1 Synopsis
3.2 Background
3.3 Results
3.3.1 Sex classification of the FANTOM5 samples using CAGE data
3.3.2 TSSs on chrX with higher expression in female reflect escapees
3.3.3 DNAm similarity between sexes at chrX TSSs reflects bi-allelically transcribed escapees
3.3.4 YY1 binding motif over-representation around escTSSs
3.3.5 Over-representation of TF ChIP-seq peaks around bi-escTSSs
3.3.6 ChIP-seq read depth reveals overall reduction of input from heterochromatic Xi
3.3.7 YY1 binding at escTSSs shown by read depth and allelic analysis
3.3.8 Significant Xi-biased YY1 occupancy at superloop-associated lncRNAs
3.4 Discussion
3.5 Methods
3.5.1 Public datasets
3.5.2 Classification and differential expression of the sexes using CAGE data
3.5.3 Similarity and differential analysis of DNA methylation between sexes
3.5.4 Motif over-representation tests of escTSSs using CAGEd-oPOSSUM
3.5.5 TF ChIP-seq peak over-representation testing and read depth plots
3.5.6 Allelic ChIP-seq and DNase I data analysis of the female GM12878 cell line

Chapter 4: Regulation of gene expression from the perspective of chromatin interaction (Within TAD)
4.1 Background
4.2 Results: Highly interactive regulatory domain revealed at the PAX6 locus
4.3 Discussion
4.4 Methods
4.4.1 Chromatin interaction from Hi-C dataset
4.4.2 Local clustering approach to identify highly interactive domains

Chapter 5: Spreading of heterochromatin in X;autosome translocated cells: Regulation at local and domain levels
5.1 Synopsis
5.2 Background
5.3 Results
5.3.1 DNAm spreads into autosomal sequences in unbalanced t(X;A)s
5.3.2 CpG island DNAm changes with distance from TSS
5.3.3 DNAm analysis predicts varied degrees of inactivation spread between t(X;A)s
5.3.4 DNA sequence features differ around subject and escape genes
5.3.5 Heterochromatic marks are enriched at subjects in non-translocated cells
5.3.6 Genes segregate into topologi