Catherine M. Farrell

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration. We report here on growth of the mammalian and human...
Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks...
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records. The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC)...
The Consensus Coding Sequence (CCDS) project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations...
In mouse and human, the beta-globin genes reside in a linear array that is associated with a positive regulatory element located 5' to the genes known as the locus control region (LCR). The sequences of the mouse and human beta-globin LCRs are homologous, indicating conservation of an essential function in beta-globin gene regulation. We have sequenced...
A binding site for the transcription factor CTCF is responsible for enhancer-blocking activity in a variety of vertebrate insulators, including the insulators at the 5' and 3' chromatin boundaries of the chicken beta-globin locus. To date, no functional domain boundaries have been defined at mammalian beta-globin loci, which are embedded within arrays of...
The sequencing of the complete genomes of several organisms, including humans, has so far not contributed much to our understanding of the mechanisms regulating gene expression in the course of realization of developmental programs. In this so-called "postgenomic" era, we still do not understand how (if at all) the long-range organization of the genome is...
In order to create an extended map of chromatin features within a mammalian multigene locus, we have determined the extent of nuclease sensitivity and the pattern of histone modifications associated with the mouse beta-globin genes in adult erythroid tissue. We show that the nuclease-sensitive domain encompasses the beta-globin genes along with several...
Insulators are DNA sequence elements that can serve in some cases as barriers to protect a gene against the encroachment of adjacent inactive condensed chromatin. Some insulators also can act as blocking elements to protect against the activating influence of distal enhancers associated with other genes. Although most of the insulators identified so far...
The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, ...