Assigning and visualizing germline genes in antibody repertoires

@article{Frost2015AssigningAV,
  title={Assigning and visualizing germline genes in antibody repertoires},
  author={Simon D. W. Frost and B. Murrell and A S Md Mukarram Hossain and Gregg J. Silverman and Sergei L Kosakovsky Pond},
  journal={Philosophical Transactions of the Royal Society B: Biological Sciences},
  year={2015},
  volume={370}
}
Identifying the germline genes involved in immunoglobulin rearrangements is an essential first step in the analysis of antibody repertoires. Based on our prior work in analysing diverse recombinant viruses, we present IgSCUEAL (Immunoglobulin Subtype Classification Using Evolutionary ALgorithms), a phylogenetic approach to assign V and J regions of immunoglobulin sequences to their corresponding germline alleles, with D regions assigned using a simple pairwise alignment algorithm. We also… 
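The abstract says only that D regions are assigned with "a simple pairwise alignment algorithm", without naming it. The sketch below is a rough illustration under stated assumptions. It scores a query D region against each candidate germline D gene using Smith-Waterman local alignment with a linear gap penalty, then keeps the highest-scoring gene. The scoring parameters, the function names `local_alignment_score` and `assign_d_gene`, and the toy germline sequences are hypothetical choices for this example. They are not taken from IgSCUEAL.

```python
# Minimal sketch of D-gene assignment by best pairwise (local) alignment score.
# ASSUMPTIONS: Smith-Waterman local alignment, linear gap penalty, and the
# match/mismatch/gap values below are illustrative choices, not IgSCUEAL's.


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a versus b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def assign_d_gene(query: str, germlines: dict[str, str]) -> tuple[str, int]:
    """Return (name, score) of the germline D gene with the highest local score."""
    scores = {name: local_alignment_score(query, seq)
              for name, seq in germlines.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# HYPOTHETICAL toy germline set; these are not real IGHD alleles.
toy_d_genes = {
    "D_toy_1": "GGTATAACTGGAACTAC",
    "D_toy_2": "AGCATATTGTGGTGGTGAC",
}

if __name__ == "__main__":
    # Query containing a trimmed copy of D_toy_1 flanked by extra nucleotides.
    print(assign_d_gene("CCGGTATAACTGGAACAAG", toy_d_genes))
```

A local rather than a global alignment is used here because D segments are short and are typically trimmed at both ends during recombination and flanked by non-templated nucleotides, so only part of the germline D gene is expected to appear in the rearranged sequence.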
Immunoglobulin gene conversion identification and analysis
TLDR
GECCO is presented, the first dedicated gene conversion identification tool for immunoglobulins; it is based on modified, simultaneous, pairwise alignments to host and donor references, has high recall and a low false-positive rate, and is insensitive to somatic mutations.
BRILIA: Integrated Tool for High-Throughput Annotation and Lineage Tree Assembly of B-Cell Repertoires
TLDR
The B-cell repertoire inductive lineage and immunosequence annotator (BRILIA) is presented, an algorithm that leverages repertoire-wide sequencing data to globally improve VDJ annotation coverage, lineage tree assembly, and SHM identification; it is shown that complete gene usage annotation and SHM identification across the entire CDR3 are essential for studying the B-cell affinity maturation process through immunosequencing methods.
Analyzing Immunoglobulin Repertoires
TLDR
Current methods and challenges of library preparation, sequencing, and statistical analysis in lymphocyte receptor repertoire studies are reviewed, with an emphasis on Ig/BCR sequence analysis.
repgenHMM: a dynamic programming tool to infer the rules of immune receptor generation from sequence data
TLDR
A hidden Markov model is presented that accounts for all plausible scenarios that can generate the receptor sequences; it can be used to generate synthetic sequences and to calculate the generation probability of any receptor sequence, as well as the theoretical diversity of the repertoire.
DSab-origin: a novel IGHD sensitive VDJ mapping method and its application on antibody response after influenza vaccination
TLDR
This work fills a computational gap in D segment assignment for VDJ germline gene identification in antibody research by presenting DSab-origin, a D-sensitive mapping method with accuracies of around 90% on human monoclonal antibody data and an average of 95.8% on mouse data.
Unbiased RACE-Based Massive Parallel Surveys of Human IgA Antibody Repertoires.
TLDR
A duplexing antisense constant region primer is designed that efficiently amplifies, side by side, heavy chain transcripts of both the IgA1 and IgA2 subclasses; the primer will be used to investigate the effects of microbial virulence factors on host defenses, during autoimmune responses, and in B-cell malignancies.
How repertoire data are changing antibody science
TLDR
The many ways in which BCR repertoire data have been or could be exploited are discussed, highlighting their utility for providing insights into how the naive immune repertoire is generated and how it responds to antigens.
Practical guidelines for B-cell receptor repertoire sequencing analysis
TLDR
Practical guidelines for B-cell receptor repertoire sequencing analysis are provided, starting from raw sequencing reads and proceeding through pre-processing, determination of population structure, and analysis of repertoire properties.
Likelihood-Based Inference of B Cell Clonal Families
TLDR
An agglomerative algorithm to find a maximum-likelihood clustering, two approximate algorithms with various speed-versus-accuracy trade-offs, and a third, fast algorithm for finding specific lineages are described; under simulation these greatly improve upon existing clonal family inference methods, and when applied to two real data sets they give significantly different clusters than previous methods.
Antibody Upstream Sequence Diversity and Its Biological Implications Revealed by Repertoire Sequencing
TLDR
These findings would facilitate Rep-Seq primer design for capturing antibodies comprehensively and efficiently, as well as provide a valuable resource for antibody engineering and studies of antibodies at the molecular level.
...
...

References

Showing 1-10 of 49 references
Clustering-based identification of clonally-related immunoglobulin gene sequence sets
TLDR
This work developed and implemented an algorithm for identifying sets of clonally related sequences in large human immunoglobulin heavy chain gene variable region sequence sets; it provided more accurate and considerably faster identification of clonally related gene sequences than visual inspection by domain experts.
Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset
TLDR
This work has analyzed thousands of VDJ rearrangements from an individual (S22) whose IGHV, IGHD and IGHJ genotype can be inferred from the dataset, and evaluated the performance of seven utilities.
iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences
TLDR
According to an evaluation against other immunoglobulin gene alignment utilities, iHMMune-align identifies component germline genes more accurately than other currently available IGH gene characterization programs.
IgBLAST: an immunoglobulin variable domain sequence analysis tool
TLDR
The sequence analysis tool IgBLAST is developed; it can analyse nucleotide and protein sequences, process sequences in batches, and search the germline gene databases and other sequence databases simultaneously, minimizing the chance of missing the best-matching germline V gene.
Automated analysis of immunoglobulin genes from high-throughput sequencing: life without a template
TLDR
The analysis pipeline presented here is highly modular, and makes it possible to analyze the data resulting from high-throughput sequencing of immunoglobulin genes, in spite of the lack of a template gene.
Ab-origin: an enhanced tool to identify the sourcing gene segments in germline for rearranged antibodies
TLDR
Ab-origin is presented, a program that batch-queries germline databases using empirical knowledge, an optimized scoring scheme, and appropriate parameters; it outperformed five other popular tools in prediction accuracy.
SoDA: implementation of a 3D alignment algorithm for inference of antigen receptor recombinations
TLDR
A dynamic programming algorithm that reconstructs the details of the recombinatorial process giving rise to each participating antigen receptor gene is developed and implemented as web-accessible software called SoDA (Somatic Diversification Analysis).
SoDA2: a Hidden Markov Model approach for identification of immunoglobulin rearrangements
TLDR
A probabilistic model of the rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements are developed.
An Evolutionary Model-Based Algorithm for Accurate Phylogenetic Breakpoint Mapping and Subtype Prediction in HIV-1
TLDR
This work presents a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial) sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains as well as computing confidence levels for the inferred quantities.
Models of Somatic Hypermutation Targeting and Substitution Based on Synonymous Mutations from High-Throughput Immunoglobulin Sequencing Data
TLDR
Improved models of SHM targeting and substitution are produced that are based only on synonymous mutations and are thus independent of selection.
...
...