
Next Generation Genome Re-sequencing: Tools for Identification and Validation of Variants

Divya Verma

Abstract


Whole-genome assemblies are already available for Homo sapiens and most model organisms, so the short reads generated by NGS methods can be readily mapped against, and compared with, these reference sequences. Over the last few years, NGS technologies have advanced considerably in performance, read length, accuracy, manpower requirements, consumables and informatics infrastructure, enabling a wide variety of applications. One important downstream application of this fast-evolving technology is SNP and genotype calling, together with the detection of other variant classes, including single nucleotide variations (SNVs), structural variations (SVs) and copy number variations (CNVs). NGS data analysis for variant detection is a multistep process that requires different types of tools at different stages of the workflow, depending on the specific application, and selecting an appropriate tool for each stage is not trivial. This review discusses a comprehensive workflow for variant detection, along with the tools available for each individual analysis step.




DOI: https://doi.org/10.37628/ijcbb.v1i1.802
