
An Overview of Biological Databases Used in Bioinformatics

Aniket Sharma

Abstract


Biological databases are collections of life-science information assembled from empirical studies, high-throughput experiments, published literature, and computational analysis. They incorporate data from genomics, microarray gene expression, proteomics, phylogenetics, and metabolomics, along with details on the structure, localization, and function of genes and on similarities between biological sequences. In essence, these databases are collections of biological evidence gathered from the scientific community and used to represent knowledge. Websites that curate these data make the most complete and accurate biological databases readily available to users, who can browse the records online. The volume of biological data has expanded tremendously as a by-product of the large amounts of information generated by high-throughput DNA sequencers used to investigate the genome, transcriptome, and exome sequences of various organisms. Because so much biological data (both sequence and structural) is now readily available, it has become important to organize, store, and retrieve it. This article surveys current knowledge of the various types of databases that are accessible, together with descriptions of their file formats.



