
Novel computational approaches for finding ORI

Souradipto Choudhuri, Aryaka Bhunia, Krishnanu Choudhuri, Satarupa Deb Sinha

Abstract


The origin of replication (ORI) is the site on the DNA where replication initiates. It is a short stretch of nucleotides that is generally recognized by the origin recognition complex (ORC). Detailed knowledge of ORIs helps in characterizing regulatory proteins and their interactions, and the distribution of ORIs along a sequence can also help predict the 3D structure of the gene. When sequence datasets were small, ORIs were discovered by in vitro techniques. With the exponential growth of sequence data and the integration of computing and statistics, the techniques shifted from in vitro to in silico. Guanine-cytosine (GC) profiling and GC skew scores carry enough statistical significance to locate ORIs computationally. This paper first examines the cause and statistical significance of the GC skew score. It then discusses finding the ORI of a simple prokaryote, using E. coli as the model, by means of the GC skew score alone. Eukaryotic cells, in contrast, have a complex genetic structure and may contain several ORIs. For eukaryotes, ORI finding is discussed using S. cerevisiae as the model, together with additional attributes such as information redundancy and nucleotide correlation and their statistical significance. S. cerevisiae, however, is a simpler eukaryote than higher organisms such as D. melanogaster. A compact dataset containing ORI data from seven eukaryotic organisms was therefore collected as a training set. Supervised machine learning methods were used to train the neural networks, followed by performance evaluation, in building iORI-Euk. Finally, this paper describes the implementation of the iORI-Euk web tool for finding ORIs in eukaryotic genomes.
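The GC skew idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common cumulative formulation in which the running total of (G count minus C count) is tracked along the sequence, and in many bacterial genomes the minimum of that total tends to fall near the origin. The function names `cumulative_gc_skew` and `min_skew_positions` are hypothetical, chosen for illustration only.

```python
def cumulative_gc_skew(genome):
    """Return running (#G - #C) totals; entry i covers the prefix genome[:i]."""
    skew = [0]
    for base in genome.upper():
        if base == "G":
            step = 1
        elif base == "C":
            step = -1
        else:
            step = 0  # A, T and ambiguous bases leave the skew unchanged
        skew.append(skew[-1] + step)
    return skew


def min_skew_positions(genome):
    """Return every prefix length at which the cumulative skew is lowest.

    Assumption (not from the paper): these positions are treated only as
    candidate ORI locations that would need further validation.
    """
    skew = cumulative_gc_skew(genome)
    lowest = min(skew)
    return [i for i, value in enumerate(skew) if value == lowest]


if __name__ == "__main__":
    print(min_skew_positions("CCGG"))  # [2]
```

On a real genome the raw minimum is only a starting point; a windowed skew or additional sequence features would normally be used to narrow the candidate region.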

Keywords


ORI, GC skew, Cumulative GC score, structural analysis, iORI-Euk






DOI: https://doi.org/10.37628/ijcbb.v8i1.749
