DNA WALKS IN VIRUS GENOMICS
Keywords:
virus RNAs, DNA walks, metric-based binary walk algorithm, ATG walk, SARS-CoV-2 virus, MERS-CoV virus, Dengue virus, Ebola virus
DOI:
https://doi.org/10.17654/0973514324017
Abstract
This paper reviews published results on the imaging and digital processing of virus RNAs (ribonucleic acids) using DNA (deoxyribonucleic acid) walks. The complicated nature and physicochemical properties of these nucleotide chains hinder the development of a universal method for the numerical mapping and plotting of RNAs. Many existing algorithms are reviewed here, including 2-D and 3-D DNA walks, walks in complex space, and multi-dimensional dynamic representations of DNAs. A detailed analysis is given of a recently proposed query-walk algorithm and of the multi-level graphical representation of the traces of repeated patterns in RNA chains. In this algorithm, the patterns are represented by binary strings and compared with a sought query, and the Hamming distance is calculated at every comparison step. The coordinates of the found patterns (queries) are recorded, and a walk is composed of the consecutive numbers of these queries along the studied RNA. This review pays primary attention to ATG triplets, which in most cases are the starting nucleotides of codons (words). According to the analyzed papers, severe virus mutations affect the compactness of the ATG curve sets of viruses and de-cluster the fractal dimension values of word-length distributions. The material of this review is useful for digital and visual studies of viruses.
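The query-walk comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the reviewed authors' implementation: the 2-bit nucleotide code, the function names (`to_bits`, `hamming`, `query_walk`, `word_lengths`), the `max_dist` threshold, and the 1-based coordinates are all assumptions made for illustration, and the reviewed papers may use a different binary mapping.

```python
# Hypothetical 2-bit code per nucleotide (an assumption for this sketch).
# U is mapped like T so that RNA and DNA notation give the same bits.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11", "U": "11"}


def to_bits(seq):
    """Encode a nucleotide string as one binary string."""
    return "".join(CODE[n] for n in seq.upper())


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def query_walk(seq, query="ATG", max_dist=0):
    """Slide the query along seq one nucleotide at a time.

    Returns (query number, 1-based coordinate) pairs for every window
    whose Hamming distance to the query does not exceed max_dist.
    """
    q_bits = to_bits(query)
    bits = to_bits(seq)
    step = len(q_bits) // len(query)  # bits per nucleotide
    positions = []
    for i in range(len(seq) - len(query) + 1):
        window = bits[i * step : i * step + len(q_bits)]
        if hamming(window, q_bits) <= max_dist:
            positions.append(i + 1)
    return list(enumerate(positions, start=1))


def word_lengths(walk):
    """Distances between consecutive query coordinates ("word" lengths)."""
    coords = [pos for _, pos in walk]
    return [b - a for a, b in zip(coords, coords[1:])]


if __name__ == "__main__":
    walk = query_walk("ATGCCATGAATGA")
    print(walk)                # [(1, 1), (2, 6), (3, 10)]
    print(word_lengths(walk))  # [5, 4]
```

Plotting the coordinate against the query number gives one walk curve per genome. With `max_dist=0` only exact ATG matches are kept; a larger threshold would also admit near matches.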
Received: November 6, 2023
Revised: December 22, 2023
Accepted: February 16, 2024
Copyright (c) 2024 PUSHPA PUBLISHING HOUSE, PRAYAGRAJ, INDIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
_________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
Non-Commercial Use: For non-commercial purposes only. No commercial activities without explicit permission.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more information or permissions.