Addressing Algorithmic Challenges in Computational Genomic Epidemiology
Recent advances in sequencing technologies have had a profound effect on viral research by generating enormous amounts of genomic data. Genomic epidemiology is an interdisciplinary area of research that analyzes viral genomes to understand how viruses evolve and spread. It is a young area whose computational toolkit is still in development. This development faces many algorithmic challenges, and there are gaps between the analytic capabilities of currently available tools and the demands of biomedical and epidemiological applications. The major challenges include (i) the extraction of weak genomic signals from noisy and fragmented sequencing data; (ii) the need to scale to the "big data" volumes produced by modern next-generation sequencing platforms; and (iii) the algorithmic hardness of fitting complex epidemiological and evolutionary models to the observed data.
The overarching goal of this proposal is to address these challenges with respect to three fundamental problems of computational genomic epidemiology: assessment of viral genetic diversity, reconstruction of virus transmission history, and quantification of viral phenotypic diversity. Specific goals include:
- Development of methods for reconstructing the whole spectrum of viral genetic diversity, including closely related low-frequency variants, from noisy and fragmented next-generation sequencing data.
- Development of algorithms for reconstructing viral transmission networks ("who infected whom") using genomic data and the expected properties of social networks of contacts between susceptible individuals.
- Development of a model-based algorithmic framework for the inference of viral fitness landscapes.
Publications:
- "SOPHIE: viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework"
P. Skums, F. Mohebbi, V. Tsyvina, P. Icer, Y. Khudyakov.
RECOMB 2022 (accepted)
- "From Alpha to Zeta: Identifying variants and subtypes of SARS-CoV-2 via clustering"
A. Melnyk, F. Mohebbi, S. Knyazev, B. Sahoo, R. Hosseini, P. Skums, A. Zelikovsky, M. Patterson
Journal of Computational Biology 28(11), 1113-1129 (2021)
- "Scalable reconstruction of SARS-CoV-2 phylogeny with recurrent mutations"
D. Novikov, S. Knyazev, M. Grinshpon, P. Icer, P. Skums, A. Zelikovsky
Journal of Computational Biology 28(11), 1130-1141
- "Investigating the first wave of the COVID-19 pandemic in Ukraine using epidemiological and genomic sequencing data"
Y. Gankin, V. Koniukhovskii, A. Nemira, G. Chowell, T. A. Weppelmann, P. Skums, A. Kirpich
Infection, Genetics and Evolution, 95:105087
- "SARS-CoV-2 transmission dynamics in Belarus in 2020 revealed by genomic and incidence data analysis"
A. Nemira, A. E. Adeniyi, E. L. Gasich, K. Y. Bulda, L. N. Valentovich, A. G. Krasko, O. Glebova, A. Kirpich, P. Skums
Communications Medicine 1, 31
- "Scale-free Spanning Trees and their Application in Genomic Epidemiology"
Y. Orlovich, V. Kaibel, K. Kukharenko, P. Skums
Journal of Computational Biology doi: 10.1089/cmb.2020.0500
- "Technology dictates algorithms: Recent developments in read alignment"
M. Alser, J. Rotman, K. Taraszka, H. Shi, P. Icer Baykal, H. Taegyun Yang, V. Xue, S. Knyazev, B. D. Singer, B. Balliu, D. Koslicki, P. Skums, A. Zelikovsky, C. Alkan, O. Mutlu, S. Mangul
Genome Biology 22 (1), 1-34
- "Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction"
S. Knyazev, V. Tsyvina, A. Shankar, A. Melnyk, A. Artyomenko, T. Malygina, Y. Porozov, E. Campbell, S. Mangul, W. Switzer, P. Skums, A. Zelikovsky
Nucleic Acids Research: gkab576