Addressing Algorithmic Challenges in Computational Genomic Epidemiology

    Recent advances in sequencing technologies have had a profound effect on viral research by generating enormous amounts of genomic data. Genomic epidemiology is an interdisciplinary area of research that analyzes viral genomes to understand how viruses evolve and spread. It is a young field whose computational toolkit is still in development. Building that toolkit poses many algorithmic challenges, and there are gaps between the analytic capabilities of currently available tools and the demands of biomedical and epidemiological applications. The major challenges include (i) the extraction of weak genomic signals from noisy and fragmented sequencing data; (ii) the need to scale to the volumes of "big data" produced by modern next-generation sequencing platforms; and (iii) the algorithmic hardness of fitting complex epidemiological and evolutionary models to the observed data.

    The overarching goal of this proposal is to address these challenges with respect to three fundamental problems of computational genomic epidemiology: assessment of viral genetic diversity, reconstruction of virus transmission history, and quantification of viral phenotypic diversity. Specific goals include:

  • Development of methods for reconstruction of the whole spectrum of viral genetic diversity, including closely related low-frequency variants, from noisy and fragmented next-generation sequencing data.
  • Development of algorithms for reconstruction of viral transmission networks ("who infected whom") using genomic data and expected properties of social networks of contacts between susceptible individuals.
  • Development of a model-based algorithmic framework for the inference of viral fitness landscapes.


  • "SOPHIE: viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework"
    P. Skums, F. Mohebbi, V. Tsyvina, P. Icer, Y. Khudyakov.
    RECOMB 2022 (accepted)

  • "From Alpha to Zeta: Identifying variants and subtypes of SARS-CoV-2 via clustering"
    A. Melnyk, F. Mohebbi, S. Knyazev, B. Sahoo, R. Hosseini, P. Skums, A. Zelikovsky, M. Patterson
    Journal of Computational Biology 11 (2021): 1113-1129.

  • "Scalable reconstruction of SARS-CoV-2 phylogeny with recurrent mutations"
    D. Novikov, S. Knyazev, M. Grinshpon, P. Icer, P. Skums, A. Zelikovsky
    Journal of Computational Biology 28(11), 1130-1141

  • "Investigating the first wave of the COVID-19 pandemic in Ukraine using epidemiological and genomic sequencing data"
    Y. Gankin, V. Koniukhovskii, A. Nemira, G. Chowell, T. A. Weppelmann, P. Skums, A. Kirpich
    Infection, Genetics and Evolution, 95:105087

  • "SARS-CoV-2 transmission dynamics in Belarus in 2020 revealed by genomic and incidence data analysis"
    A. Nemira, A. E. Adeniyi, E. L. Gasich, K. Y. Bulda, L. N. Valentovich, A. G. Krasko, O. Glebova, A. Kirpich, P. Skums
    Communications Medicine 1, 31

  • "Scale-free Spanning Trees and their Application in Genomic Epidemiology"
    Y. Orlovich, V. Kaibel, K. Kukharenko, P. Skums
    Journal of Computational Biology doi: 10.1089/cmb.2020.0500

  • "Technology dictates algorithms: Recent developments in read alignment"
    M. Alser, J. Rotman, K. Taraszka, H. Shi, P. Icer Baykal, H. Taegyun Yang, V. Xue, S. Knyazev, B. D. Singer, B. Balliu, D. Koslicki, P. Skums, A. Zelikovsky, C. Alkan, O. Mutlu, S. Mangul
    Genome Biology 22 (1), 1-34

  • "Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction"
    S. Knyazev, V. Tsyvina, A. Shankar, A. Melnyk, A. Artyomenko, T. Malygina, Y. Porozov, E. Campbell, S. Mangul, W. Switzer, P. Skums, A. Zelikovsky
    Nucleic Acids Research: gkab576