Research

Our research

We develop algorithms and mathematical models to address a variety of problems in biology, epidemiology and public health. Major areas of interest include computational virology and molecular epidemiology, next-generation sequencing data processing, and discrete mathematics and graph theory.

Modern biology, medicine and public health are unimaginable without molecular technologies. The synthesis of molecular biology and medicine has provided humanity with novel cancer diagnostic and treatment methods, vaccine development strategies, infectious disease therapeutics, and knowledge about the structures and evolutionary dynamics of viruses and the microbiome, ushering in a new era of technologically advanced, data-driven precision medicine. The most important technological breakthrough behind these precision-oriented approaches to therapeutics was the advent of next-generation sequencing (NGS) technologies. NGS is constantly improving, and sequencing costs are currently falling faster than the pace set by the famous Moore’s law, which describes the doubling of the number of transistors on a chip roughly every two years and, with it, the growth of computing power. As NGS becomes more accessible to biomedical and clinical researchers, new challenges arise in handling the data it generates. NGS produces enormous amounts of data, which require advanced computational methods for processing, integration and analysis, as well as novel mathematical models for interpreting the new findings that NGS makes possible.

Our collaborators

Centers for Disease Control and Prevention

Georgia Institute of Technology

University of Connecticut

San Francisco State University

University of Southern California

Texas A&M University

University of Haifa

Epidemiology and evolution of highly mutable viruses

Highly mutable RNA viruses, such as human immunodeficiency virus (HIV) and hepatitis C virus (HCV), are major causes of morbidity and mortality in the world. The hallmark of RNA viruses is their extremely high genetic diversity, which allows them to rapidly establish new infections, escape the host's immune system and develop drug resistance. The emergence of next-generation sequencing (NGS) technologies promises to revolutionize the fields of virology and epidemiology by making it possible to sample and characterize millions of intra-host viral variants in thousands of infected individuals.

Using analysis of molecular data and mathematical modelling, we are trying to understand how viruses escape the host's immune system, acquire drug resistance, and spread through a population of susceptible individuals. We are especially interested in studying the roles of complex networks in viral evolution, including social networks, genetic networks, and cross-immunoreactivity networks.
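As an illustration of what a genetic network of viral variants can look like, the sketch below links two variants when their aligned sequences differ at no more than a chosen number of positions, then reports connected components as candidate clusters. It is a minimal sketch under stated assumptions: sequences are pre-aligned to equal length, distance is plain Hamming distance, and the names `hamming`, `genetic_network`, `components` and the threshold value are hypothetical. Published molecular epidemiology tools often use evolutionary distance models and calibrated thresholds instead.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def genetic_network(seqs: dict[str, str], threshold: int) -> dict[str, set[str]]:
    """Adjacency sets: link two variants if their Hamming distance is <= threshold.

    The threshold is a hypothetical, user-chosen parameter.
    """
    adj = {name: set() for name in seqs}
    for (u, su), (v, sv) in combinations(seqs.items(), 2):
        if hamming(su, sv) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def components(adj: dict[str, set[str]]) -> list[set[str]]:
    """Connected components of the network via iterative depth-first search."""
    seen: set[str] = set()
    comps: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


if __name__ == "__main__":
    # Toy aligned variants; names and sequences are made up for illustration.
    variants = {
        "p1_a": "ACGTACGTAC",
        "p1_b": "ACGTACGTAA",
        "p2_a": "ACGTACGTTA",
        "p3_a": "TTTTACGGGG",
    }
    net = genetic_network(variants, threshold=1)
    print(sorted(sorted(c) for c in components(net)))
    # [['p1_a', 'p1_b', 'p2_a'], ['p3_a']]
```

In this toy example `p1_a` and `p2_a` differ at two positions yet end up in the same cluster through `p1_b`; this chaining is a property of grouping by connected components and is one reason threshold choice matters in practice.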

Cancer genomics

Cancer is a disease driven by the uncontrolled growth of cells carrying a series of somatic mutations acquired during tumor evolution. Cancer clones form heterogeneous populations that include multiple subpopulations, constantly evolving to compete for resources, metastasize, and escape the immune system and therapy. Advances in sequencing technologies promise to have a profound effect on oncological research. Among them, the most promising recent breakthrough is the advent of single-cell sequencing, which makes it possible to study cancer clone populations at the finest possible resolution.

Using analysis of single-cell sequencing data and mathematical modelling, we are trying to understand the rules guiding the evolution of cancer cells. We develop algorithms and computational models to infer tumor evolutionary history, quantify clonal selection and reconstruct cancer fitness landscapes.
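Inference of tumor evolutionary history from single-cell data often starts from a binary cell-by-mutation matrix. Under the infinite-sites assumption, with a root (normal cell) that carries no mutations, such a matrix fits a rooted perfect phylogeny exactly when no two mutation columns show all three patterns (1,1), (1,0) and (0,1) among the cells. The sketch below checks this classical condition; it is an illustration only, not our inference method, and the function name `conflicting_pairs` is hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix: list[list[int]]) -> list[tuple[int, int]]:
    """Return mutation-column pairs that violate the rooted perfect phylogeny condition.

    Rows are cells, columns are mutations (1 = present, 0 = absent). The root is
    assumed to carry no mutations, so two columns conflict when the patterns
    (1, 1), (1, 0) and (0, 1) all occur among the rows.
    """
    n_cols = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_cols), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


if __name__ == "__main__":
    # Toy matrices, made up for illustration.
    cells = [
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 1],
    ]
    print(conflicting_pairs(cells))  # [] : consistent with a perfect phylogeny
    noisy = cells + [[0, 1, 0]]
    print(conflicting_pairs(noisy))  # [(0, 1)] : mutations 0 and 1 conflict
```

Real single-cell data contain errors such as allelic dropout and false-positive calls, so practical methods have to tolerate violations of this condition rather than reject a matrix outright.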

Computational Genomics and Next-Generation Sequencing

We develop algorithms for inferring rare genomic variants and reconstructing the structure of heterogeneous populations from next-generation sequencing data. Examples of such populations are viral quasispecies, immune repertoires and cancer cell populations.

Modern NGS platforms produce hundreds of millions of reads, which allow full, and even repeated, coverage of highly variable genomic regions. This high coverage is essential for capturing rare variants. However, haplotyping heterogeneous populations is complicated by the extremely large number of reads produced by NGS, the need to assemble an unknown number of closely related variants, and the need to identify and preserve low-frequency variants. The latter is especially challenging because NGS technologies are error-prone, so real genetic heterogeneity must be distinguished from artificial heterogeneity produced by sequencing errors.
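One simple way to frame the distinction between real variants and sequencing errors is a significance test. If errors alone produce a given alternative allele with per-read probability e, the number of reads carrying it among n reads covering the position is approximately Binomial(n, e), and a count far in the upper tail suggests a real variant. The sketch below assumes independent reads and a single uniform error rate (the 1% default is a hypothetical value) and applies a Bonferroni correction over the number of tested positions; the function names are illustrative, and practical tools model position- and quality-specific error profiles.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_real_variant(alt_count: int, depth: int, error_rate: float = 0.01,
                    alpha: float = 0.05, n_tests: int = 1) -> bool:
    """Flag a minor allele whose read count is unlikely under sequencing errors alone.

    Assumes independent reads and one hypothetical per-read error rate that
    produces this particular alternative allele; the significance threshold
    is Bonferroni-corrected for n_tests positions.
    """
    p_value = binom_upper_tail(alt_count, depth, error_rate)
    return p_value < alpha / n_tests


if __name__ == "__main__":
    # About 100 error reads are expected at depth 10,000 with a 1% error rate.
    print(is_real_variant(300, 10_000))  # True: far above the error background
    print(is_real_variant(105, 10_000))  # False: compatible with errors alone
```

Note that a true variant present at a frequency at or below the error rate cannot be separated from errors by this test alone, which is exactly why low-frequency variants are the hard case.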

Discrete Mathematics and Graph Theory

Our research topics are graph decompositions and representations of graphs as objects derived from other discrete systems. Such representations make it possible to build bridges between various scientific disciplines, such as computer science, evolutionary biology, combinatorics, topology, and algebra.
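One classical example of a graph derived from another discrete object is the line graph. Given a graph H, its line graph L(H) has one vertex per edge of H, and two vertices are adjacent when the corresponding edges share an endpoint; equivalently, L(H) is the intersection graph of the edges of H viewed as two-element sets. The sketch below is only an illustration of this idea, not a summary of the specific representations we study; it assumes a simple undirected graph given as an edge list, and the function names are hypothetical.

```python
from itertools import combinations


def line_graph(edges: list[tuple[int, int]]) -> dict[frozenset, set[frozenset]]:
    """Line graph L(H) of a simple graph H given as an edge list.

    Each edge of H becomes a vertex (a two-element frozenset); two such
    vertices are adjacent when the edges share an endpoint.
    """
    verts = [frozenset(e) for e in edges]
    adj: dict[frozenset, set[frozenset]] = {v: set() for v in verts}
    for e, f in combinations(verts, 2):
        if e & f:  # non-empty intersection: the edges share an endpoint
            adj[e].add(f)
            adj[f].add(e)
    return adj


def edge_count(adj: dict[frozenset, set[frozenset]]) -> int:
    """Number of edges in an undirected graph stored as adjacency sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]      # K_{1,3}
    triangle = [(0, 1), (1, 2), (0, 2)]  # K_3
    path = [(1, 2), (2, 3), (3, 4)]      # path on four vertices
    for name, h in [("star", star), ("triangle", triangle), ("path", path)]:
        lg = line_graph(h)
        print(name, len(lg), edge_count(lg))
```

The star K_{1,3} and the triangle K_3 both have a triangle as their line graph, a well-known exception in Whitney's theorem on when a connected graph is determined by its line graph; such questions about when a derived object determines the original structure are typical of this area.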