Algorithms for prediction of viral infection stage using NGS data

  • Download: TBA
  • References: TBA

Detection of incident hepatitis C virus (HCV) infections is crucial for identifying outbreaks and developing public health interventions. However, there are no diagnostic assays for distinguishing recent from chronic HCV infections. HCV is highly mutable, and each infected person hosts a heterogeneous population of genetically related HCV variants. Because the structure of intra-host populations develops in complex ways during chronic infection, shaped by bouts of selective sweeps and negative selection, simple metrics of genetic heterogeneity are not accurate enough for staging HCV infections.

Using intra-host HCV populations sampled by next-generation sequencing (NGS) of a highly heterogeneous genomic region, hypervariable region 1 (HVR1), from recently and chronically infected individuals, we are developing a prediction model for differentiating recent from chronic infections. We analyzed 245,878 viral sequences using 12 parameters that evaluate various characteristics of HCV populations, including diversity, topological structure, strength of selection and epistasis.

In particular, diversity was measured by the entropy of the k-mer distribution of each population, as well as by the mean, standard deviation and coefficient of variation of pairwise distances among HCV variants. Metrics of selection and epistasis were derived by comparing sampled populations with randomized populations, which were generated to reduce the effects of epistasis and selection while preserving allele structure. The correlation coefficient between the frequency and the eigenvector centrality of variants in the observed sequence space was calculated to estimate the effect of selection on the topological structure of the population. A dynamic evolutionary model was applied to simulate variant frequencies, and infection duration was estimated by comparing the simulated frequencies with the observed ones.
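The diversity metrics can be illustrated with a minimal sketch. It assumes that reads are already aligned to equal length and that Hamming distance is the pairwise metric; the page does not state which distance or which k was used, and the function names are hypothetical. Each read is counted once, with no weighting by read abundance.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def kmer_entropy(sequences, k=3):
    """Shannon entropy (bits) of the k-mer distribution pooled over all reads."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def hamming(a, b):
    """Number of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_stats(sequences):
    """Mean, SD and coefficient of variation (SD / mean) of pairwise distances."""
    if len(sequences) < 2:
        raise ValueError("need at least two reads")
    dists = [hamming(a, b) for a, b in combinations(sequences, 2)]
    m = mean(dists)
    sd = pstdev(dists)  # population SD; use statistics.stdev for the sample SD
    cv = sd / m if m > 0 else 0.0
    return m, sd, cv
```

For example, `distance_stats(["AAAA", "AAAT", "AATT"])` uses the pairwise distances 1, 2 and 1, which gives a mean of 4/3. Since the number of pairs grows quadratically, very large populations may need to be collapsed into unique variants before the distances are computed.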
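The correlation between variant frequency and eigenvector centrality can also be sketched under a few assumptions. The page does not define the sequence-space graph, so this sketch assumes that two variants are linked when they differ by exactly one substitution. It also assumes that "correlation coefficient" means Pearson's r. Centrality is computed by power iteration, and all function names are hypothetical.

```python
import math
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("variants must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def eigenvector_centrality(variants, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), where A links variants one substitution apart."""
    n = len(variants)
    neighbors = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if hamming(variants[i], variants[j]) == 1:
            neighbors[i].append(j)
            neighbors[j].append(i)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Multiplying by (A + I) keeps A's eigenvectors but prevents the
        # oscillation plain power iteration shows on bipartite graphs.
        new = [x[i] + sum(x[j] for j in neighbors[i]) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        new = [v / norm for v in new]
        if sum(abs(a - b) for a, b in zip(new, x)) < n * tol:
            return new
        x = new
    return x


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def frequency_centrality_correlation(counts):
    """counts maps each aligned variant sequence to its read count."""
    variants = list(counts)
    total = sum(counts.values())
    freqs = [counts[v] / total for v in variants]
    return pearson(freqs, eigenvector_centrality(variants))
```

In a star-shaped population where a dominant variant sits at the hub, frequency and centrality rise together and r approaches 1. If the hub is rare, r approaches -1. On a disconnected graph, eigenvector centrality is not unique, so a real analysis might restrict the calculation to the largest connected component.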
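One way to build randomized populations that "preserve allele structure" is to shuffle the alleles at each aligned site independently across reads. This keeps per-site allele counts fixed but breaks linkage between sites. The page does not specify its randomization procedure, so this approach is an assumption. The number of distinct haplotypes, used below as the compared statistic, and the z-score comparison are both hypothetical choices for illustration.

```python
import random
from statistics import mean, pstdev


def shuffle_columns(reads, rng):
    """Permute alleles at each aligned site independently across reads.

    Per-site allele counts are preserved exactly; linkage between sites
    (where epistasis and selection on haplotypes leave their signature) is broken.
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be aligned to equal length")
    columns = [list(col) for col in zip(*reads)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(chars) for chars in zip(*columns)]


def n_haplotypes(reads):
    """Hypothetical linkage-sensitive statistic: number of distinct haplotypes."""
    return len(set(reads))


def randomization_z_score(reads, statistic=n_haplotypes, replicates=200, seed=0):
    """z-score of the observed statistic against column-shuffled replicates."""
    rng = random.Random(seed)
    null = [statistic(shuffle_columns(reads, rng)) for _ in range(replicates)]
    sd = pstdev(null)
    if sd == 0:
        return 0.0
    return (statistic(reads) - mean(null)) / sd
```

Per-site allele counts do not change, so the sum of pairwise Hamming distances (and hence their mean) is the same for the original and the shuffled population. A comparison against such randomized populations therefore needs a statistic that depends on haplotype structure rather than on site-wise diversity alone.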