In a recent study posted to the bioRxiv* preprint server, researchers developed and validated an approach for the joint inference of measurement noise and genetic drift by analyzing time-series data of lineage frequencies.
Random genetic drift in infectious disease outbreak dynamics at the population level results from the randomness of transmission between hosts and of host death or recovery. Studies have reported strong genetic drift in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences resulting from superspreading events, which is predicted to considerably affect viral evolution and coronavirus disease 2019 (COVID-19) epidemiology. Noise arising from the measurement process, including bias in data collection across locations and times, could confound genetic drift estimates.
About the study
In the present study, researchers developed an approach to jointly infer the strength of measurement noise and genetic drift from time-varying lineage frequency data. The approach allows measurement noise to be overdispersed (rather than assuming uniform sampling) and the strength of overdispersion to vary with time (rather than being constant). They also validated the accuracy of the approach via simulations.
A hidden Markov model (HMM) was used with continuous observed and hidden states representing the observed and true frequencies, respectively. The transition probability between hidden states was set by genetic drift, where the mean true frequency was based on the true frequency in the previous time step. For rare frequencies, the variance scaled with the mean according to the effective population size [Ne
The emission probability between the hidden and observed states was set by measurement noise, such that the mean observed frequency was equal to the true frequency. For rare frequencies, the variance in observed frequencies scaled with the mean by a factor describing the time-dependent deviation from uniform sampling. The modeling assumed that the number of individuals and the lineage frequencies were high enough for the central limit theorem to apply.
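As a rough illustration of the generative side of such an HMM, the Python sketch below simulates one rare lineage. It is not the authors' implementation. The Gaussian steps, the specific variance forms (`f / ne` for drift and `c * f / m` for measurement noise), and all names (`simulate_lineage`, `ne`, `c`, `m`) are assumptions chosen only to match the description above: the mean of each step equals the previous true frequency, and both variances scale with the mean.

```python
import numpy as np


def simulate_lineage(f0, ne, c, m, seed=0):
    """Simulate true and observed frequencies of one rare lineage.

    Assumed (hypothetical) parameterization:
      f0 : initial true frequency, assumed rare (f0 << 1)
      ne : effective population size at each time step
      c  : measurement-noise overdispersion at each time step
           (c = 1 is taken here to mean uniform sampling)
      m  : number of sequences sampled at each time step
    """
    rng = np.random.default_rng(seed)
    ne, c, m = (np.asarray(x, dtype=float) for x in (ne, c, m))
    n_steps = len(ne)
    true_f = np.empty(n_steps)
    obs_f = np.empty(n_steps)
    f = float(f0)
    for t in range(n_steps):
        # Hidden-state transition (genetic drift): the mean is the previous
        # true frequency; the variance f / Ne is an assumed rare-lineage form.
        f = rng.normal(f, np.sqrt(f / ne[t]))
        f = float(np.clip(f, 0.0, 1.0))
        true_f[t] = f
        # Emission (measurement noise): the mean is the true frequency; the
        # variance c * f / m is an assumed overdispersed-sampling form.
        obs = rng.normal(f, np.sqrt(c[t] * f / m[t]))
        obs_f[t] = float(np.clip(obs, 0.0, 1.0))
    return true_f, obs_f


# Example: constant Ne, with overdispersion that doubles halfway through.
steps = 50
true_f, obs_f = simulate_lineage(
    f0=0.01,
    ne=np.full(steps, 1e4),
    c=np.concatenate([np.full(25, 1.0), np.full(25, 2.0)]),
    m=np.full(steps, 5e3),
)
```

Inference would run in the opposite direction, choosing the `ne` and `c` values that make the observed series most likely. Simulations of this kind give data with known parameters against which an inference procedure can be checked.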
The model generated “superlineages” by grouping lineages based on phylogenetic distance so that the combined abundance and frequency of the grouped lineages exceeded a threshold value, yielding 486, 4,083, 6,225, and 24,867 strains of SARS-CoV-2’s pre-B.1.177, B.1.177, Alpha, and Delta variants, respectively. The team assumed that the Ne
Subsequently, the parameters that most likely represent the dataset were determined. The model was validated by performing simulations using time-varying Ne
The inferred Ne
The strength of genetic drift was consistently one to three orders of magnitude higher than that expected from the observed number of SARS-CoV-2-positive persons in England over time, even after correcting for measurement noise. The elevated genetic drift could not be explained by superspreading but may be partially explained by deme (community) structure in the contact networks of hosts. The discrepancy also could not be explained by corrections accounting for epidemiological dynamics (SIR or SEIR modeling).
Sampling of SARS-CoV-2-infected persons from England’s population was largely uniform for the dataset. The team found evidence of spatial structure in the transmission dynamics of the B.1.177, Alpha, and Delta variants. The estimated Ne
The HMM-inferred Ne
Overall, the study findings showed that the strength of genetic drift in SARS-CoV-2 transmission in England was greater than expected and indicated that further modeling studies are required to better understand the mechanisms behind the high levels of genetic drift for SARS-CoV-2 in England.
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.