Background: PANGEA_HIV (Phylogenetics and Networks for Generalised HIV Epidemics in Africa) will generate a large volume of next-generation viral sequence data from generalized HIV epidemics in sub-Saharan Africa in order to better characterize these epidemics and evaluate HIV prevention efforts. However, the accuracy and reliability of phylogenetic tools for measuring aspects of transmission dynamics in these settings are not known.
Methods: The PANGEA_HIV methodology working group conducted a methods comparison exercise in collaboration with multiple independent research groups to assess the accuracy and power of phylogenetic methods in estimating recent changes in HIV incidence. Two highly detailed, agent-based epidemiological models were developed, capturing generalized HIV transmission dynamics in a village-like population and in a regional population. Simulated subtype C phylogenies were generated from the transmission tree output, which was selected to represent populations varying in HIV incidence dynamics, population size, sampling fraction and model assumptions. Sample datasets of up to several hundred sequences, comprising gag, pol and env for each individual sampled, have been generated. These will be coded before distribution to participating collaborators.
Results: First analyses of simple simulated datasets have been performed on pol sequences using a recently developed automated tool (“CPT”), which identifies sequence clusters at a maximum genetic distance of 4.5% and bootstrap support of 90%. The samples analysed came from the village model sampled in growth phase (~25 years after introduction of HIV, 4% incidence) and in decline phase (3 years after introduction of ART, 2% incidence), with a 20% sampling density. Between the growth and decline samples, the CPT detected a highly significant decrease in mean cluster size (from 4.13 to 2.76, p = 0.002) and an increase in normalised cluster maximum genetic distance (from 0.0076 to 0.011, p < 1 × 10^-4), along with a highly significant overall increase in branch lengths (Fig 1).
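The internal workings of the CPT are not described in this abstract. As an illustration only, the following minimal Python sketch shows one way clusters could be picked from a tree using the two thresholds quoted above (maximum genetic distance 4.5%, bootstrap support 90%). The tree representation, the function names and the toy distances are all hypothetical, and the top-down search is an assumption rather than the tool's documented algorithm.

```python
from itertools import combinations

# Hypothetical toy representation: an internal node is a dict with a
# bootstrap "support" (percent) and a list of "children"; a leaf is a str.

def leaves(node):
    """Return the names of all tips below a node."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node["children"]:
        tips.extend(leaves(child))
    return tips

def max_pairwise_distance(tips, dist):
    """Largest genetic distance between any two tips of a clade."""
    return max(dist[frozenset(pair)] for pair in combinations(tips, 2))

def pick_clusters(node, dist, max_dist=0.045, min_support=90):
    """Top-down search (an assumption): accept the first clade on each path
    that meets both thresholds, otherwise descend into its children."""
    if isinstance(node, str):
        return []  # a single tip is not a cluster
    tips = leaves(node)
    if (node["support"] >= min_support
            and max_pairwise_distance(tips, dist) <= max_dist):
        return [sorted(tips)]
    clusters = []
    for child in node["children"]:
        clusters.extend(pick_clusters(child, dist, max_dist, min_support))
    return clusters

def mean_cluster_size(clusters):
    """Mean number of sequences per cluster (0.0 if there are none)."""
    return sum(len(c) for c in clusters) / len(clusters) if clusters else 0.0

# Toy data (invented for illustration, not taken from the simulations).
tree = {"support": 100, "children": [
    {"support": 95, "children": ["A", "B", "C"]},  # well supported, tight
    {"support": 60, "children": ["D", "E"]},       # tight, poorly supported
]}
tip_names = ["A", "B", "C", "D", "E"]
dist = {frozenset(p): 0.10 for p in combinations(tip_names, 2)}
dist.update({frozenset(p): d for p, d in {
    ("A", "B"): 0.01, ("A", "C"): 0.03, ("B", "C"): 0.02, ("D", "E"): 0.01,
}.items()})

clusters = pick_clusters(tree, dist)
print(clusters, mean_cluster_size(clusters))
```

In this toy case the root is rejected because its tips are too divergent, the A/B/C clade passes both thresholds, and the D/E clade is rejected on bootstrap support alone. Applying such a procedure to samples from two epidemic phases would give the per-sample mean cluster sizes that a comparison like the one above relies on.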
Conclusions: We have generated simulated datasets of viral sequences corresponding to samples from hypothetical generalized HIV-1 epidemic scenarios in sub-Saharan Africa. Initial results indicate that phylogenetic tools have the power to detect changes in incidence and prevalence in the context of generalized HIV epidemics. Further development will focus on using the simulations to test the sampling density required by different methodologies to reveal underlying changes in epidemic dynamics.
Figure 1 – Mean branch lengths of clusters in a simulated growing epidemic (blue) and a simulated shrinking epidemic (pink). Clusters in the shrinking epidemic showed a highly significant increase in overall branch length (p < 2 × 10^-16).