Abstract Body

To maximize the population-level impact of pre-exposure prophylaxis (PrEP), healthcare organizations need tools to identify persons at risk for HIV infection. We hypothesized that electronic health record (EHR) data could be used to identify patients (pts) at increased risk for acquiring HIV who might be candidates for PrEP.

We developed and evaluated automated algorithms to predict incident HIV infection using EHR data from a community health center in Boston specializing in health care for sexual and gender minorities. EHR data were extracted for 168 variables potentially associated with incident HIV for all pts with ≥1 clinical encounter during 2011-2016. EHR variables included patient demographics (e.g. age, gender), laboratory tests and results (e.g. tests for HIV and sexually transmitted infections), diagnosis codes (e.g. HIV counseling), coinfections (e.g. hepatitis B or C), suggestive routine care (e.g. anal cytology) and prescriptions (e.g. buprenorphine). Candidate HIV prediction algorithms were developed using machine learning methods (LASSO, ridge regression, random forest) and generalized linear models and were used to estimate risk for incident HIV for all pts. Algorithms were trained using 2011-2015 data and validated using 2016 data; pts using PrEP were excluded from analyses. We assessed algorithm performance using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV).
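For readers unfamiliar with the AUC, the following is a minimal sketch of how it can be computed from risk scores and observed outcomes. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen patient with incident HIV receives a higher score than a randomly chosen patient without. The function name `roc_auc` and the toy data are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a random positive > score of a random negative).

    labels: list of 0/1 outcomes (1 = incident HIV), hypothetical input format.
    scores: list of predicted risk scores, same length as labels.
    Tied scores receive the average of their ranks, so a tie counts as half.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, ascending score
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U
    return u_stat / (n_pos * n_neg)


# Toy example (made-up values, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of pts with and without incident HIV.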

Of 33,404 pts in care during 2011-2016, 64% were male (of whom 46% identified as gay/bisexual), 8% were transgender/gender non-conforming, 68% were white, 8% were Black, and 6% were Latino; HIV prevalence was 9%, and 5% of pts used PrEP. In total, 423 pts (1.3%) had incident HIV, including 71 of 18,275 pts in care during 2016. AUCs for candidate prediction algorithms ranged from 0.43 to 0.83; LASSO had the highest AUC. Using a cut-off of the top 20% of patient risk scores, LASSO had a sensitivity of 73%, specificity of 81%, and PPV of 1.5% for predicting incident HIV in 2016. We varied this cut-off to explore trade-offs in sensitivity, PPV, and the size of the population identified as screen-positive (Table).
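To illustrate how a percentile cut-off translates into screening metrics, the sketch below flags the top fraction of risk scores as screen-positive and computes sensitivity, specificity, and PPV from the resulting counts. The function name `screen_top_fraction`, its parameters, and the toy data are assumptions made for illustration; this is not the study's code.

```python
def screen_top_fraction(labels, scores, fraction=0.20):
    """Flag the highest-scoring `fraction` of patients and summarize performance.

    labels: list of 0/1 outcomes (1 = incident HIV), hypothetical input format.
    scores: list of predicted risk scores, same length as labels.
    """
    n = len(scores)
    n_flagged = max(1, round(fraction * n))  # flag at least one patient
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flagged])

    tp = sum(1 for i in flagged if labels[i] == 1)  # flagged, had incident HIV
    fp = n_flagged - tp                             # flagged, no incident HIV
    fn = sum(labels) - tp                           # missed incident HIV
    tn = n - n_flagged - fn                         # correctly not flagged

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "n_flagged": n_flagged,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy example (made-up values, not study data): 10 patients, 2 with the outcome.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
toy_result = screen_top_fraction(toy_labels, toy_scores, fraction=0.20)
```

Raising `fraction` flags more pts, which tends to increase sensitivity while lowering PPV. With an outcome as rare as the 71 incident infections among 18,275 pts reported for 2016, a low PPV is expected even when sensitivity is high.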

Automated algorithms that integrate EHR data have favorable properties as population-level screening tools to identify patients who merit clinical evaluations for PrEP. Despite low PPVs, these algorithms offer an efficient means of reducing missed opportunities to provide PrEP to those patients most likely to acquire HIV.