Abstract Body

HIV preexposure prophylaxis (PrEP) prevents HIV acquisition, but uptake has been limited. Electronic health record (EHR) data may help identify patients who are at high risk of HIV acquisition and could benefit from PrEP.


We developed and validated a prediction model to identify potential PrEP candidates in a cohort of Kaiser Permanente Northern California members who had no HIV diagnosis, ≥2 years of enrollment, and ≥1 outpatient visit during 2007-2017. Using EHR data on 68 demographic, clinical, and behavioral variables potentially predictive of HIV risk, we applied logistic regression and machine learning methods to predict incident HIV cases in a derivation dataset of patients entering the cohort in 2007-2014. We assessed the performance of candidate models by cross-validated area under the curve (AUC, range 0-1). To evaluate how the best-performing model might perform prospectively, we validated it among members entering the cohort in 2015-2017. We then compared this full model with simpler models using only traditional risk factor variables (i.e., men who have sex with men [MSM] status and sexually transmitted infections [STIs]).
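
As an illustration of the AUC metric used to compare candidate models, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The function name `auc` and the toy scores are hypothetical and are not drawn from the study; the pairwise loop is quadratic and intended only for small illustrative inputs, not for a cohort of this size.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risk scores, one per patient.
    labels: 1 for an incident case, 0 for a non-case.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    noncase_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0   # case correctly ranked above non-case
            elif c == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(case_scores) * len(noncase_scores))


# Toy data (hypothetical): two cases among five patients.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to every case being ranked above every non-case.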


Among 3,751,740 eligible patients in 2007-2017, there were 1422 incident HIV cases. The best-performing model for predicting incident HIV was the least absolute shrinkage and selection operator (Lasso), with an AUC of 0.90 in 2007-2014. The final model included 41 predictors, such as Black race, home ZIP code, urine positivity for methadone, and use of medications for erectile dysfunction. The full model performed well when validated prospectively using 2015-2017 data (AUC 0.89). Model performance remained high when excluding the MSM variable (AUC 0.87) or STI variables (AUC 0.90), but was reduced when including only MSM (AUC 0.74), only STIs (AUC 0.61), or both (AUC 0.78; Figure). Among those entering the cohort in 2015-2017, patients in the top 1% of HIV risk scores included 45/68 (66%) male HIV cases but 0/13 (0%) female HIV cases. Using the top 1% of risk scores to define potential PrEP candidates in 2015-2017, we identified 6076 candidates, of whom 5577 (92%) were not currently on PrEP.
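
The top-1% rule described above can be sketched as follows, assuming each patient has a single numeric risk score. The helper names `flag_top_fraction` and `sensitivity` and the synthetic scores are hypothetical, chosen only to show the arithmetic; the study's own handling of ties and cutoffs is not described in the abstract.

```python
import math


def flag_top_fraction(scores, fraction=0.01):
    """Return the set of patient indices in the highest-scoring `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * fraction))
    # Sort indices by descending score and keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def sensitivity(flagged, labels):
    """Share of incident cases (label 1) that fall inside the flagged set."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    if not cases:
        raise ValueError("no cases to evaluate")
    return sum(1 for i in cases if i in flagged) / len(cases)


# Synthetic example (hypothetical): 200 patients, three of them cases.
scores = [i / 200 for i in range(200)]
labels = [0] * 200
for case_index in (199, 198, 50):
    labels[case_index] = 1

flagged = flag_top_fraction(scores, fraction=0.01)   # top 1% = 2 patients
captured = sensitivity(flagged, labels)              # 2 of 3 cases flagged
```

Applied to the reported counts, the same ratio gives 45/68, or about 66%, for male cases and 0/13 for female cases.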


Prediction models using EHR data can identify patients who are at high risk of HIV acquisition but not using PrEP, and should be tested as a strategy to improve PrEP use. Models using rich clinical data outperform models using only traditional risk factors. Additional EHR variables or other data are needed to identify females who may benefit from PrEP.