County-Level Relationships Between Risk Factors for Severe COVID-19 and Deaths

Quality Quarterly Article

As of June 2020, more than 190,000 COVID-19 cases had been confirmed in California, resulting in over 5,600 deaths. Several potential risk factors for developing severe COVID-19 have been identified, including older age and various underlying medical conditions. HQI estimated the relationships between the county-wide prevalence of several of these risk factors and COVID-19-related deaths and death rates in California counties, with prevalence estimated from historical statewide hospital inpatient, emergency department, and ambulatory surgery discharge records.

Data Preparation  

Cumulative numbers of COVID-19 deaths for each county were obtained from the CHHS Open Data Portal. Population estimates for each county in 2019 were obtained from the U.S. Census Bureau and used to calculate per capita death rates. Using the Centers for Disease Control and Prevention’s (CDC’s) list of risk factors for COVID-19, we mapped the corresponding ICD-10 diagnosis codes in the discharge records to risk factors. Risk factors included in our analysis were: age (65 years and older), gender (female), chronic lung disease or moderate to severe asthma, serious heart conditions, cancer, smoking, severe obesity (body mass index [BMI] of 40 or higher), diabetes, chronic kidney disease undergoing dialysis, and liver disease. A total of 15,357,405 summarized individual discharges from 2016-2018 were coded for each risk factor. The proportion of discharges in each county with each risk factor was then calculated to create county-level estimates.
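The county-level estimates described above come down to two ratios: the share of a county's discharges coded with a risk factor, and deaths divided by population. The following is a minimal sketch, assuming each discharge record has already been reduced to a county name plus a set of risk-factor flags. The record layout, function names, and toy values are illustrative assumptions, not HQI's actual data or pipeline.

```python
from collections import defaultdict


def county_risk_proportions(discharges):
    """Return {county: {risk_factor: proportion of discharges with that factor}}.

    `discharges` is an iterable of (county, set_of_risk_factors) pairs,
    a hypothetical simplification of the coded discharge records.
    """
    totals = defaultdict(int)
    with_factor = defaultdict(lambda: defaultdict(int))
    for county, factors in discharges:
        totals[county] += 1
        for factor in factors:
            with_factor[county][factor] += 1
    return {
        county: {f: n / totals[county] for f, n in with_factor[county].items()}
        for county in totals
    }


def death_rate_per_capita(deaths, population):
    """Cumulative deaths divided by the county population estimate."""
    return deaths / population


# Toy records for illustration only (not real data).
toy_discharges = [
    ("County A", {"diabetes", "smoking"}),
    ("County A", {"diabetes"}),
    ("County A", set()),
    ("County A", {"severe_obesity"}),
    ("County B", {"smoking"}),
    ("County B", set()),
]
proportions = county_risk_proportions(toy_discharges)
```

A risk factor that never appears in a county is simply absent from that county's dictionary, so a caller would treat a missing key as a proportion of zero.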

Modeling Strategy 

To estimate the relationships between county-wide prevalence of these risk factors and COVID-19-related deaths and death rates, the following models were considered: simple linear regression, Poisson regression, quasi-Poisson regression, negative binomial regression, and zero-inflated negative binomial regression. Because the results of simple linear regression indicated that the assumptions of linearity, constant variance, and residual normality were violated, simple linear regression was rejected as the appropriate model. Similarly, the Poisson regression model was rejected because the assumption that the mean and variance of the number of COVID-19 deaths were approximately equal was violated. Therefore, only quasi-Poisson regression, negative binomial regression, and zero-inflated negative binomial regression (the last because many counties had zero COVID-19 deaths) were estimated.
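The Poisson assumption mentioned above can be screened with a quick comparison of the sample variance and mean of the death counts. This is a rough sketch with made-up counts. It checks the marginal counts only, whereas a formal check would look at dispersion after fitting the model, and the function names are assumptions rather than part of the original analysis.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    values = [float(c) for c in counts]
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return statistics.variance(values) / mean


def zero_fraction(counts):
    """Share of counties with zero deaths, a motivation for zero-inflated models."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Toy county death counts for illustration only (not real data).
toy_deaths = [0, 0, 0, 1, 2, 3, 5, 40, 120]
ratio = dispersion_ratio(toy_deaths)
zeros = zero_fraction(toy_deaths)
```

A ratio far above 1, as in this toy example, is the pattern that leads analysts away from Poisson regression and toward quasi-Poisson or negative binomial models.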

We fit these three models, assessed each with a residual deviance goodness-of-fit test, and compared their Akaike information criterion (AIC) values to choose the best model. County population was included as an offset so that the models predicted per capita COVID-19 death rates. Negative binomial regression outperformed the other two models.
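To make the role of the offset and the AIC comparison concrete, here is a small sketch of a negative binomial log-likelihood with a log-population offset. It assumes the common NB2 parameterization (variance = mu + alpha * mu^2) and a log link, neither of which the article states, and it evaluates the likelihood at a given linear predictor rather than fitting a model. All names and numbers are illustrative.

```python
import math


def nb2_loglik(deaths, populations, linear_predictors, alpha):
    """NB2 log-likelihood with log(population) as an offset.

    With the offset, log(mu) = log(population) + eta, so exp(eta) is the
    expected per capita death rate and mu is the expected death count.
    """
    inv_alpha = 1.0 / alpha
    total = 0.0
    for y, pop, eta in zip(deaths, populations, linear_predictors):
        mu = pop * math.exp(eta)
        total += (
            math.lgamma(y + inv_alpha) - math.lgamma(inv_alpha) - math.lgamma(y + 1)
            - inv_alpha * math.log1p(alpha * mu)
            + y * (math.log(alpha * mu) - math.log1p(alpha * mu))
        )
    return total


def poisson_loglik(deaths, populations, linear_predictors):
    """Poisson log-likelihood with the same offset, for comparison."""
    total = 0.0
    for y, pop, eta in zip(deaths, populations, linear_predictors):
        mu = pop * math.exp(eta)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


# Toy data (not real): three counties sharing one per capita rate of 5e-5.
deaths = [0, 3, 10]
populations = [1_000, 50_000, 200_000]
etas = [math.log(5e-5)] * 3

ll_nb = nb2_loglik(deaths, populations, etas, alpha=0.5)
ll_pois = poisson_loglik(deaths, populations, etas)
aic_nb = aic(ll_nb, n_params=2)      # intercept plus dispersion parameter
aic_pois = aic(ll_pois, n_params=1)  # intercept only
```

In practice the coefficients and the dispersion parameter would be estimated by a statistics package. The sketch only shows how population enters the expected count and how the AIC penalizes the extra parameter.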


In the best-fitting negative binomial regression model including all risk factors, the only risk factors that significantly (p < .05) predicted county-level COVID-19 death rates were the proportion of discharges with chronic kidney disease undergoing dialysis and the proportion with severe obesity.  

However, we also observed high correlations between some risk factors. Therefore, we compared all possible combinations of risk factors by percent deviance explained (relative to the null deviance) and chose a subset of four to predict the cumulative numbers of COVID-19 deaths across counties. The risk factors selected for the model were the proportions of discharges with:

  1. Chronic kidney disease undergoing dialysis
  2. Liver disease
  3. Severe obesity
  4. Smoking

Both smoking and chronic kidney disease undergoing dialysis were found to be significant (p < .05) predictors of county-level numbers of COVID-19 deaths.
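The subset selection described above can be sketched as an exhaustive search scored by percent deviance explained, that is, 100 * (1 - residual deviance / null deviance). In the sketch below the model fit is replaced by a caller-supplied function, and the toy deviance values are invented. How HQI actually ran the search (for example, whether only four-factor subsets were scored) is not stated in the article, so treat the details as assumptions.

```python
from itertools import combinations


def percent_deviance_explained(null_deviance, residual_deviance):
    """Share of the null deviance removed by the fitted model, in percent."""
    return 100.0 * (1.0 - residual_deviance / null_deviance)


def best_subset(risk_factors, subset_size, residual_deviance_of, null_deviance):
    """Score every subset of the given size and return (best_subset, best_score).

    `residual_deviance_of` is a hypothetical callable that fits the model on a
    tuple of risk factors and returns the residual deviance of that fit.
    """
    best, best_score = None, float("-inf")
    for subset in combinations(risk_factors, subset_size):
        score = percent_deviance_explained(null_deviance, residual_deviance_of(subset))
        if score > best_score:
            best, best_score = subset, score
    return best, best_score


# Stand-in for a real model fit: made-up deviance reductions per factor.
toy_reduction = {"a": 30.0, "b": 5.0, "c": 20.0, "d": 1.0}


def toy_residual_deviance(subset):
    return 100.0 - sum(toy_reduction[f] for f in subset)


subset, score = best_subset(list(toy_reduction), 2, toy_residual_deviance, 100.0)
```

With the ten risk factors listed in the data preparation step, a search restricted to four-factor subsets would score 210 candidate models, which is small enough to enumerate exhaustively.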

Counties whose number of deaths was higher than that predicted by the model were identified.  
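Identifying those counties is a direct comparison of observed and model-predicted deaths. A minimal sketch, assuming both are available as dictionaries keyed by county; the names and values are illustrative, not results from the study.

```python
def counties_above_prediction(observed, predicted):
    """Return [(county, observed - predicted)] for counties whose observed
    deaths exceed the model prediction, largest excess first."""
    excess = [
        (county, observed[county] - predicted[county])
        for county in observed
        if observed[county] > predicted[county]
    ]
    return sorted(excess, key=lambda item: item[1], reverse=True)


# Toy values for illustration only (not real data).
toy_observed = {"County A": 12, "County B": 0, "County C": 40}
toy_predicted = {"County A": 9.5, "County B": 1.2, "County C": 25.0}
flagged = counties_above_prediction(toy_observed, toy_predicted)
```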


Our ecological study provides further evidence supporting some of the identified risk factors for developing severe COVID-19: chronic kidney disease undergoing dialysis, severe obesity, and smoking. Importantly, the size of the county population was found to be the single strongest predictor of the number of COVID-19 deaths, which is why it was used as an offset in the negative binomial regression models so that death rates could be analyzed.

The county-level risk factors were estimated from only those persons who had visited a hospital during 2016-2018. Because risk factor information for people who never visited a hospital during this period could differ from that of people who did, these estimates may be a poor proxy for the distribution of risk factors across whole counties. In addition, some cases had to be excluded because they lacked identification numbers that could be matched across data years. Hence, the risk factor proportions in the study may overestimate those of the whole population. Furthermore, the number of COVID-19 deaths in each county changed over time, but our study used the number of deaths at a single point in time. Finally, some of the potential risk factors could not be coded from discharge data (e.g., HIV infection), and therefore were not included in the models.