Friday, September 7, 2012

We can't predict HAI with ICD-9 codes and it's only going to get worse

I'm getting ready for a chat with a reporter concerned with issues surrounding HAI surveillance. During my preparation, I thought again through the problems with code-based algorithms (e.g., ICD-9), and I've come to the conclusion that they are useless for assessing the burden of HAIs and HAI trends, and it's only going to get worse.

One area we (and many others) have looked at is the utility of ICD-9 code-based algorithms (i.e., administrative codes) for detecting HAIs efficiently. A key metric frequently reported by researchers is the sensitivity of a specific code or code algorithm, which is great if the purpose of the algorithm is to improve the efficiency of detection by manual methods. Thus, if the sensitivity is high enough, you could use the code-based algorithm to reduce the number of charts that require review by an infection preventionist (IP). If you are using codes in this way, great! I have no problems with that.

However, many are now using code-based algorithms to track trends in specific HAIs and measure the burden of disease. My general feeling on these is that they should be completely avoided for several reasons:

1) No matter how sensitive the algorithm is, all we care about here (since we are not validating with manual review) is the positive predictive value (PPV), i.e., the proportion of all code-positive patients who actually have the HAI of interest.

2) The PPV is very low for almost all HAI algorithms.

3) If we are doing our job and lowering the incidence of HAI per admission in our hospitals, the PPV will necessarily get worse (given a fixed sensitivity and specificity).
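Point 3 follows from Bayes' rule. As a sketch, write p for the incidence of HAI per admission, Se for sensitivity, and Sp for specificity (these symbols are introduced here for illustration and are not from the original post):

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})\,(1-p)}
```

With Se and Sp held fixed, the true-positive term Se·p shrinks as p falls while the false-positive term (1-Sp)(1-p) grows, so the PPV can only decline as incidence drops.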

To show you why I have these concerns, I have constructed two 2x2 tables evaluating an excellent hypothetical code-based algorithm for UTI with sensitivity and specificity both set at 95%. In this first 2x2, I have evaluated the performance of the algorithm when the HAI has a 5% incidence per admission (i.e., 5% of the admissions had a UTI). You can see that even such a great algorithm, applied where the disease is relatively common, has a poor PPV of 50%, like flipping a coin.

Now, assume we have done an amazing job and cut our HAI rate down to 1%. Given the same hypothetical algorithm, our PPV is now a horrible 16%. Thus, as we get better at preventing HAIs, we get worse at detecting them using code-based algorithms. Are you comfortable saying UTIs are increasing or decreasing, or are associated with a certain level of excess costs, when only 16% of the code-identified UTIs in your estimate are actual (true positive) UTIs? Me neither.
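The two scenarios above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the tables from the original post: the cohort size of 10,000 admissions and the function names `two_by_two` and `ppv` are assumptions chosen for illustration, while the 95% sensitivity and specificity and the 5% and 1% incidences come from the text.

```python
def two_by_two(n_admissions, incidence, sensitivity, specificity):
    """Expected 2x2 cell counts for a code-based algorithm.

    n_admissions is a hypothetical cohort size (an assumption, not from the post).
    """
    with_hai = n_admissions * incidence
    without_hai = n_admissions - with_hai
    tp = with_hai * sensitivity        # code-positive, truly infected
    fn = with_hai - tp                 # code-negative, truly infected
    tn = without_hai * specificity     # code-negative, not infected
    fp = without_hai - tn              # code-positive, not infected
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def ppv(cells):
    """Positive predictive value: true positives over all code-positives."""
    return cells["tp"] / (cells["tp"] + cells["fp"])


# Same hypothetical algorithm, evaluated at the two incidences from the text.
for incidence in (0.05, 0.01):
    cells = two_by_two(10_000, incidence, sensitivity=0.95, specificity=0.95)
    print(
        f"incidence {incidence:.0%}: "
        f"TP={cells['tp']:.0f} FP={cells['fp']:.0f} "
        f"FN={cells['fn']:.0f} TN={cells['tn']:.0f} "
        f"PPV={ppv(cells):.1%}"
    )
```

Under these assumptions, the 5% scenario gives 475 true positives against 475 false positives (PPV 50%), and the 1% scenario gives 95 true positives against 495 false positives (PPV about 16%). The PPV does not depend on the cohort size chosen, only on incidence, sensitivity, and specificity.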


  1. Great analysis! Do you (or the authors of the above algorithm) have any thoughts about how using ICD-10-CM might improve PPV? Clearly it would have to be an enormous amount, but I would be curious.

  2. The algorithm presented was purely hypothetical, and I intentionally made it very, very accurate. Currently there is no ICD-9 or other algorithm that is as accurate as my hypothetical one. I can't imagine any coding algorithm from ICD-10 or any other system that would achieve 95% sensitivity and 95% specificity, so I can't imagine billing codes ever being used to accurately track/predict hospital-acquired infections or any other rare condition.