Big data in work comp – no panacea

There’s so much excitement about big data and its potential in work comp that it would be all too easy to forget some basics.

The law of small numbers chief among them.

Work comp accounts for a tad more than one percent of US medical spend – $30 billion in comp medical vs $2.8 trillion in total medical spend in 2012.

Many docs treat just a couple of work comp claims a year, and those who do handle a lot of WC claims see a wide range of injuries: knees, ankles, backs, shoulders, and hands, damaged by cuts, sprains, strains, and severe trauma. When looking to compare providers (or procedures, for that matter), researchers need enough data points to build a statistically valid sample. In most cases, no single provider has enough claims to enable clear-cut evaluation. And if one does, there aren’t any other providers in the same service area with the necessary volume, making comparisons nigh on impossible.
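
To see why a handful of claims per provider says so little, here is a minimal Python sketch of a 95% Wilson score confidence interval for a provider’s return-to-work rate. The claim counts are hypothetical, and the Wilson interval is simply one standard choice for a proportion; other interval methods tell the same story.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical providers: the same observed 80% return-to-work rate, very different volume.
for label, returned, claims in [("low-volume doc", 4, 5), ("high-volume doc", 400, 500)]:
    low, high = wilson_interval(returned, claims)
    print(f"{label}: {returned}/{claims} -> 95% CI {low:.2f} to {high:.2f}")
```

With 5 claims the interval runs from roughly 0.38 to 0.96, wide enough to fit either a poor performer or an excellent one; with 500 claims it narrows to about 0.76 to 0.83.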

The issue is statistical validity and accuracy. Simply put: is the measurement procedure capable of measuring what it is supposed to measure? Without enough data, there just isn’t enough information to assess performance accurately.

That’s not to say researchers can’t do very meaningful and helpful analyses. The study just published on opioid prescribing by physicians who dispense drugs to work comp claimants is a perfect example, and the ongoing research by CWCI, WCRI, and NCCI provides plenty more.

The problem occurs when consultants, payers, or managed care firms try to make definitive statements about individual providers based on inadequate data. In my experience, provider rankings are often (but not always) based on little more than reimbursement or “savings” figures, and in no way account for “quality” as measured by return to work, disability duration, or cost per claim. There isn’t enough data to adjust for case mix, to make valid comparisons, or to really “rate” docs.
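
Case-mix adjustment, in its simplest form, compares a provider’s actual results with what a benchmark would predict for the same mix of injuries. The Python sketch below uses an observed-to-expected (O/E) ratio of disability days. The injury categories, benchmark values, and claims are all hypothetical, and real adjustment models use many more risk factors.

```python
# Hypothetical benchmark of expected disability days by injury type.
# Illustrative numbers only, not drawn from any real data set.
EXPECTED_DAYS = {"hand_laceration": 10.0, "knee": 40.0, "back": 60.0, "shoulder": 75.0}


def observed_to_expected(claims: list[tuple[str, float]]) -> float:
    """Actual disability days divided by benchmark days for the same injuries.

    A ratio above 1.0 means longer durations than the benchmark predicts for this mix.
    """
    if not claims:
        raise ValueError("no claims to evaluate")
    observed = sum(days for _, days in claims)
    expected = sum(EXPECTED_DAYS[injury] for injury, _ in claims)
    return observed / expected


# A hypothetical low-volume provider with three claims...
doc_claims = [("hand_laceration", 12.0), ("knee", 38.0), ("back", 70.0)]
print(f"O/E with 3 claims: {observed_to_expected(doc_claims):.2f}")

# ...and the same provider after one difficult shoulder claim arrives.
doc_claims.append(("shoulder", 200.0))
print(f"O/E with 4 claims: {observed_to_expected(doc_claims):.2f}")
```

A single claim moves this provider’s ratio from about 1.09 to about 1.73, which is why a handful of claims cannot support a case-mix-adjusted rating.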

I would note that some payers (most often state funds) and some managed care firms (notably MedRisk, an HSA consulting client) have a wealth of data and can, and do, make valid comparisons.

What does this mean for you?

Beware of rankings, ratings, and comparisons of individual providers.  Unless the underlying data is robust.

7 thoughts on “Big data in work comp – no panacea”

  1. You are so right! Medical providers are primarily a cottage industry in this country, so a critical mass of data is not available. Another problem is that medical provider data in MOST data sets does not include unique identifiers such as state license numbers or NPIs. Therefore, individuals cannot be separated from groups. This also makes it difficult to combine providers accurately and analyze their performance across data sets.

  2. Joe – thanks for another insightful post; as always, you have delved below the surface of an issue to uncover what works and what does not. Gaining credible insight from provider-level results for WC using big data is almost impossible if you are not a Top 3 WC insurer or an industry rating/research body. That being said, big data has lots of other uses in WC where there is enough data for predictive models to be sufficiently reliable and game-changing at the claim level, while taking advantage of medical transaction data and claim notes (e.g., creeping high-cost claims). Operationally, we might also consider how the pricing of newer healthcare delivery models can potentially be applied to WC using big data reporting analytics as evidence-based support.

  3. Joe,
    We’d agree.
    However, unlike the term “statistically valid,” for most of us, the term “robust” can be very subjective. I am sure you have found, as we have, that “robust” can often be defined by the user in such a way as to “validate” whatever conclusion and decision the user may already want or need to make. In the network business, there are people whose job it is to make decisions about which providers are in and which are out. If they can’t do so, perhaps they are unneeded, so they are pressured to make those decisions anyway, based on whatever information can be ginned up…regardless of how “robust” or “valid” that information may actually be. I believe the term is “research by conclusion.”
    We also completely agree with your take-away message: caveat emptor. But being told to beware of comparisons unless the data is “robust” leaves the network services buyer subject to whatever data the sales or account management person has available, and may lead to poor but “justifiable” decisions. The overlap of contracted providers between major networks is usually very large, so any demonstrable difference between one network and another is rare. Buyers must look for the best data available. However, as you point out, the data is not good, and thus the decision to buy or retain is most often driven not by health care quality or outcomes but by price and revenue potential for the claims administrator. That’s unfortunate for the two groups most at risk: the injured worker and the employer.
    I am not a statistician by training, but it would help those making decisions about the relative quality, effectiveness, and efficiency of health care providers to establish a common definition of “robust.” I found the following reference, if anyone is willing to decipher it: “Robust Statistics,” copyright 1992–2004 B. D. Ripley (http://www.stats.ox.ac.uk/pub/StatMeth/Robust.pdf).

  4. The lack of a sufficient work comp dataset is a great point, but it is not the only reason many in the industry question today’s “outcome based” networks. Ultimately, these models purport to rank physicians based on outcome, and we all have a pretty good idea what the standard inputs are. The question is: are these the right inputs? Are they fair? Are they meaningful? What do they tell us?

    For example, they say that the biggest single indicator of successful return to work is overall job satisfaction. If this is true, or even partly true, how is it factored into the model? What about other patient/claimant indicators, such as co-morbidities, age, motivation, and how pain is manifested? How are these factored in?

    Consider the same injury, same patient demographics, same physician, and even the same employer: one injured worker is crippled by the pain and can’t perform the functions of the job, while the other can “work through the pain” and perform the full functions of the job. Both injured workers are legitimate and honest, but one is off work and the other is back at work. Same physician: is this doc “on” one day and off the next? Or is some of this beyond the physician’s control? What if one of the injured workers is motivated by secondary gains and the other isn’t? Will this affect the outcome?

    What about the employer? Some employers have very liberal return to work policies and can take anyone back, “even in a stretcher.” Others have less liberal policies, and still others have a “no RTW unless 100%” policy. How are these considered?

    What about unions, or the lack thereof, and labor/management relations? Does this interplay have an impact on outcomes and RTW?

    Of course they do! They all do! For these models to pretend that the physician is the primary driver of RTW and outcome is simply not fair to the physicians.

    Consider the physician whose office is close to a very large union shop with an aging workforce, aging facilities and equipment, no RTW program, and miserable labor/management relations. Then consider her colleague across town, located near a new corporate campus with a young, non-union workforce, an on-site fitness center, massage, dry cleaning, etc.

    Anyone want to guess which of the two physicians will have better “outcomes” as calculated by these models? I struggle to understand how the traditional inputs are fair or even meaningful in helping me determine which of the two physicians is “better.”

  5. Joe: I also agree with your comments. We considered your points as well when we started putting together our business analytics. We have always cautioned our clients to recognize sample size, but also recognize that we have to start somewhere when evaluating the effectiveness of a program and we must recognize that raw numbers are the beginning of that analysis, not the end.

    We also agree with Karen, and our data set does allow for unique identifiers.

  6. There is another factor that influences this significantly.
    Although research has done wonders for data analysis in the past 20 years, we have become over-enamored of and reliant on spreadsheets and numbers, thinking that if we can break something down to a small enough element, we will have our smoking gun.
    Where research and data analysis reach their limits is in capturing those “difficult to quantify” variables, such as administrative burdens (e.g., provider office issues). We could no doubt devise some data point to capture them, but the collection process can be onerous and the meaning vague (e.g., “satisfaction”).
    To your point, our industry (WC) is small enough that we must put common-sense limits on how we read research, understand what lies behind it, and act appropriately on that realization. I believe you’ve referred to that before as having “vision” versus being a manager.
    Good article Joe.
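
Comment 3 asks for a common definition of “robust” and points to Ripley’s “Robust Statistics” notes. In that statistical sense, a robust estimator is one that a few extreme observations cannot drag far from the bulk of the data; that is related to, but not the same as, the post’s sense of data large enough to support a comparison. A minimal Python sketch, using hypothetical claim costs, compares the ordinary mean with two simple robust estimators of location, the median and the trimmed mean:

```python
import statistics


def trimmed_mean(values: list[float], trim_each_end: int) -> float:
    """Mean after dropping the `trim_each_end` smallest and largest values."""
    data = sorted(values)
    if trim_each_end < 0 or 2 * trim_each_end >= len(data):
        raise ValueError("trim_each_end must leave at least one value")
    kept = data[trim_each_end:len(data) - trim_each_end]
    return sum(kept) / len(kept)


# Hypothetical medical costs for six claims; one is a catastrophic outlier.
costs = [4000, 5200, 4800, 6100, 5500, 250000]

print(f"mean:         {statistics.mean(costs):,.0f}")
print(f"median:       {statistics.median(costs):,.0f}")
print(f"trimmed mean: {trimmed_mean(costs, 1):,.0f}")
```

One catastrophic claim pulls the mean to roughly 45,900, while the median (5,350) and the trimmed mean (5,400) stay near the typical claim. Robust estimators limit the damage from outliers, but they cannot supply information a small sample lacks.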
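
Comment 4’s two physicians, one near a union shop with no RTW program and one near a corporate campus, describe confounding: an outcome difference driven by who the patients are and where they work, not by the doctor. The Python sketch below uses hypothetical counts in which both physicians achieve identical return-to-work rates within each employer type, yet their raw rates differ sharply because their patient mixes differ. Direct standardization, which reweights each physician’s per-stratum rates to a shared employer mix, removes that gap. The employer categories, counts, and 50/50 reference mix are assumptions for illustration.

```python
from collections import defaultdict


def build_claims(counts: dict[str, tuple[int, int]]) -> list[tuple[str, bool]]:
    """Expand {employer_type: (n_claims, n_returned_to_work)} into per-claim records."""
    claims = []
    for employer_type, (n_claims, n_returned) in counts.items():
        claims += [(employer_type, True)] * n_returned
        claims += [(employer_type, False)] * (n_claims - n_returned)
    return claims


def rtw_rates(claims: list[tuple[str, bool]]) -> tuple[float, dict[str, float]]:
    """Return the raw RTW rate and the RTW rate within each employer type."""
    totals: dict[str, int] = defaultdict(int)
    returned: dict[str, int] = defaultdict(int)
    for employer_type, back_at_work in claims:
        totals[employer_type] += 1
        returned[employer_type] += int(back_at_work)
    raw = sum(returned.values()) / sum(totals.values())
    return raw, {k: returned[k] / totals[k] for k in totals}


def standardized_rate(by_type: dict[str, float], reference_mix: dict[str, float]) -> float:
    """Direct standardization: weight per-stratum rates by a shared reference employer mix."""
    total_weight = sum(reference_mix.values())
    return sum(by_type[k] * w for k, w in reference_mix.items()) / total_weight


# Hypothetical: both doctors get 60% RTW where there is no RTW program, 90% where there is one.
dr_union_shop = build_claims({"no_rtw_program": (80, 48), "rtw_program": (20, 18)})
dr_corporate = build_claims({"no_rtw_program": (20, 12), "rtw_program": (80, 72)})
reference_mix = {"no_rtw_program": 0.5, "rtw_program": 0.5}

for name, claims in [("union-shop doc", dr_union_shop), ("corporate-campus doc", dr_corporate)]:
    raw, by_type = rtw_rates(claims)
    adjusted = standardized_rate(by_type, reference_mix)
    print(f"{name}: raw {raw:.2f}, standardized {adjusted:.2f}")
```

The raw rates come out at 0.66 and 0.84; standardized, both are 0.75. Even this simple correction requires the employer attribute to be recorded and enough claims in every stratum, which brings back the small-numbers problem the post describes.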