Back to the Future: What Predictive Decision Support Can Learn from DeLoreans and The Big Short
In the third blog in our series on artificial intelligence (AI) and machine learning (ML)-driven predictive models (data analytics tool or software) in health care, we discussed some potential risks (sometimes referred to as model harms) related to these emerging technologies and how these risks could lead to adverse impacts or negative outcomes. Given these potential risks, some have questioned whether they can trust the use of these technologies in health care.
We are encouraged to see that some stakeholders are demonstrating that a predictive model is fair, appropriate, valid, effective, and safe (FAVES), rather than amplifying biases or harms. Some stakeholders are indicating this through descriptions of the processes used to develop the model and minimize risks, evaluation of the model’s performance (often described in peer-reviewed literature and according to nascent reporting guidelines), and clear description of how and when the model should be used. However, too often, this information is unavailable to purchasers, implementers, and users. As a result, the information necessary to assess the quality of predictive models is unavailable, including when these models are embedded or integrated with certified health IT.
We see this lack of consistent information availability (or information asymmetry) as a defining challenge inhibiting the optimization of predictive decision support interventions (DSIs) in health care. For students of economics, this type of insufficient information or “quality uncertainty” is one of the most famous forms of market failure, often colloquially called a “market for lemons,” after the old slang for malfunctioning used cars. A “market for lemons” can lead to several negative dynamics we’ll briefly discuss.
Are we seeing a market for lemons in predictive models in some areas of health care?
There are three classic dynamics we’d expect to see in a “market for lemons,” and we are watching for signs of each in the market for predictive models in health care:
- Purchaser or User Gets a Real Lemon: Potential purchasers or model users are unsure if a model is of good quality, so they end up using bad models or using models in ways that are not appropriate (e.g., using a model outside the environment for which it was designed, or one ill-suited for a given task or context). Famously, the misuse of models and under-appreciation of model risks led to over-reliance on models to estimate risks of default for mortgage-backed securities and contributed directly to the 2008 financial crisis in the United States. In the last few years, we’ve seen high-profile instances in health care in which users discovered, only belatedly, that models they used or acquired were not accurate or were biased.
- Uncertainty and Inability to Determine the “Good” from the “Bad,” Predictive Models Not Purchased: Potential users just don’t purchase anything because they’re not sure if what’s being offered is effective, snake oil, or, as in the car market, a rattletrap. Clinician groups have shared that clinicians simply do not have the information they need to know if the tool will work in their practice. We’ve also seen the continued use of exceptionally simple models, developed decades ago, when more advanced alternatives, like predictive models, exist. One reason more advanced models haven’t replaced these simple models may be fundamental uncertainty about the quality of the more advanced alternatives: some, but not all, are likely more effective (high quality) than older, simple models.
- Producers or Suppliers of High Quality Models May Exit the Health Care Market: High quality model producers leave the market, especially if high quality comes at high cost. If purchasers can’t distinguish a good model from a bad model, they won’t want to pay more for an expensive, “good” model than for a crummy one. The high-quality producers end up financially unviable and exit the market, leaving only bad options behind. You may be familiar with the concept of an “adverse selection death spiral” from another part of the health care industry. Right now, we are seeing enormous entry and exit from the market for models in health care, and we are aware that a lack of information on quality may inhibit the emergence of a healthy market for models.
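The third dynamic, in which high-quality producers are priced out, can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not a model of the actual health care market: the quality scores, the `cost_ratio` parameter, and the function name `simulate_lemons_market` are all hypothetical, and buyers are assumed to pay exactly the average quality of whatever is still on offer because they cannot tell models apart.

```python
def simulate_lemons_market(qualities, cost_ratio=0.8, max_rounds=50):
    """Iterate a stylized lemons market until the set of sellers stops changing.

    qualities:  quality score of each model on offer (hypothetical units).
    cost_ratio: assumed production cost as a fraction of a model's quality.
    """
    active = sorted(qualities)
    history = []  # (number of sellers, price) for each round
    for _ in range(max_rounds):
        if not active:
            break
        # Buyers cannot distinguish models, so they pay the average quality.
        price = sum(active) / len(active)
        history.append((len(active), round(price, 2)))
        # Producers whose cost exceeds the going price leave the market.
        remaining = [q for q in active if q * cost_ratio <= price]
        if len(remaining) == len(active):
            break  # no one else exits; the market has settled
        active = remaining
    return active, history


# Ten hypothetical models with quality scores from 10 to 100.
qualities = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
survivors, history = simulate_lemons_market(qualities)

for sellers, price in history:
    print(f"{sellers} sellers remain, buyers pay {price}")
print("Models left in the market:", survivors)
```

In this toy setup the best models exit first, the average price falls, and the next tier follows. If buyers could instead observe each model's quality and pay accordingly, every producer in this sketch would cover its costs and stay.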
What can be done about a market for predictive model lemons?
The good news is that the limited information on quality that drives a market for lemons can be resolved, hopefully leading toward a more robust market where high-quality developers are identified and rewarded.
There are two classic things that can be done about a “market for lemons”:
One option is to create quality certification, so purchasers have some trust in the underlying quality of what they’re buying, particularly in circumstances where quality is difficult for the purchaser to ascertain and the implications of low quality are potentially calamitous. For instance, medical licensing boards were created to give patients confidence that their doctor was well trained and not a snake oil salesman. Another example comes from the Food and Drug Administration (FDA), which has, of course, approved numerous medications and other therapies as safe, effective, and fit for commercial distribution. The FDA has also approved some 300 forms of AI/ML-enabled medical devices, “clearing” the use of these devices based on evidence that they are safe and effective for their intended use, and that the developer of these devices can manufacture them according to federal quality standards.
A second option is to require transparency to make it easier for potential users to ascertain the quality or appropriateness of a product. Perhaps the most famous response to the namesake “market for lemons” takes this approach: CARFAX’s Vehicle History reports, built on reservoirs of public data including those resulting from the Truth in Mileage Act and accident damage reports, make it a whole lot harder to overcharge for a rust bucket. Other examples of this approach include food nutrition facts labels and over-the-counter drug facts labels, which provide information about what is in a product, what it is supposed to do, who should (and in some instances, shouldn’t) use it, and how it should be used.
For predictive DSIs in health care, information like a model “facts label” can be generated by applying various tests to the model in a process called model validation. However, the experience of the financial services industry highlights that validation information on its own may not be sufficient to ensure models are high quality and used appropriately. In 2000, the U.S. Department of the Treasury’s Office of the Comptroller of the Currency (OCC) initially issued Model Validation Guidance to outline principles for validating models. Over the succeeding decade, spurred by the role of models in the 2008 financial crisis in the U.S., the focus of the OCC’s guidance broadened to encompass risk management and governance, reflecting not only model validation but also organizational competencies and practices. In particular, the OCC and U.S. Board of Governors of the Federal Reserve System’s joint guidance, called SR 11-7, notes that key aspects of an effective model risk management framework include “robust model development, implementation, and use; effective validation; and sound governance, policies, and controls.”
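To make the idea of a validation-derived “facts label” concrete, here is a minimal sketch in plain Python. It is an illustration only, not a prescribed or standard label format: the record fields (`group`, `outcome`, `score`), the function names, and the choice of two metrics (discrimination via AUC and overall error via the Brier score) are assumptions made for this example, and the data are invented.

```python
def auc(y_true, y_score):
    """Chance that a random positive case outscores a random negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # undefined unless both outcomes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_score):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def facts_label(records):
    """Summarize performance overall and for each subgroup in the data."""
    label = {}
    groups = sorted({r["group"] for r in records})
    for name in ["overall"] + groups:
        rows = [r for r in records if name == "overall" or r["group"] == name]
        y = [r["outcome"] for r in rows]
        s = [r["score"] for r in rows]
        label[name] = {"n": len(rows), "auc": auc(y, s),
                       "brier": round(brier(y, s), 3)}
    return label


# Invented validation data: observed outcome (0/1) and the model's risk score.
records = [
    {"group": "A", "outcome": 1, "score": 0.9},
    {"group": "A", "outcome": 0, "score": 0.2},
    {"group": "A", "outcome": 1, "score": 0.7},
    {"group": "A", "outcome": 0, "score": 0.4},
    {"group": "B", "outcome": 1, "score": 0.4},
    {"group": "B", "outcome": 0, "score": 0.6},
    {"group": "B", "outcome": 1, "score": 0.8},
    {"group": "B", "outcome": 0, "score": 0.3},
]

label = facts_label(records)
for name, stats in label.items():
    print(name, stats)
```

Reporting the same metrics by subgroup, as this sketch does, is one way a label could show that a model performing well overall performs less well for a particular population. In this invented data the model ranks cases in group B less accurately than in group A.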
To help organizations manage both enterprise and societal risks related to the design, development, deployment, evaluation, and use of AI systems, the National Institute of Standards and Technology (NIST) recently published a second draft of a framework for AI risk management (AI RMF). The framework aims to “provide a flexible, structured, and measurable process to address AI risks prospectively and continuously throughout the AI lifecycle.”
Both the SR 11-7 guidance and the AI RMF also draw attention to the importance of organizational competencies in managing risk for models and AI/ML-related technologies. Such guidance and frameworks can help cultivate trust in both the products for which risk is being managed and the organizations and practices used to manage such risks.
While we are aware of numerous existing and emerging efforts to establish guidelines, frameworks, and principles to encourage optimization of predictive models in health care, including recent industry recognition of the need for evaluation, monitoring, and guardrails, we also know that commonly used ML models developed by health IT developers or users frequently do not adhere to such guidelines.
In our next and final blog of the series, we’ll discuss some potential directions ONC could take to help reduce information asymmetry in this area and enable users to determine whether predictive DSIs are FAVES, as well as steps we can all take to optimize the use of algorithms in health care.