Poorly trained algorithms are a critical issue; when artificial intelligence mirrors the unconscious thoughts, racism, and biases of the humans who generated it, serious harm can follow. Computer programs, for example, have wrongly flagged black defendants as twice as likely to reoffend as white defendants. When an AI used healthcare cost as a proxy for healthcare need, it falsely labeled black patients as healthier than equally sick white patients, because less money was being spent on their care. Even an AI used to write a play leaned on harmful stereotypes for its casting.
Removing sensitive features from the data seems like a viable fix. But what happens when that’s not enough?
Examples of bias in natural language processing are countless, but MIT scientists have investigated another important and largely underexplored modality: medical images. Using private and public datasets, the team found that AI can accurately predict patients’ self-reported race from medical images alone. Using imaging data from chest x-rays, limb x-rays, chest CT scans and mammograms, the team trained a deep learning model to identify race as white, black or Asian – even if the images themselves contained no explicit mention of the patient’s race. It’s a feat even the most experienced doctors can’t do, and it’s unclear how the model was able to do it.
In an attempt to untangle the enigmatic “how” of it all, the researchers conducted a host of experiments. To investigate possible mechanisms of race detection, they examined variables such as differences in anatomy, bone density, image resolution, and many more; across all of them, the models retained a strong ability to detect race from chest X-rays. “These results were initially puzzling, because the members of our research team could not come anywhere close to identifying a good proxy for this task,” says paper co-author Marzyeh Ghassemi, an assistant professor in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is affiliated with the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic. “Even when you filter medical images past the point where the images are recognizable as medical images at all, deep models maintain very high performance. That is concerning, because superhuman capacities are generally much harder to control, regulate, and prevent from harming people.”
In a clinical setting, algorithms can help tell us whether a patient is a candidate for chemotherapy, dictate patient triage, or decide whether a transfer to the ICU is needed. “We think the algorithms are only looking at vital signs or laboratory tests, but it’s possible they’re also looking at your race, ethnicity, sex, whether you’re incarcerated or not – even if all of that information is hidden,” says paper co-author Leo Anthony Celi, principal research scientist at IMES at MIT and associate professor of medicine at Harvard Medical School. “Just because you have representation of different groups in your algorithms doesn’t mean they won’t perpetuate or amplify existing disparities and inequities. Feeding the algorithms more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the bedside.”
The study, “AI Recognition of Patient Race in Medical Imaging: A Modeling Study,” was published in The Lancet Digital Health on May 11. Celi and Ghassemi wrote the paper alongside 20 other authors in four countries.
To set up the tests, the scientists first showed that the models were able to predict race across multiple imaging modalities, diverse datasets, clinical tasks, and a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, and tested the model both on an unseen subset of the dataset used to train it and on a completely different one. Next, they trained race-detection models on non-chest-X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CT scans, to see whether the model’s performance was limited to chest X-rays.
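The internal-versus-external validation pattern described above can be sketched in miniature. The toy nearest-centroid “model” and the synthetic one-dimensional features below are hypothetical stand-ins for the deep models and X-ray datasets in the study, which are not reproduced here; the point is only the evaluation design of training on one source, then testing on a held-out split of that source and on a separate dataset.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier: the mean feature value per label.
    samples is a list of (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(centroids, samples):
    """Predict each sample's label as the nearest centroid; return accuracy."""
    correct = 0
    for x, y in samples:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == y)
    return correct / len(samples)

# Synthetic, invented data standing in for imaging features.
train = [(0.1, "A"), (0.2, "A"), (0.9, "B"), (0.8, "B")]
internal_test = [(0.15, "A"), (0.85, "B")]  # unseen split, same source
external_test = [(0.3, "A"), (0.7, "B")]    # completely different dataset

model = train_centroids(train)
print(accuracy(model, internal_test), accuracy(model, external_test))
```

If a model scores well on the internal split but poorly on the external dataset, it has likely learned dataset-specific shortcuts; the study's models kept their performance across both, which is what made the result striking.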
The team covered many bases in an attempt to explain the models’ behavior: differences in physical characteristics between racial groups (body habitus, breast density), disease distribution (previous studies have shown that black patients have a higher incidence of health issues such as cardiac disease), location- or tissue-specific differences, the effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors were combined, and whether specific image regions contributed to race recognition.
What emerged was truly staggering: the models’ ability to predict race from diagnostic labels alone was much lower than that of the models based on the chest X-ray images themselves.
For example, the bone density test used images where the thicker part of the bone appeared white and the thinner part appeared more gray or translucent. The scientists posited that since black patients generally have higher bone mineral density, these brightness differences were helping the AI models detect race. To cut off that signal, they clipped the images with a filter so the model couldn’t use the brightness differences. It turned out that removing this information didn’t faze the model – it could still accurately predict race. (The “area under the curve” value, a measure of the accuracy of a quantitative diagnostic test, was 0.94 to 0.96.) As such, the model’s learned features appeared to rely on all regions of the image, which means that controlling this kind of algorithmic behavior presents a messy, difficult problem.
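For readers unfamiliar with the metric, the “area under the curve” (AUC) quoted above has a simple pairwise interpretation: it is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (ties count half). A minimal, self-contained sketch, with invented scores for illustration:

```python
def auc(scores_pos, scores_neg):
    """ROC AUC via the pairwise (Mann-Whitney) definition: the fraction of
    (positive, negative) pairs where the positive case scores higher,
    counting ties as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# A model that separates the two groups completely scores 1.0;
# a model no better than chance hovers around 0.5.
print(auc([0.9, 0.8, 0.7], [0.2, 0.3, 0.1]))  # perfect separation -> 1.0
```

An AUC of 0.94 to 0.96, as reported here, means the model ranks cases almost perfectly, despite the brightness information being removed.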
The scientists acknowledge the limited availability of racial identity labels, which led them to focus on Asian, black, and white populations, and the fact that their ground truth was a self-reported attribute. Upcoming work will potentially include isolating different signals before image reconstruction, because, as with the bone density experiments, they could not account for residual bone tissue that remained in the images.
Notably, other work by Ghassemi and Celi, led by MIT student Hammaad Adam, has found that models can also identify patients’ self-reported race from clinical notes, even when those notes lack explicit indicators of race. Just as in this work, human experts were not able to accurately predict patient race from the same redacted clinical notes.
“We need to bring social scientists into the picture. Domain experts, who are typically clinicians, public health practitioners, computer scientists, and engineers, are not enough. Health care is as much a socio-cultural problem as it is a medical one. We need another group of experts to weigh in and provide input and feedback on how we design, develop, deploy, and evaluate these algorithms,” says Celi. “We must also ask the data scientists, before any exploration of the data: Are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it the subjectivity of the care providers? If we don’t understand that, we won’t have a chance of being able to identify the unintended consequences of the algorithms, and there’s no way we’ll be able to safeguard the algorithms from perpetuating biases.”
“The fact that algorithms ‘see’ race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias,” says Ziad Obermeyer, associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. “In our own work, led by computer scientist Emma Pierson at Cornell, we show that algorithms that learn from patients’ pain experiences can find new sources of knee pain in X-rays that disproportionately affect black patients – and are disproportionately missed by radiologists. So, like any tool, algorithms can be a force for evil or a force for good – which one depends on us, and the choices we make when we build algorithms.”
The work is supported, in part, by the National Institutes of Health.