By Diana S. M. Buist, PhD, MPH, Kaiser Permanente Washington Health Research Institute Senior Investigator and Director of Research and Strategic Partnerships
It’s only March and already 2020 is a big year for artificial intelligence (AI) tools for breast cancer screening. I’m excited to announce our publication in JAMA Network Open with results from a challenge with a $1 million purse. The international crowdsourced contest was to generate open-source code that the scientific, industrial, and regulatory communities can use to improve breast cancer detection.
DREAM Challenges are organized by scientists from academia, industry, government, and nonprofit organizations. The 2017 International Digital Mammography DREAM Challenge (DREAM stands for “Dialogue for Reverse Engineering Assessment and Methods”), funded by the Laura and John Arnold Foundation, was organized by Sage Bionetworks, IBM Research, Icahn School of Medicine at Mount Sinai, and Kaiser Permanente Washington Health Research Institute.
We invited teams from across the globe to develop open-source machine-learning algorithms to improve mammography interpretation. Current U.S. practice is that radiologists look at mammograms to identify lesions, or unusual features, indicating potential cancer. The challenge asked AI developers to create algorithms that do the same: Find patterns in images that are associated with cancer. The algorithms were trained using many mammograms with and without cancerous lesions, with the goal of improving the predictive accuracy of digital mammography for detecting breast cancer early.
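To make the task concrete: at its core, each team was building a model that takes an image and outputs a probability of cancer. The sketch below shows the general shape of that kind of pipeline in Python with PyTorch, using made-up data. It is only an illustration, not code from any challenge entry; real submissions used far deeper networks, full-size mammograms, and clinical outcome labels.

```python
# Illustrative sketch only (not challenge code): train a tiny convolutional
# network to score labeled images, where label 1 means "cancer found on
# follow-up." Random tensors stand in for real mammograms.
import torch
import torch.nn as nn

images = torch.randn(64, 1, 128, 128)        # 64 fake grayscale "mammograms"
labels = torch.randint(0, 2, (64,)).float()  # fake cancer / no-cancer labels

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 1),                # one logit (cancer score) per image
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(images).squeeze(1), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```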
The reason for this goal? Of the 40 million annual U.S. screening mammograms, 9-10% result in patients being asked to return for more diagnostic testing. Only about 5% of these recalls result in a breast cancer diagnosis. These false positives lead to unnecessary tests, and sometimes unnecessary treatment, and they contribute to the $10 billion the U.S. spends on mammography each year.
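Those rates imply millions of recalls every year, nearly all of them false positives. A quick back-of-the-envelope check, using only the figures quoted above:

```python
# Back-of-the-envelope arithmetic using the figures cited in the text.
annual_mammograms = 40_000_000
recall_rate = 0.095              # 9-10% of screens lead to a recall
cancer_yield_of_recalls = 0.05   # ~5% of recalls end in a cancer diagnosis

recalls = annual_mammograms * recall_rate
cancers_found = recalls * cancer_yield_of_recalls
false_positive_recalls = recalls - cancers_found

print(f"{recalls:,.0f} recalls per year")                       # ~3.8 million
print(f"{cancers_found:,.0f} cancers found among recalls")      # ~190,000
print(f"{false_positive_recalls:,.0f} false-positive recalls")  # ~3.6 million
```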
Kaiser Permanente Washington supported the crowdsourced DREAM Challenge by providing more than 144,000 anonymized screening mammogram images from more than 80,000 people, with 952 breast cancers. Images were linked to data on risk factors from the Breast Cancer Surveillance Consortium, a national network of breast imaging registries.
DREAM Challenge competitors included 1,100 computer scientists, radiologists, students, and others in 126 teams from 44 countries. The challenge used an innovative approach that limited the release of de-identified digital images. Competitors were provided only a small number of images to develop their AI algorithms. They submitted refined algorithms to test on a larger dataset that was kept secure. The top 8 performing teams were invited to collaborate on a combined AI model that improved on individual model performance.
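How might eight separate algorithms become one combined AI model? The paper describes the ensembling method actually used; the snippet below only illustrates the simplest common idea, averaging each model’s predicted probability for each exam, with made-up scores.

```python
# Illustration only: average several models' predicted cancer probabilities
# per exam. This is not the ensembling method used in the challenge paper.
import numpy as np

model_scores = np.array([      # rows = hypothetical models, columns = exams
    [0.02, 0.85, 0.10, 0.40],
    [0.05, 0.70, 0.20, 0.55],
    [0.01, 0.90, 0.05, 0.35],
])
ensemble_score = model_scores.mean(axis=0)   # one combined score per exam
print(ensemble_score)
```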
The international challengers did not create an AI algorithm that could meet or beat radiologists’ performance. However, combining an AI algorithm with radiologist assessment might improve accuracy. Our results suggest supplementing single-radiologist reading with AI could prevent half a million Americans from having unnecessary follow-up tests after an annual screening mammogram. It’s important to state that we did not evaluate how AI could be added to care or if adding it would actually lead to better accuracy.
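What could supplementing a single radiologist’s reading with AI look like? We did not study any particular deployment, so the sketch below is purely hypothetical: it blends a radiologist’s recall decision with an AI score and applies a threshold that would have to be tuned on held-out data to preserve sensitivity. The weights and threshold are placeholders, not validated values.

```python
# Hypothetical illustration only; not the combination rule evaluated in the paper.
import numpy as np

radiologist_recall = np.array([1, 1, 0, 0, 1])       # 1 = radiologist recalls the exam
ai_score = np.array([0.02, 0.60, 0.90, 0.10, 0.75])  # made-up AI probabilities

# Blend the two signals and recall only when the combined score clears a threshold.
combined = 0.5 * radiologist_recall + 0.5 * ai_score
recall_decision = combined >= 0.55
print(recall_decision)  # here, a radiologist recall paired with a very low AI
                        # score falls below the threshold and is not recalled
```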
The full details of what we found are in our paper and its appendix.
However, lessons from mammography with computer-assisted detection tell us that we need more research on implementing and evaluating a combined approach of AI and radiologist assessment.
We learned a lot in this project about developing AI for real-world health care.
In January, Google scientists published results showing that their AI system improved breast screening accuracy compared with U.S. and U.K. radiologists. This field is moving fast, so here are the issues that all of us working on AI for cancer screening must consider now:
Preventing overdiagnosis. Any screening carries risks of false positives leading to unnecessary tests and treatment. Whenever a new breast screening tool is introduced, we must ask: Does it reduce this problem?
Ensuring generalizability. AI must be accurate for all patients. We must ask: Do the algorithms we develop account for screening history, genetics, physiology such as breast density, and demographics? (A minimal version of this kind of subgroup check is sketched after this list.)
Keeping up with AI. AI tools need constant refinement. Think about how Tesla continually updates its software, for example to help its cars avoid accidents. Health care AI also needs to continually learn and improve. How will U.S. Food and Drug Administration-approved health AI tools keep up with algorithm iterations?
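On the generalizability point above, the simplest kind of check is to measure an algorithm’s performance separately within each patient subgroup rather than only overall. The sketch below does this by breast density category with entirely made-up data; a real audit would use real outcomes and many more subgroups.

```python
# Hypothetical generalizability check: discrimination (AUC) by breast density
# category. All data here are synthetic; this is an illustration, not study code.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
density = rng.choice(
    ["fatty", "scattered", "heterogeneously dense", "extremely dense"], size=n)
truth = rng.binomial(1, 0.05, size=n)                     # 1 = cancer on follow-up
score = np.clip(0.3 * truth + 0.7 * rng.random(n), 0, 1)  # fake AI scores

for group in np.unique(density):
    mask = density == group
    if truth[mask].min() != truth[mask].max():            # need both classes for AUC
        print(f"{group}: AUC = {roc_auc_score(truth[mask], score[mask]):.3f}")
```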
Our study shows that stories about AI replacing health professionals are exaggerated. We found that AI enhanced by a radiologist’s experience could make U.S. mammography safer for women and more efficient for delivery systems.