Abstract
Purpose
Many breast centers are unable to provide immediate results at the time of screening mammography, which delays patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis.
Methods
In this prospective, randomized, unblinded, controlled implementation study, we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results
The final cohort included 463 experimental and 392 control participants. One-sided Mann-Whitney U tests were used for the analysis of TA and TB. In the control group, the mean TA was 25.6 days [95% CI 22.0–29.9] and the mean TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days [one-sided 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions
Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could improve patient adherence, decrease anxiety, and help address disparities in access to timely care.
Introduction
Screening mammography has been shown to decrease mortality from breast cancer in multiple long-term prospective trials [1, 2], but screening programs suffer from delays in care following abnormal mammography. These diagnostic delays not only cause undue patient anxiety and delay treatment, but also exacerbate disparities among minority racial and ethnic groups [3, 4]. For example, one study revealed that Black women were twice as likely as White women to experience a delay in follow-up imaging exceeding 45 days, and such delays were associated with a 1.6-fold increase in breast cancer mortality [4]. Delays have been further compounded by a spike in attendance following the screening slowdown during the COVID-19 pandemic.
Traditionally, most facilities in the United States lack the capacity to provide immediate screening results to their patients and instead interpret mammograms in a “batch” setting, after the patient has left. While more than 85% of cases are found to be normal and do not need follow-up care [5], the remaining patients are asked to return for additional workup due to indeterminate mammographic findings. Previous studies [6,7,8] have demonstrated the benefits of immediate interpretation and same-visit additional diagnostic workup, including reduced patient anxiety, faster diagnosis, improved follow-up adherence, and decreased racial disparities in diagnostic delays. Despite these patient-centered benefits, same-visit workup is not widespread, primarily because offering same-visit interpretation to all patients is impractical when only a small percentage require further imaging.
To help solve challenges in breast cancer screening, healthcare systems have begun looking to artificial intelligence (AI). However, challenges remain in integrating AI into existing clinical workflows to optimize patient care and outcomes. Prior applications of AI in screening mammography have focused primarily on increasing accuracy in computer-aided detection (CAD) or standalone AI interpretation workflows [9,10,11,12], but prospective studies focused on the implementation of AI have been scarce [13]. Two recent interventional studies [14, 15] investigated the role of AI in double-reader screening workflows, with a goal of reducing staffing needs and increasing cancer detection rates. Here, we describe a prospective implementation study of AI for triage and its impact on operational outcomes. Our goal was to determine whether real-time AI prioritization can streamline the time to follow-up imaging evaluation and biopsy diagnosis, compared to a standard-of-care workflow. Using AI in this way has the potential to achieve the benefits of immediate interpretation for the small percentage of patients who require additional diagnostic evaluation while retaining the workflow efficiency of batch screening for the remaining majority.
Materials and methods
Oversight and compliance
This prospective, randomized, unblinded, controlled implementation study was approved by our Institutional Review Board (STU00212646). The study is exempt from National Clinical Trial (NCT) registration as it does not meet all 4 criteria on the clinicaltrials.gov checklist [16].
Participants
Women aged 40–89, scheduled to undergo screening mammography between March 2021 and May 2022, were consecutively invited to participate (after meeting study eligibility criteria), with informed consent obtained via email, phone, or in person. We obtained a list and demographics of patients through the electronic medical record system. We excluded pregnant women, as well as patients with a history of breast cancer, prior mastectomy, or breast implants. In total, 1000 women consented to participate in the study (Fig. 1a). The demographics of the consented cohort compared to the institutional and national screening populations [17] can be found in Table 1.
Summary of study methods. a Participants. b AI prioritization. c Primary metrics TA and TB. TA is the time to additional imaging and TB is the time to biopsy diagnosis. Screening mammograms were assigned BIRADS category 1 (negative), 2 (benign) or 0 (incomplete—additional imaging needed) after radiologist review. After additional diagnostic imaging, a BIRADS category was assigned as 1 (negative), 2 (benign), 3 (probably benign), 4 (suspicious), 5 (highly suggestive of malignancy). BIRADS Breast Imaging Reporting & Data System
AI system
The investigational AI device was based on technology described previously [9] and tuned for the prioritization use case. While the underlying model produces a score between 0 and 1, an operating point (score threshold) was selected in a retrospective population from the same institution to yield a final binary priority categorization (See Online Resource 1-Supplementary Methods for specifics). This AI operating point was held constant for the study. See the Online Resource 1-Supplementary Simulations for a post hoc simulation of alternative operating points which could accommodate the broader set of trade-offs between the fraction of prioritized patients and the algorithm’s sensitivity. The cloud-based device was developed with design controls under an ISO 13485-certified quality management system. Only de-identified mammograms were transferred when invoking the device.
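To make the operating-point concept concrete, the sketch below shows one way a score threshold might be derived on a retrospective sample and then applied prospectively. This is an illustration only, not the study's actual tuning procedure; the 14% target fraction is borrowed from the retrospective estimate discussed later, and all names and score distributions are hypothetical.

```python
# A minimal sketch, assuming a hypothetical retrospective score sample;
# it illustrates the idea of an operating point, not the study's code.
import numpy as np

def pick_threshold(retrospective_scores, target_fraction=0.14):
    """Choose a score cutoff so that roughly target_fraction of
    retrospective cases fall at or above it (i.e., are prioritized)."""
    return float(np.quantile(retrospective_scores, 1.0 - target_fraction))

def prioritize(score, threshold):
    """Binary prioritization from the model's continuous 0-1 score."""
    return score >= threshold

# Illustrative use: calibrate on retrospective scores, apply prospectively.
rng = np.random.default_rng(0)
retro_scores = rng.beta(2, 8, size=10_000)  # stand-in score distribution
t = pick_threshold(retro_scores)
print(prioritize(0.9, t), prioritize(0.05, t))
```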
Protocol
Participants were randomly assigned to the control or the experimental group with a standard study software program. In the initial phase of the study (the first 100 participants), a 9:1 assignment ratio in favor of the experimental group was used to accelerate discovery of potential technical or operational issues. A review of operations at the 100-participant mark yielded no corrections to the protocol or technical integration. For the subsequent 900 women, a 1:1 assignment ratio was used.
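As an illustration of this two-phase scheme, a minimal sketch of the assignment logic follows; the study used a standard study software program, so this is not its implementation.

```python
# A minimal sketch of the two-phase randomization described above:
# 9:1 toward the experimental group for the first 100 participants,
# then 1:1 for the remaining 900. Illustrative only.
import random

def assign_group(participant_index: int) -> str:
    """Return 'experimental' or 'control' for the given enrollee (0-based)."""
    p_experimental = 0.9 if participant_index < 100 else 0.5
    return "experimental" if random.random() < p_experimental else "control"

groups = [assign_group(i) for i in range(1000)]
print(groups.count("experimental"), groups.count("control"))
```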
Participants in the control group followed the standard workflow, while those assigned to the experimental group followed the AI-modified workflow (Fig. 2). In both workflows, the 4 standard mammographic views were acquired (Selenia Dimensions, Hologic). Images were interpreted by one of 13 board-certified, fellowship-trained breast radiologists. Breast Imaging Reporting and Data System (BIRADS) [18] categories were assigned to each case. Some participants with high breast density obtained supplemental screening with ultrasound and magnetic resonance imaging (MRI), based on their preference.
Standard and AI-modified workflow compared. Elimination of the workflow steps depicted in the gray box for patients prioritized by the AI is the primary mechanism driving the hypothesized reduction in diagnostic delays. Screening mammograms were assigned BIRADS category 1 (negative), 2 (benign) or 0 (incomplete—additional imaging needed) after radiologist review. After additional diagnostic imaging, a BIRADS category was assigned as 1 (negative), 2 (benign), 3 (probably benign), 4 (suspicious), 5 (highly suggestive of malignancy). BIRADS Breast Imaging Reporting & Data System
For all participants, the AI yielded a binary categorization (Prioritized, Not Prioritized) to identify cases with a higher risk of malignancy (Fig. 1b, Online Resource 1-Supplementary Fig. 1). The AI prioritization only influenced the workflow of the experimental group participants—in the control group, AI results were analyzed post hoc, only after study completion.
Experimental group participants who were not prioritized by the AI, those who were prioritized but discontinued the AI-modified workflow, as well as all participants in the control group, had their mammograms interpreted according to the standard of care. To decrease the potential for bias, the interpreting radiologists were unaware that these were study participants.
Experimental group participants whose cases were prioritized by the AI were offered the opportunity to remain on site while a radiologist interpreted their mammograms within 30 minutes. The radiologist who performed immediate interpretation was assigned to diagnostic imaging on that day, not screening interpretation, and was responsible for same-visit additional imaging workup if needed. To minimize potential influence on radiologists, the AI algorithm offered no explanation of why a case was prioritized. Participants whose images were deemed normal by the radiologist were immediately informed of their result, while those deemed to need additional imaging were offered same-visit diagnostic imaging. If a biopsy was recommended after diagnostic imaging, it was scheduled for a later date.
Operational endpoints
Primary operational endpoints were time from screening examination to completion of additional imaging workup (TA), and time from screening examination to biopsy diagnosis (TB) (Fig. 1c).
Exploratory analyses included: individual TA and TB values in the subset of participants ultimately diagnosed with breast cancer, radiologist recall rates (proportion of screening examinations with a recommendation for additional imaging), cancer detection rates (number of cancers detected per 1000 women), and performance of AI prioritization on participants requiring additional imaging or tissue diagnosis.
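For concreteness, a minimal sketch of how these endpoints and exploratory metrics can be computed is shown below; the dates and counts are hypothetical or taken from the Results section for illustration.

```python
# A minimal sketch of the endpoint and metric calculations, assuming
# hypothetical per-participant dates; illustrative only.
from datetime import date

def days_between(screening: date, event: date) -> int:
    """Interval in days from screening to a follow-up event (TA or TB)."""
    return (event - screening).days

# Same-visit additional imaging yields TA = 0 days.
print(days_between(date(2021, 3, 15), date(2021, 3, 15)))

def recall_rate(n_recalled: int, n_screened: int) -> float:
    """Proportion of screening exams recommended for additional imaging."""
    return n_recalled / n_screened

def cancer_detection_rate(n_cancers: int, n_screened: int) -> float:
    """Cancers detected per 1000 women screened."""
    return 1000 * n_cancers / n_screened

# E.g., 3 cancers among 463 experimental participants -> ~6.5 per 1000.
print(round(cancer_detection_rate(3, 463), 1))
```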
Statistical analysis
A sample size of 1000 was selected to enable comparison with standard breast radiology metrics and to assess the implementation of the AI-modified workflow. One-sided Mann-Whitney U tests were employed for TA and TB, respectively, to test the null hypothesis that the experimental and control samples originated from the same population against the alternative hypothesis that TA and TB were shorter in the experimental group. The t-test and other tests assuming normality were not used because time durations were not expected to be Gaussian. A p-value of less than 0.025 was considered significant, reflecting the Holm-Bonferroni correction for multiple hypothesis testing.
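A minimal sketch of this test, using SciPy's implementation with the one-sided "less" alternative, follows; the input arrays are hypothetical stand-ins for per-participant TA values, not study data.

```python
# A minimal sketch of the one-sided Mann-Whitney U test, assuming
# hypothetical per-participant times in days; not the study's analysis code.
import numpy as np
from scipy.stats import mannwhitneyu

ta_control = np.array([22.0, 29.0, 25.0, 31.0, 18.0, 40.0])    # illustrative
ta_experimental = np.array([0.0, 14.0, 7.0, 21.0, 10.0, 0.0])  # illustrative

# alternative="less": experimental times stochastically shorter than control.
stat, p_value = mannwhitneyu(ta_experimental, ta_control, alternative="less")
print(f"U = {stat:.1f}, one-sided p = {p_value:.4f}")
```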
In addition, we estimated confidence intervals for TA and TB separately in the experimental and control groups by bootstrapping with 9999 iterations [19]. Similarly, we estimated one-sided confidence intervals for the differences in TA (and TB) between the experimental and control samples by bootstrapping with 9999 iterations.
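The sketch below illustrates a percentile-bootstrap procedure with 9999 resamples, covering both the per-group CIs and a one-sided bound on the difference in means; function names and inputs are hypothetical, and the study's exact resampling scheme may differ.

```python
# A minimal sketch of the percentile bootstrap (9999 resamples), assuming
# hypothetical arrays of per-participant times; illustrative only.
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_means(values, n_boot=9999):
    """Bootstrap distribution of the sample mean."""
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    return values[idx].mean(axis=1)

def ci_mean(values, alpha=0.05):
    """Two-sided percentile CI for a group mean (as for TA and TB)."""
    means = bootstrap_means(values)
    return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def one_sided_upper_bound(experimental, control, alpha=0.05):
    """One-sided upper bound on the mean difference (experimental - control)."""
    diffs = bootstrap_means(experimental) - bootstrap_means(control)
    return np.percentile(diffs, 100 * (1 - alpha))
```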
To study a potential effect of AI prioritization on radiologists' recall rate, we applied a two-sample proportions test [20] to the recall rates of experimental and control cases.
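A minimal sketch using the statsmodels two-sample proportions z-test is shown below, plugging in the recall counts reported in the Results; the study's exact test variant follows [20] and may differ in details, so this is illustrative rather than a reproduction of the reported p-value.

```python
# A minimal sketch of a two-sample proportions z-test on recall rates,
# using counts from the Results section; illustrative only.
from statsmodels.stats.proportion import proportions_ztest

recalled = [79, 56]    # recalls: experimental, control
screened = [463, 392]  # group sizes
z_stat, p_value = proportions_ztest(count=recalled, nobs=screened)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```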
Results
Participant cohorts and workflow
Of the 1000 participants who consented, 15% (145/1000) were excluded (CONSORT diagram, Online Resource 1-Supplementary Fig. 2); the final cohort for study analysis therefore consisted of 855 participants. Of these, 46% (392/855) were randomized into the control group and followed the standard-of-care workflow, while 54% (463/855) were randomized into the experimental group and followed the AI-modified workflow. Patient characteristics were similar between the two groups (Table 1).
In the experimental group, screening mammograms from 72% (332/463) of participants were not prioritized by AI and were subsequently included in the standard worklist for blinded radiologist review. The remaining 28% (131/463) were prioritized by AI. For 6% (8/131) of AI-prioritized cases, a radiologist was not available to perform an interpretation within 30 minutes; these participants obtained their final screening results by letter as usual (all had negative or benign findings). For the remaining 94% (123/131) of AI-prioritized cases, a radiologist was available to perform interpretation within 30 minutes. This immediate radiologist interpretation yielded normal or benign results in 73% (90/123) of instances. These results were communicated to participants before they left the clinic. For the remaining 27% (33/123) of AI-prioritized participants, the immediate radiologist interpretation yielded a recommendation for additional imaging, which was communicated to participants during the same visit. In 15% (5/33) of cases, same-visit additional imaging was not offered for operational reasons (e.g., unavailable technologist); these participants obtained additional imaging at a later date. Same-visit additional imaging was offered to 85% (28/33) of the participants with indeterminate findings, 14% (4/28) of whom declined the offer and obtained additional imaging at a later date, while 86% (24/28) accepted and obtained same-visit additional imaging which was completed within 2 hours of their initial screening examination.
Diagnostic outcomes after screening
Diagnostic outcomes after mammography screening, additional imaging workup, and pathology analysis were collected until the study cut-off date of September 6, 2022, allowing for at least 3 months of follow-up after each screening visit (Table 2). Among the 855 participants in the final cohort, 16% (135/855) received a radiologist recommendation for additional imaging. Of those women, 24% (33/135) subsequently received a radiologist recommendation for tissue biopsy. Finally, 18% (6/33) of biopsied women were diagnosed with cancer after tissue pathology analysis. For the 6 participants diagnosed with cancer as a result of mammography screening, the final pathology yielded 2 invasive ductal carcinomas (IDC) and 4 ductal carcinomas in situ (DCIS).
Three participants (two control, one experimental) with normal mammography screening obtained tissue diagnosis after undergoing additional screening with MRI. Two of the biopsied findings were benign, while one (control group) resulted in a DCIS detected only on MRI and not on mammography (Online Resource 1-Supplementary Fig. 3).
Impact of the AI-modified workflow
The AI-modified workflow resulted in significantly shortened diagnostic delays (Fig. 3). In the control group, the mean TA was 25.6 days [95% CIs: 22.0–29.9] and the mean TB was 55.9 days [95% CIs: 45.5–69.6]. In comparison, mean TA in the experimental group was reduced by 25% to 19.1 days (ΔTA=−6.4 days, upper limit of one-sided 95% CI = −0.3, p<0.001), while the mean TB was reduced by 30% to 39.2 days (ΔTB=−16.8 days, upper limit of one-sided 95% CI = −5.1, p=0.003). Similar reductions were observed when comparing medians (Fig. 3). Times were further shortened for AI-prioritized participants in the experimental group: the mean TA was 86% shorter (3.5 days [95% CI 1.0–7.6]) and the mean TB was 46% shorter (30.2 days [95% CI 25.1–35.3]). This gain was primarily attributable to the fact that 73% (24/33) of AI-prioritized participants who needed additional imaging obtained it during the same visit as their screening exam.
AI-modified workflow results. All values are intervals relative to the time of screening. TA is the time to additional imaging and TB is the time to biopsy diagnosis. a [i] The mean TA with 95% CIs of the mean is shown in the control group, in the experimental group overall, and in the subsets of AI-prioritized and not prioritized participants in the experimental group. Bootstrapped effect size estimate is shown. [ii] The median TA with 95% CIs of the median is shown in the control group, in the experimental group overall, and in the subsets of AI-prioritized and not prioritized participants in the experimental group. In the control group, median TA was 22.0 days [95% CIs 21.0–27.9] vs 14.0 days [95% CIs 7.2–20.0] in the experimental group. p-value for the Mann-Whitney U test is shown. [iii] Individual TA data values are shown for participants ultimately diagnosed with breast cancer, in the control group (participants A, B, C) and in the experimental group (participants D, E, F, all AI-prioritized). b [i] The mean TB with 95% CIs of the mean is shown in the control group, in the experimental group overall, and in the subsets of AI-prioritized and not prioritized participants in the experimental group. Bootstrapped effect size estimate is shown. [ii] The median TB with 95% CIs of the median is shown in the control group, in the experimental group overall, and in the subsets of AI-prioritized and not prioritized participants in the experimental group. The median TB was 49.0 days [95% CIs 39.2–58.1] in the control group vs 34.7 days [95% CIs 28.1–45.0] in the experimental group. p-value for the Mann-Whitney U test is shown. [iii] Individual TB data values are shown for participants ultimately diagnosed with breast cancer, in the control group (participants A, B, C) and in the experimental group (participants D, E, F, all AI-prioritized)
The reduction in diagnostic delays experienced by participants ultimately diagnosed with breast cancer was especially marked. The three cancer patients in the control group obtained additional imaging at 13, 26 and 29 days, and tissue diagnosis at 39, 57 and 58 days, respectively. In comparison, the three cancer patients in the experimental group obtained additional imaging at 0, 0 and 4 days, and tissue diagnosis at 12, 27 and 33 days, respectively.
Radiologists recalled 17.0% (79/463, [95% CI 13.8–20.8]) of cases for additional imaging in the experimental group, compared to 14.3% (56/392, [95% CI 11.0–18.1]) in the control group. This difference was not statistically significant (p=0.36). The radiologist cancer detection rate was 6.5 per 1000 (3 cancers detected in 463 participants) in the experimental group, and 7.7 per 1000 (3 cancers detected in 392 participants) in the control group.
The performance of AI prioritization compared to screening outcomes was analyzed for experimental and control cases separately, and as a combined measure. For the control group, AI results were computed for the purpose of post hoc data analysis only (Online Resource 1-Supplementary Table 1). In the whole study cohort, the AI prioritized 29% (245/855) of participants, 39% (53/135) of those who were subsequently recommended by the radiologist for additional imaging, 64% (21/33) of those recommended for tissue biopsy, and 100% (6/6) of cancers. Similar proportions were measured when considering the experimental and control groups separately (Online Resource 1-Supplementary Table 2). In addition to the 6 cancers diagnosed as a result of the mammography screening, the AI software identified one additional cancer in the control group; this AI finding was not presented to the radiologist (per the protocol for the control group) and the case was interpreted by the radiologist as negative. This cancer was later detected on supplemental MRI screening (Online Resource 1-Supplementary Fig. 3).
Discussion
In an environment burdened with staffing shortages and confounded by the post-COVID backlog, we have implemented an AI-modified workflow to triage breast cancer screening mammograms, resulting in a streamlined patient journey and significantly reduced time to diagnostic imaging and biopsy diagnosis. Importantly, all participants ultimately diagnosed with breast cancer were prioritized by AI and those in the experimental group obtained additional imaging within 4 days.
Fundamentally, this AI implementation approach leverages the fact that relatively few patients will require diagnostic workup post-screening and even fewer will be diagnosed with cancer. The AI serves to selectively identify those who may benefit most from immediate interpretation and does so more accurately than if radiologists had randomly selected a similar proportion of cases for prioritized review. To demonstrate this concept, we conducted a post hoc simulation comparing AI-based and random prioritization (Online Resource 1-Supplementary Simulations). The simulation showed that AI prioritization identified a substantially higher proportion of biopsies and cancers. Thus, the observed reduction in diagnostic delays is likely due to the AI-modified workflow, rather than to immediate reading only.
The importance of studying real-world implementations should be emphasized. For example, while retrospective reader studies suggested that CAD-assisted interpretation detected more cancers than radiologists alone, data gathered by Lehman et al. [21] demonstrated no overall improvement in cancer detection once CAD was implemented in clinical practice. We similarly identified implementation challenges and insights that would not have been apparent in a retrospective setting.
First, the proportion of cases prioritized by AI for immediate radiologist review was 29%, much higher than the proportion anticipated based on retrospective testing (14%). The underlying reasons for this discrepancy are not clear. This implementation insight shows that the retrospective performance of AI may not translate to a real-world setting. For the triage use case, this meant that considerably more immediate screening interpretations were needed than initially anticipated. To address the diverse operational capacities of different clinical settings, the operating point of an AI triage system can be adjusted. Use of a more specific AI operating point (i.e., one with a more stringent threshold) could prioritize fewer examinations and thus mitigate excessive disruptions. We explored this concept in a post hoc simulation, finding that a more specific operating point that prioritized only 10% of the participants would still have selected 2 of 3 experimental group cancers (Online Resource 1-Supplementary Simulations), suggesting institutions could calibrate the AI’s prioritization rate to match available resources.
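The sketch below illustrates such an operating-point sweep on hypothetical scores and outcomes; it mirrors the idea of the post hoc simulation but is not the study's simulation code, and the score and label distributions are invented for demonstration.

```python
# A minimal sketch of sweeping the operating point to trade off the
# fraction of prioritized cases against sensitivity; `scores` and
# `labels` are hypothetical model outputs and cancer outcomes.
import numpy as np

def sweep_operating_points(scores, labels, thresholds):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    for t in thresholds:
        flagged = scores >= t
        frac = flagged.mean()  # fraction of all cases prioritized
        sens = flagged[labels].mean() if labels.any() else float("nan")
        print(f"threshold={t:.2f}: prioritized={frac:.0%}, sensitivity={sens:.0%}")

rng = np.random.default_rng(7)
scores = rng.beta(2, 8, size=1000)                      # stand-in AI scores
labels = rng.random(1000) < 0.007 * (1 + 9 * scores)    # cancers enriched at high scores
sweep_operating_points(scores, labels, thresholds=[0.1, 0.2, 0.3, 0.4])
```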
A second implementation insight was that, despite the study design's intention to prevent AI from influencing radiologist interpretation, qualitative feedback highlighted the discomfort radiologists faced in interpreting AI-prioritized cases as normal, especially initially. However, after multiple rounds of exposure to AI prioritization, radiologists realized that relying on their own expertise remained equally important. Implementation of AI explainability features beyond a binary prioritization, such as regions of interest or confidence scores, could improve AI-radiologist collaboration and decrease bias.
Finally, immediate radiologist review and same-visit additional imaging can be disruptive to traditional clinical workflows and may not be feasible for all clinical practices. While introducing a small portion of ad hoc AI-prioritized interpretations can be less efficient than batch reading, other efficiency gains from same-visit workup are likely (no re-interpretation ahead of diagnostic workup, less time spent scheduling and checking in the patient, etc.), and thus further study of this tradeoff is necessary. Consistent with previous publications, not all participants who were offered same-visit additional imaging accepted, which introduced additional workflow variation. Additionally, email notification of an AI-prioritized case is likely only feasible in the research setting. Methods to incorporate prioritized cases directly into a radiologist's worklist and provide real-time notification would require collaboration with information technology (IT) resources and integration with electronic medical records and picture archiving and communication systems (PACS). Several commercially available AI systems already include methods to prioritize cases based on complexity, so real-time AI prioritization and notification could potentially be integrated into modern systems.
This investigation has limitations. Although our sample size of 1000 participants is larger than the median of 294 seen in AI healthcare trials [22], it is insufficient for robust subgroup analysis, particularly for cancers, given their low prevalence in the screening population. Additionally, the study population represents only a small proportion of the total screening population at our institution, limiting the generalizability of this AI implementation to full clinical practice.
The study was aimed at assessing an AI implementation strategy and its impact on operational outcomes, and as such was not powered to measure the diagnostic accuracy of the modified workflow. Moreover, it did not directly evaluate the benefits of the AI-modified workflow in terms of reducing patient anxiety or addressing racial disparities. Instead, we rely on previously published work showing that immediate mammography interpretation and same-visit workup are associated with improved patient experience and reduced inequities in timely follow-up. Finally, this study was conducted at only a single site using one AI model, and therefore the generalizability of this implementation strategy to other screening environments remains unknown.
In conclusion, our implementation study prospectively demonstrated an AI-modified workflow that was attainable in clinical practice and yielded a statistically significantly shorter time to additional imaging and biopsy diagnosis for patients undergoing screening mammography compared to the standard-of-care workflow. The broader benefits of reducing such diagnostic delays include improved patient adherence, decreased anxiety, and reduced disparities in access to timely care. Additionally, introducing AI in lower-resource settings, where patients are often lost to follow-up care, could further amplify the benefit of capturing patients with concerning imaging findings while they are still in the breast center. Consequently, this demonstration is an early but important step towards identifying an AI implementation strategy that can improve the efficiency and timeliness of breast cancer screening. Further implementation studies in diverse clinical settings are needed to assess generalizability and the potential impact on patient outcomes.
Data availability
Data are not publicly available, but interested researchers may contact sarah.friedewald@nm.org to request access for research purposes.
References
Siu AL, on behalf of the U.S. Preventive Services Task Force (2016) Screening for breast cancer: US preventive services task force recommendation statement. Ann Internal Med 164(4):279. https://doi.org/10.7326/m15-2886
Duffy SW, Tabar L, Smith RA (2002) The mammographic screening trials: commentary on the recent work by Olsen and Gotzsche. Cancer J Clin 52(2):68–71. https://doi.org/10.3322/canjclin.52.2.68
Daly B, Olopade OI (2015) A perfect storm: how tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J Clin 65(3):221–38
Miller-Kleinhenz JM, Collin LJ, Seidel R et al (2021) Racial disparities in diagnostic delay among women with breast cancer. J Am Coll Radiol 18(10):1384–93
Burnside ES, Park JM, Fine JP, Sisney GA (2005) The use of batch reading to improve the performance of screening mammography. AJR Am J Roentgenol 185(3):790–6
Lindfors KK, O’Connor J, Parker RA (2001) False-positive screening mammograms: effect of immediate versus later work-up on patient stress. Radiology 218(1):247–53
Dontchos BN, Achibiri J, Mercaldo SF et al (2022) Disparities in same-day diagnostic imaging in breast cancer screening: impact of an immediate-read screening mammography program implemented during the COVID-19 pandemic. AJR Am J Roentgenol 218(2):270–8
Oluyemi E (2022) Editorial comment: offering immediate screening mammography interpretation may be an effective way to reduce the racial and ethnic disparity gap in the time to diagnostic follow-up. AJR Am J Roentgenol 218(2):278
McKinney SM, Sieniek M, Godbole V et al (2020) International evaluation of an AI system for breast cancer screening. Nature 577(7788):89–94
Schaffter T, Buist DSM, Lee CI et al (2020) Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Netw Open 3(3):e200265
Salim M, Wåhlin E, Dembrower K et al (2020) External evaluation of 3 commercial artificial intelligence algorithms for independent assessment of screening mammograms. JAMA Oncol 6(10):1581–8
Rodriguez-Ruiz A, Lång K, Gubern-Merida A et al (2019) Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 111(9):916–22
Freeman K, Geppert J, Stinton C et al (2021) Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ 374:n1872
Dembrower K, Crippa A, Colón E, Eklund M, Strand F, ScreenTrustCAD Trial Consortium (2023) Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. Lancet Digit Health 5(10):e703-11
Lång K, Josefsson V, Larsson A-M et al (2023) Artificial intelligence-supported screen reading versus standard double reading in the mammography screening with artificial intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol 24(8):936–44
ClinicalTrials.gov ACT checklist. https://cdn.clinicaltrials.gov/documents/ACT_Checklist.pdf. Accessed September 2020
Nyante SJ, Abraham L, Aiello Bowles EJ et al (2022) Diagnostic mammography performance across racial and ethnic groups in a national network of community-based breast imaging facilities. Cancer Epidemiol Biomarkers Prev 31(7):1324–1333
American College of Radiology (2013) ACR BI-RADS Atlas: breast imaging reporting and data system. American College of Radiology, Reston
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Amsterdam
Illowsky B, Dean S (2017) Introductory statistics. OpenStax, Houston
Lehman CD, Wellman RD, Buist DSM et al (2015) Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med 175(11):1828–37
Plana D, Shung DL, Grimshaw AA, Saraf A, Sung JJY, Kann BH (2022) Randomized clinical trials of machine learning interventions in health care: a systematic review. JAMA Netw Open 5(9):e2233946
Acknowledgements
The authors would like to acknowledge Amanda Kownacki for coordinating the study; Juliette Jardim for help initiating the study; Jim Taylor, Lan Huong Nguyen, Rayman Huang, and Shahar Jamshy for their advice on statistical methods; the Northwestern Medicine breast radiologists for reading mammograms as part of the study; Yun Liu, Dale Webster, Christopher Kelly, Michael Howell, and John Hernandez for manuscript feedback; Abhilasha Mukherjee for regulatory guidance; RK Neelakandan for quality engineering guidance; and Mozziyar Etemadi's team at Northwestern Medicine for their contributions to the technical and clinical aspects of this study.
Funding
This study was funded by Google and performed at Northwestern Memorial Hospital (NMH) in Chicago, Illinois.
Author information
Contributions
SMF, MSi, SJ, DS, SB, DG, SP, SMM, ME, SW, TS, MSe, DT, KE and SS contributed to the conception and design of the study. SMF, MSi, SJ, TK, DS, SB, DG, SC, DM, ME, AM, LS, and MSe contributed to data collection. MSi, SJ, FM, TK, DM, ME, AA, RZ, GD, SK, APK, JY, and BM contributed the AI infrastructure and analysis tooling. SMF, MSi, SJ, TK, and LS performed the analysis. SMF, MSi, SJ, and TK wrote the manuscript. SMF, ME, YM, GSC, DT, KE, and SS secured funding for the study.
Ethics declarations
Competing interests
The study was funded by Google. MSi, SJ, FM, TK, SP, SMM, SW, TS, AM, AA, RZ, GD, SK, APK, JY, BM, YM, GSC, DT, KE, and SS received Alphabet stock as part of their standard compensation package. SMF, DS, DG, and SB received a research grant from Google. SMF is a consultant for Hologic Inc.
Ethical approval
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Northwestern University Institutional Review Board (IRB) on August 14, 2020 (# STU00212646).
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publish
The authors affirm that human research participants provided informed consent for publication of the images in Figs. 3a–e.