The Hidden Dangers of AI in Medical Diagnosis

Recent research by Marzyeh Ghassemi from MIT highlights the dangers of AI in medical diagnosis, particularly concerning biases and the impact of errors. AI systems can misinterpret medical records, leading to inadequate treatment recommendations, especially for women and marginalized groups. As reliance on AI grows, it is crucial to address these biases through diverse training data and regulatory measures.


AI Shield Stack

10/20/2025 · 2 min read

The implications of AI in medical diagnosis

Could a simple misspelling lead to a medical crisis? The answer is yes, especially when artificial intelligence systems are involved in diagnosing health issues. A single typo or an unusual word can mislead AI-driven medical tools, causing them to overlook significant health concerns. This alarming possibility is highlighted by new research from Marzyeh Ghassemi, a professor at the Massachusetts Institute of Technology (MIT) and the principal investigator at the university’s Jameel Clinic.

As hospitals increasingly rely on AI software like ChatGPT for diagnosis, the stakes are higher than ever. While AI can identify potential health problems that human physicians might miss, Ghassemi's findings reveal that these systems are remarkably easy to mislead. For instance, an AI model that accurately diagnoses chest X-rays in Canada may falter in California due to different lifestyles and risk factors. Furthermore, AI chatbots providing mental health advice have been shown to respond with less empathy to Black and Asian users compared to white users.

In her recent paper, Ghassemi examined what happens when errors of the kind that non-English speakers or people with limited education might introduce are added to medical records. By injecting spelling mistakes, odd phrases, and expressions of patient anxiety, her team tested how four AI systems changed their recommendations for further treatment. The results were troubling: the presence of such faulty content increased the likelihood of the AI recommending no additional treatment by 7 to 9 percent, a gap that translates into real patients who might not receive necessary care.
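To make that kind of experiment concrete, here is a minimal sketch of a perturbation test. It is not the paper's actual protocol: the example note, the perturbation rules, and the query_model callable standing in for the system under test are all illustrative assumptions.

```python
# Minimal sketch of a perturbation test: compare how often a model recommends
# further care for clean notes vs. notes with injected typos and anxious phrasing.
# The model call is a plug-in callable; the note and perturbations are illustrative.
import random
from typing import Callable

def perturb_note(note: str, rng: random.Random) -> str:
    """Inject plausible 'noise': a typo plus informal, anxious language."""
    words = note.split()
    if words:
        i = rng.randrange(len(words))
        w = words[i]
        if len(w) > 3:  # swap two interior letters to create a typo
            j = rng.randrange(1, len(w) - 2)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    noisy = " ".join(words)
    return noisy + " im really worried something is wrong, sorry for the bad spelling"

def recommendation_rates(notes: list[str],
                         query_model: Callable[[str], str],
                         seed: int = 0) -> tuple[float, float]:
    """Fraction of notes for which the model recommends further care, clean vs. noisy."""
    rng = random.Random(seed)
    clean_hits = noisy_hits = 0
    for note in notes:
        clean_hits += query_model(note).strip().lower().startswith("yes")
        noisy_hits += query_model(perturb_note(note, rng)).strip().lower().startswith("yes")
    return clean_hits / len(notes), noisy_hits / len(notes)

if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Placeholder: a real audit would send `prompt` to the system under test.
        return "yes" if "worried" not in prompt else "no"
    clean, noisy = recommendation_rates(
        ["Patient reports chest pain radiating to left arm."], fake_model)
    print(f"further-care rate: clean={clean:.2f} noisy={noisy:.2f}")
```

The gap between the two rates is the quantity the study measured, reported there as roughly 7 to 9 percentage points.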

Even more concerning is the bias these systems exhibit against female patients. Ghassemi's research found that AI was more likely to withhold treatment recommendations from women, even when explicit references to gender were removed. This pattern suggests that AI tools can still pick up gender-related cues in the data, leading to disparities in care.

Ghassemi advocates for a more responsible approach to AI development, emphasizing the need for diverse and representative data sets in training these systems. Regular audits are essential to ensure fairness as the AI is updated, and clinicians must be prepared to overrule AI recommendations when necessary. She argues that regulation should mandate equity as a performance standard for clinical AI.
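What a "regular audit" can look like in practice is simple to sketch. The check below is an illustrative assumption, not a clinical standard: it compares recommendation rates across patient groups from logged decisions and flags any gap beyond a chosen tolerance (the group labels, field names, and 5 percent threshold are placeholders).

```python
# Illustrative fairness audit: compute per-group treatment-recommendation rates
# from logged AI decisions and flag large between-group gaps.
from collections import defaultdict

def treatment_rate_by_group(records: list[dict]) -> dict[str, float]:
    """Fraction of cases per group where the AI recommended further treatment."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += 1 if r["recommended_treatment"] else 0
    return {g: hits[g] / totals[g] for g in totals}

def audit_gap(records: list[dict], max_gap: float = 0.05) -> tuple[float, bool]:
    """Return the largest between-group gap and whether it is within tolerance."""
    rates = treatment_rate_by_group(records)
    gap = max(rates.values()) - min(rates.values())
    return gap, gap <= max_gap

# Example: two groups with noticeably different recommendation rates fail the check.
records = (
    [{"group": "female", "recommended_treatment": r < 6} for r in range(10)]
    + [{"group": "male", "recommended_treatment": r < 8} for r in range(10)]
)
gap, ok = audit_gap(records)
print(f"largest gap={gap:.2f} within tolerance={ok}")
```

Run on each model update, a check like this gives clinicians and compliance teams a concrete signal for when to dig deeper or overrule the system.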

As AI diagnostics become more prevalent, there exists a crucial opportunity to address longstanding issues of race and gender bias in healthcare. Ghassemi believes that by highlighting AI's failures, society may finally confront the underlying systemic issues that have affected women and minorities for far too long.

In this context, AI Shield Stack (https://www.aishieldstack.com) can help organizations implement robust AI governance frameworks, ensuring fairness and accountability in health tech solutions.

Cited: https://www.bostonglobe.com/2025/08/27/business/mit-ai-medical-errors-bias/