Study reveals ‘AI doctors’ struggle with real-world communication challenges

Artificial intelligence tools like ChatGPT have shown promise in healthcare, assisting with tasks such as triaging patients, taking medical histories, and suggesting preliminary diagnoses.

However, a study by Harvard Medical School and Stanford University, published in Nature Medicine, highlights gaps in their performance during realistic doctor-patient interactions. 

Using the newly developed CRAFT-MD (Conversational Reasoning Assessment Framework for Testing in Medicine), researchers tested four AI models on 2,000 clinical scenarios spanning 12 medical specialties.
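
The article stops at this level of detail, so the sketch below is only a rough illustration, under assumptions, of how an open-ended, multi-turn evaluation of this kind could be organized: a simulated patient holds the facts of a clinical vignette, the model under test asks questions and eventually commits to a diagnosis, and the answer is scored against the vignette's ground truth. All names here (ClinicalVignette, ask_model, the simulated patient logic, the scoring rule) are hypothetical placeholders, not CRAFT-MD's actual code.

```python
# Illustrative sketch only -- not the CRAFT-MD codebase. It shows one way a
# multi-turn clinical evaluation loop could be wired up: a simulated patient
# reveals case facts only when asked, the model under test converses and then
# commits to a diagnosis, and accuracy is tallied against the ground truth.

from dataclasses import dataclass


@dataclass
class ClinicalVignette:
    """One test case: the presenting complaint, hidden facts, and correct answer."""
    presenting_complaint: str
    hidden_history: dict[str, str]  # facts revealed only if the doctor asks about them
    true_diagnosis: str


def ask_model(prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g. a chat-completion API)."""
    return "FINAL DIAGNOSIS: unknown"  # replace with a real model call


def simulated_patient_reply(vignette: ClinicalVignette, question: str) -> str:
    """Very naive simulated patient: answers only topics the question touches on."""
    for topic, answer in vignette.hidden_history.items():
        if topic in question.lower():
            return answer
    return "I'm not sure, doctor."


def run_conversation(vignette: ClinicalVignette, max_turns: int = 8) -> bool:
    """Run one open-ended consultation and score the model's final diagnosis."""
    transcript = f"Patient: {vignette.presenting_complaint}\n"
    for _ in range(max_turns):
        reply = ask_model(transcript + "Doctor:")
        transcript += f"Doctor: {reply}\n"
        if "FINAL DIAGNOSIS:" in reply:  # the model commits to an answer
            guess = reply.split("FINAL DIAGNOSIS:")[-1].strip()
            return guess.lower() == vignette.true_diagnosis.lower()
        transcript += f"Patient: {simulated_patient_reply(vignette, reply)}\n"
    return False  # the model never committed to a diagnosis


if __name__ == "__main__":
    case = ClinicalVignette(
        presenting_complaint="I've had chest pain since this morning.",
        hidden_history={
            "exercise": "It gets worse when I climb stairs.",
            "smoking": "I smoke a pack a day.",
        },
        true_diagnosis="stable angina",
    )
    print("Correct:", run_conversation(case))
```

In the approach the article describes, both the simulated interaction and the grading are handled by AI agents and checked by human reviewers; a real harness would replace the canned ask_model stub with calls to the model being evaluated.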

Findings showed that while AI models performed well on multiple-choice questions, their accuracy declined significantly in open-ended, conversational scenarios. They often struggled to ask critical questions, missed key information, and failed to synthesize scattered details into accurate diagnoses. 

These models also performed poorly when handling the dynamic back-and-forth exchanges typical of real-world consultations.

The study recommends incorporating unstructured conversational data into AI training, ensuring models can extract and prioritize essential information. Developers should aim to integrate diverse data types, including text, images, and EKGs, while enhancing AI’s ability to interpret non-verbal cues like tone and expressions. 

Evaluation tools like CRAFT-MD, which combine AI-driven and human assessments, could accelerate testing and improve model reliability. 

By bridging the gap between testing environments and real-world applications, researchers believe frameworks like CRAFT-MD can advance the ethical and effective deployment of AI in clinical settings.
