PhysioHub

Sticks and stones: bias and readability assessment in LLM-generated patient education for anterior cruciate injury

Brief summary, from the abstract

In this cross-sectional study, AI chatbot–generated patient education on ACL injury was generally understandable and actionable with little sex, gender, ethnic, or socioeconomic bias, but it read at a difficult level, often missed key information, and tended to use a negative tone that could heighten patient fear.

  • Four large language models generated 40 ACL education responses across 10 personas; mean reading level (Flesch-Kincaid Grade Level) ranged from 9.9 (SD=0.8) to 11.4 (SD=1.5), harder than recommended for general patients.
  • 36 of 40 responses (90%) met the 70% PEMAT-P threshold for understandability and 27 (67.5%) for actionability.
  • No statistically significant language differences were found between personas across the models (p>.05), but 37 of 40 responses (92.5%) carried a negative tone.
  • This was a single cross-sectional analysis of 40 AI outputs, so findings describe these models' behavior rather than measure real patient outcomes.
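
For readers who want to check the numbers above, here is a minimal sketch of the standard Flesch-Kincaid Grade Level formula and of the percentage arithmetic behind the bullet points. The summary does not say which tool the authors used to score readability, so the function names and the example word, sentence, and syllable counts below are illustrative assumptions, not the study's method or data.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level from raw text counts.

    FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    The counts are supplied by the caller; counting syllables reliably
    needs a separate tool and is outside this sketch.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def percent(count: int, total: int) -> float:
    """Share of responses, as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))

    # Counts reported in the summary, out of 40 responses.
    print(percent(36, 40))  # understandability threshold met
    print(percent(27, 40))  # actionability threshold met
    print(percent(37, 40))  # negative tone
```

The formula rewards shorter sentences and shorter words, which is why the reported grade levels of roughly 10 to 11 point to long sentences and polysyllabic clinical vocabulary in the generated text.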