Artificial intelligence does not always get it right when answering patients’ queries for health information about cancer, scientists say.
In an article in the European Journal of Cancer, the scientists note that “challenges persist in accuracy, reference quality, and readability of health information. These issues are especially pronounced in languages other than English, where hallucinations remain a concern.”
The scientists examine seven leading AI chatbots (ChatGPT, Google’s Gemini, Microsoft’s Copilot, Meta AI, Claude, Grok, and Perplexity), assessing their ability to answer common cancer-related queries in English, Arabic, French, Chinese, Thai, Hindi, Nepali, and Vietnamese.
With millions of people turning to AI chatbots for health advice, ensuring that these tools provide accurate, comprehensible, and well-referenced information is critical, the scientists maintain. Their study can be read as a wake-up call for AI companies, which are assuming a growing and vital role in public healthcare.
Co-author Ahmad Abuhelwa, an assistant professor at the University of Sharjah’s College of Pharmacy, asks readers to imagine someone turning to an AI chatbot for instant, easy-to-understand, and accurate answers. “Can they trust the responses?” he asks.
To answer this specific question, scientists from Flinders University (Australia), Massachusetts General Hospital/Harvard Medical School (USA), Prince of Songkla University (Thailand), and the University of Sharjah (UAE) attempt to rigorously evaluate the accuracy and reliability of generative AI chatbots in providing cancer-related information across multiple languages.
“AI chatbots are becoming an essential tool for people seeking cancer information. However, our study highlights that we must improve their accuracy, especially in non-English languages, to make them truly reliable for everyone,” adds co-author Ashley Hopkins from Australia’s Flinders University.
Analyzing the answers AI chatbots give to simple cancer-related questions, the scientists point to the need for better multilingual accuracy, the importance of reference quality, and multiple challenges to access and readability. They assess the answers on four criteria: accuracy, source reliability, readability, and medical guidance.
While the study finds English-language responses to be fairly reliable, with no major inaccuracies, it identifies issues with non-English responses, where “7 out of 294 answers contained errors, including mistranslations, incorrect drug names, and inappropriate treatment recommendation. Reference-quality varied with 48% of responses having valid references, and 39% of the English references were .com links reflecting quality concerns,” according to Dr. Abuhelwa.
Lead author Bradley Menz, also from Flinders University, stresses that AI developers must treat their linguistically diverse audience equitably when providing healthcare information. “Patients and caregivers increasingly rely on AI for medical advice. Our study highlights the urgent need to improve the quality of information AI chatbots provide to ensure safe and equitable access to healthcare knowledge.”
Many users generally regard the information AI retrieves from .com links as authentic, but the authors maintain that these links are often considered unreliable because they may prioritize commercial interests over accuracy and scientific evidence. Unlike government (.gov) or academic (.edu) sources, .com websites are not held to stringent standards for medical accuracy, they say.
The authors praise artificial intelligence for its potential to revolutionize healthcare access. However, they call for caution because its reliability in providing safe and evidence-based cancer information cannot yet be fully trusted.
Says Dr. Abuhelwa, “Incorrect health information, particularly in cancer contexts, can have serious consequences. Our research shows that while AI tools are making great progress, we must ensure they provide clear, accurate, and well-referenced health information.
“Our work underscores the need for AI regulation and continuous monitoring to prevent false health information from potentially causing harm. It is a wake-up call for AI developers: publicly accessible AI tools must be held to the highest standards to ensure they serve the public safely and effectively, and for the benefit of all.”
Among their recommendations, the scientists urge AI developers to further enhance their multilingual services to ensure patients worldwide receive correct health advice, render their AI-generated responses more user-friendly, and work more closely with healthcare professionals to refine their tools.
Dr. Abuhelwa is buoyant about the research’s findings, which he says can have “real-world applications,” particularly in AI model improvement, healthcare support, patient education, and policy development.
Journal
European Journal of Cancer
Method of Research
Survey
Subject of Research
Not applicable
Article Title
Generative AI chatbots for reliable cancer information: Evaluating web-search, multilingual, and reference capabilities of emerging large language models
Article Publication Date
11-Mar-2025