The artificial intelligence (AI) tool ChatGPT displayed lower concern than physicians for 36% of potential developmental delays, according to a new study. The research will be presented at the Pediatric Academic Societies (PAS) 2024 Meeting, held May 3-6 in Toronto.
Researchers found that, 41% of the time, ChatGPT and pediatricians reached different conclusions about whether a potential delay was abnormal.
The study investigated how ChatGPT responded to parents’ concerns about whether their child’s development was normal or abnormal, including whether its responses aligned with pediatricians’ diagnoses. The research found that ChatGPT rarely categorized a case as abnormal, underscoring pediatricians’ concerns that the tool is not prepared to be a reliable source of guidance on child behavioral patterns.
“Artificial intelligence tools like ChatGPT can provide accurate information for parents regarding their child’s development, but still do not perform like physicians in certain tasks,” said Joseph G. Barile, BA, research assistant at Cohen Children’s Medical Center and presenting author. “This study reveals how pediatricians may have more conviction than ChatGPT when it comes to denoting certain developmental delays as ‘abnormal.’”
While ChatGPT showed higher concern than physicians in only 5% of cases, the research found that pediatricians categorized about 33 percentage points more of the cases as abnormal than ChatGPT did (53.7% versus 20.4%).
ChatGPT and pediatricians diverged most on social, emotional, and behavioral concerns, as opposed to motor concerns, and on cases involving children younger than one year.
The study examined 108 developmental concerns in children up to five years old; ChatGPT’s responses were scored for accuracy by board-certified physicians.
# # #
EDITOR:
Joseph Barile will present “Dr. ChatGPT, is my child normal? An Investigation into a Public Artificial Intelligence Chatbot’s Categorizations of Potential Developmental Delays” on Saturday, May 4 from 3:15-3:30 PM ET.
Reporters interested in an interview with Mr. Barile should contact Amber Fraley at amber.fraley@pasmeeting.org.
The PAS Meeting connects thousands of pediatricians and other health care providers worldwide. For more information, please visit www.pas-meeting.org.
About the Pediatric Academic Societies Meeting
The Pediatric Academic Societies (PAS) Meeting connects thousands of leading pediatric researchers, clinicians, and medical educators worldwide, united by a common mission: connecting the global academic pediatric community to advance scientific discovery and promote innovation in child and adolescent health. The PAS Meeting is produced through the partnership of four leading pediatric associations: the American Academy of Pediatrics (AAP), the Academic Pediatric Association (APA), the American Pediatric Society (APS), and the Society for Pediatric Research (SPR). For more information, please visit www.pas-meeting.org. Follow us on X @PASMeeting and like us on Facebook at PASMeeting.
Abstract: Dr. ChatGPT, is my child normal? An Investigation into a Public Artificial Intelligence Chatbot’s Categorizations of Potential Developmental Delays
Presenting Author: Joseph G. Barile, BA
Organization: Cohen Children’s Medical Center
Topic: Developmental and Behavioral Pediatrics: Parenting
Background
Just as Google is often consulted for medical information, public artificial intelligence (AI) tools like ChatGPT may also be used to answer questions typically directed at physicians. It is therefore important to understand what kind of medical information ChatGPT produces, especially in domains where people often seek reassurance, such as parents concerned about their child’s development.
Objective
This study investigates ChatGPT’s categorizations of potential developmental delays in children aged 0-5.
Design/Methods
Developmental concerns (n=108) for ages 0-5 were sourced from articles on the American Academy of Pediatrics’ www.healthychildren.org. These concerns are labeled as “warning signs” that warrant discussion with a pediatrician. Concerns were pasted into ChatGPT-3.5 in the format “My (insert age) child is (insert concern). Is this normal?” ChatGPT’s response was scored by its displayed concern level (see Table 1). Two board-certified developmental-behavioral pediatrics (DBP) physicians scored ChatGPT’s response accuracy and completeness using five-point Likert scales. Physicians also scored each concern according to Table 1 and categorized delays by developmental domain (Motor, Social/Emotional/Behavioral (SEB), Cognitive, etc.).
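The templated prompt above is straightforward to reproduce programmatically. Below is a minimal sketch assuming the OpenAI Python client and the gpt-3.5-turbo model as stand-ins for the ChatGPT-3.5 web interface the study actually used; the example concerns are illustrative, not items from the study’s 108-case set.

# A minimal sketch of the study's prompt template via the OpenAI API.
# Assumptions: the OpenAI Python client, the gpt-3.5-turbo model, and
# the example concerns below. The study itself pasted prompts into the
# ChatGPT web interface rather than calling the API.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical (age, concern) pairs; the study's actual concerns came
# from "warning signs" articles on www.healthychildren.org.
concerns = [
    ("18-month-old", "not pointing at objects to show interest"),
    ("4-year-old", "not speaking in sentences of more than three words"),
]

for age, concern in concerns:
    prompt = f"My {age} child is {concern}. Is this normal?"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt)
    print(response.choices[0].message.content)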
Results
ChatGPT produced accurate (4.91/5) and complete (4.5/5) information. It deemed 20.4% of cases (n=22) “Abnormal” and 74.1% of cases (n=80) “Not Abnormal”, and did not categorize six cases. By comparison, physicians scored 53.7% of cases (n=58) “Abnormal” and 43.5% of cases (n=47) “Not Abnormal”, while not categorizing three cases. ChatGPT and physicians agreed in 59% of the cases categorized by both (Figure 1). ChatGPT provided false reassurances (denoting a physician-scored “Abnormal” case “Not Abnormal”) in 36% of cases and false alarms (denoting a physician-scored “Not Abnormal” case “Abnormal”) in 5% of cases (Figure 1). Finally, ChatGPT aligned less with physicians on SEB cases than on motor cases (50% versus 62%; Figure 2A), and on cases concerning children younger than 12 months than on cases concerning children older than 12 months (52% versus 66%; Figure 2B).
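The headline percentages above follow directly from the case counts over the 108-concern set. Here is a quick arithmetic check in Python, using only counts stated in the text; the agreement, false-reassurance, and false-alarm rates depend on the case-by-case cross-tabulation in Figure 1 and cannot be recomputed from these marginal counts alone.

total = 108  # developmental concerns in the study

# Case counts reported in the Results above.
chatgpt = {"Abnormal": 22, "Not Abnormal": 80, "Uncategorized": 6}
physicians = {"Abnormal": 58, "Not Abnormal": 47, "Uncategorized": 3}

for rater, counts in (("ChatGPT", chatgpt), ("Physicians", physicians)):
    for label, n in counts.items():
        print(f"{rater} {label}: {n}/{total} = {n / total:.1%}")

# Output:
# ChatGPT Abnormal: 22/108 = 20.4%
# ChatGPT Not Abnormal: 80/108 = 74.1%
# ChatGPT Uncategorized: 6/108 = 5.6%
# Physicians Abnormal: 58/108 = 53.7%
# Physicians Not Abnormal: 47/108 = 43.5%
# Physicians Uncategorized: 3/108 = 2.8%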
Conclusion(s)
The study showed that while ChatGPT’s responses contain accurate and complete information, the tool is not yet ready to advise parents on their child’s potential developmental delays. Though the chatbot rarely produces false alarms, it has a high rate of false reassurances, reflecting its reluctance to deem a case abnormal. ChatGPT also appears less equipped to handle SEB concerns and cases involving children under 12 months, as evidenced by its comparatively lower alignment rates in those categories.