News Release

Can we trust scientific discoveries made using machine learning?

Rice U. expert: Key is creating ML systems that question their own predictions

Peer-Reviewed Publication

Rice University

Genevera Allen

Image: Rice University statistician Genevera Allen will discuss research to improve the accuracy and reproducibility of scientific discoveries made by machine learning in both a press briefing and general session at the 2019 AAAS Annual Meeting.

Credit: Photo by Tommy LaVergne/Rice University

WASHINGTON -- (Feb. 15, 2019) -- Rice University statistician Genevera Allen says scientists must keep questioning the accuracy and reproducibility of scientific discoveries made by machine-learning techniques until researchers develop new computational systems that can critique themselves.

Allen, associate professor of statistics, computer science and electrical and computer engineering at Rice and of pediatrics-neurology at Baylor College of Medicine, will address the topic in both a press briefing and a general session today at the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS).

"The question is, 'Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?'" Allen said. "The answer in many situations is probably, 'Not without checking,' but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions."

Machine learning (ML) is a branch of statistics and computer science concerned with building computational systems that learn from data rather than following explicit instructions. Allen said much of the attention in the ML field has focused on developing models that predict future data from patterns found in the data they were trained on.

"A lot of these techniques are designed to always make a prediction," she said. "They never come back with 'I don't know,' or 'I didn't discover anything,' because they aren't made to."

She said uncorroborated data-driven discoveries from recently published ML studies of cancer data are a good example.

"In precision medicine, it's important to find groups of patients that have genomically similar profiles so you can develop drug therapies that are targeted to the specific genome for their disease," Allen said. "People have applied machine learning to genomic data from clinical cohorts to find groups, or clusters, of patients with similar genomic profiles.

"But there are cases where discoveries aren't reproducible; the clusters discovered in one study are completely different than the clusters found in another," she said. "Why? Because most machine-learning techniques today always say, 'I found a group.' Sometimes, it would be far more useful if they said, 'I think some of these are really grouped together, but I'm uncertain about these others.'"

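As a rough illustration of the behavior Allen describes (not her method, and not a real genomics pipeline), the Python sketch below assigns made-up one-dimensional values to the nearest of two assumed cluster centers, but returns None, meaning "uncertain," when a point is nearly as close to the second center as to the first. The function name `assign_with_abstention`, the `min_margin` threshold and all of the data are hypothetical.

```python
from typing import List, Optional


def assign_with_abstention(
    values: List[float],
    centers: List[float],
    min_margin: float = 0.5,
) -> List[Optional[int]]:
    """Assign each value to its nearest center, or None when the call is too close.

    min_margin is a hypothetical tuning knob: the relative gap between the
    nearest and second-nearest center distances below which a point is
    reported as uncertain. Requires at least two centers.
    """
    labels: List[Optional[int]] = []
    for v in values:
        # Distance to every center, smallest first, keeping the center index.
        ranked = sorted((abs(v - c), j) for j, c in enumerate(centers))
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Relative margin in [0, 1]: 0 is a tie, 1 means the point sits on a center.
        margin = (d2 - d1) / d2 if d2 > 0 else 0.0
        labels.append(best if margin >= min_margin else None)
    return labels


# Made-up 1-D "profiles": two tight groups plus two points in between.
values = [0.9, 1.0, 1.2, 4.8, 5.3, 9.0, 9.1, 8.8]
centers = [1.0, 9.0]  # assumed to come from some earlier clustering step

print(assign_with_abstention(values, centers))
# [0, 0, 0, None, None, 1, 1, 1]
```

With these made-up values, the two points near the midpoint come back as None, whereas a plain nearest-center rule would have forced each of them into one group or the other.
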
Allen will discuss uncertainty and reproducibility of ML techniques for data-driven discoveries at a 10 a.m. press briefing today, and she will discuss case studies and research aimed at addressing uncertainty and reproducibility in the 3:30 p.m. general session, "Machine Learning and Statistics: Applications in Genomics and Computer Vision." Both sessions are at the Marriott Wardman Park Hotel.

###

Allen is the founding director of Rice's Center for Transforming Data to Knowledge (D2K Lab) and a member of the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital. Her research lies in the areas of modern multivariate analysis, graphical models, statistical machine learning and data integration, with a particular focus on statistical methods that help scientists make sense of "big data" from high-throughput genomics, neuroimaging and other applications. Her previous honors include a National Science Foundation CAREER award, the International Biometric Society's Young Statistician Showcase award and Forbes '30 under 30' in science and health care.

AAAS is the world's largest multi-disciplinary science society, and the AAAS Annual Meeting, Feb. 14-17, is the world's largest general scientific gathering. For more information, visit: https://aaas.org.

2019 AAAS Annual Meeting:

About the meeting: https://meetings.aaas.org/about-the-meeting/?

Program: https://meetings.aaas.org/program/?

Newsroom: https://www.eurekalert.org/aaasnewsroom/2019/

Media Registration: https://www.eurekalert.org/aaasnewsroom/2019/registration/?

Media Contacts: https://www.eurekalert.org/aaasnewsroom/2019/contacts/

Allen's research: http://www.stat.rice.edu/~gallen/index.html

Rice's D2K Lab: https://d2k.rice.edu/about/d2k-lab

High-resolution IMAGE is available for download at:

https://news.rice.edu/files/2019/02/0211_AAAS-Allen01a-lg-yvriyj.jpg

CAPTION: Genevera Allen is a Rice University statistician, data scientist and the founding director of Rice's D2K Lab. (Photo by Jeff Fitlow/Rice University)

This release can be found online at news.rice.edu.

Follow Rice News and Media Relations via Twitter @RiceUNews.

Located on a 300-acre forested campus in Houston, Rice University is consistently ranked among the nation's top 20 universities by U.S. News & World Report. Rice has highly respected schools of Architecture, Business, Continuing Studies, Engineering, Humanities, Music, Natural Sciences and Social Sciences and is home to the Baker Institute for Public Policy. With 3,962 undergraduates and 3,027 graduate students, Rice's undergraduate student-to-faculty ratio is just under 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for lots of race/class interaction and No. 2 for quality of life by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger's Personal Finance. To read "What they're saying about Rice," go to http://tinyurl.com/RiceUniversityoverview.

