Researchers in three different disciplines at Virginia Tech are partners on a $15 million grant from the National Science Foundation (NSF) to establish an institute in the new field of “imageomics.” The field aims to create a new frontier of biological information using vast stores of existing image data, such as publicly funded digital collections from national centers, field stations, museums, and individual laboratories.
The goal of the institute is to characterize and discover patterns or biological traits of organisms from images and gain insights into how function follows form in all areas of biology. It will expand public understanding of the rules of life on Earth and how life evolves.
Imageomics is one of five Harnessing the Data Revolution institutes receiving support from the NSF.
Anuj Karpatne, assistant professor in the Department of Computer Science and faculty at the Sanghani Center for Artificial Intelligence and Data Analytics, is serving as one of four co-investigators for the multi-university project led by the Ohio State University. Leanna House, associate professor in the Department of Statistics and faculty at the Sanghani Center, and Josef Uyeda, assistant professor in the Department of Biological Sciences, are designated senior personnel. All three researchers are part of the executive leadership team of the institute and investigators on Virginia Tech’s $1.4 million portion of the grant.
For the project, Karpatne is using his expertise in knowledge-guided machine learning (KGML) to develop novel machine learning architectures for extracting and interpreting biological traits from images, work that will bring the revolution of computing to image-based biology. A previously funded NSF grant is serving as a crucial building block for this work.
Currently, Karpatne said, researchers record the vast array of biodiverse organisms on the planet using millions of images of biological specimens: those collected by scientists; captured by drones, camera traps, and citizen scientists; or posted by tourists on social media.
“While these provide a tipping point to unlocking novel trait information from biological images, current ‘black-box’ machine learning is not fully equipped to extract meaningful information from these kinds of images,” Karpatne said.
“We will inject structured biological knowledge in machine learning frameworks to obtain explanations from images that are both generalizable and scientifically meaningful. Just as large-scale image datasets have fundamentally transformed computer vision, our goal is to use what we have learned in the emerging field of KGML to open the next frontier of biological discoveries,” said Karpatne. He also is leading the convergence working group on knowledge-guided machine learning within the institute to bring together computer scientists and biologists and cross-pollinate ideas across disciplines.
House, whose research focuses on the intersection of human-computer interaction and statistics, will direct the institute’s education and community effort. Collaborating with Daniel Rubenstein from Princeton University and a newly formed committee, she will be responsible for coordinating education and outreach programs across all of the participating institutions and overseeing efforts to build an expansive, inclusive imageomics community.
“Images are data that everyone can relate to. Our goal is to combine human experiences with images and machine learning/statistical methods in a way that introduces imageomics as a common, accepted, and accessible science, similar to the way genomics has successfully infiltrated research agendas, mass media, and education curricula from primary to graduate school,” House said.
Uyeda, who received an NSF CAREER grant in 2020, is serving as the lead for a trait-based biology convergence working group in the institute.
Uyeda uses the evolutionary tree of life to make sense of how and why traits evolve over millions of years. Experts can look at highly modified structures, whether almost absent or highly elaborated, and deduce how a trait developed and evolved based on its location on the body, its relationship to other traits, and knowledge of where the species lies on the tree of life, even when the image shows a completely new, never-before-seen species, he said.
“But because computers are so much more sensitive than the human eye, knowledge-guided machine learning approaches are an exciting addition to our toolkit,” Uyeda said. “If we can train KGML methods to use this biological knowledge like experts do, it would open up new ways of studying organismal form and function.”
The NSF grant also is providing opportunities for graduate student research. Among the Ph.D. students who will work on the project are Mohannad Elhamod and M. Maruf, advised by Karpatne at the Sanghani Center, and Bailey Howell and Nicholas Bone in biological sciences, advised by Uyeda.