Feature Story | 23-Oct-2024

Diving deeper into big data

University of Virginia College and Graduate School of Arts & Sciences

The world is swimming in data, but many researchers, lured by the promise of Big Data, are still just skimming the surface.

This fall, associate professor of statistics Martin Slawski will launch a project focused on developing software for analyzing and linking disparate data sets in a way that will give researchers and others more effective options for exploring Big Data’s potential.

The project, funded by the National Science Foundation’s Office of Advanced Cyberinfrastructure, provides Slawski, his partners at Brown University and Johns Hopkins University, and their graduate students with $600,000 over the next three years to develop a new approach to combining data from different sources, such as survey data and administrative data, into more comprehensive data sets using inexact identifiers such as names, demographics, addresses, or even more ambiguous commonalities.
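To make the idea of linking on inexact identifiers concrete, here is a minimal sketch in Python. It is an illustration only, not the project's software or Slawski's method: the record fields (`id`, `name`, `birth_year`), the function names, the sample records, and the 0.85 similarity threshold are all hypothetical, and the string comparison uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching technique the project ultimately develops.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(survey, admin, threshold=0.85):
    """Link each survey record to its best-matching administrative record.

    Candidates must share a birth year (a simple blocking rule); among those,
    the record with the most similar name is kept if it clears the threshold.
    All field names and the threshold are assumptions for illustration.
    """
    links = []
    for s in survey:
        best, best_score = None, 0.0
        for a in admin:
            if s["birth_year"] != a["birth_year"]:
                continue  # skip candidates that disagree on birth year
            score = similarity(s["name"], a["name"])
            if score > best_score:
                best, best_score = a, score
        if best is not None and best_score >= threshold:
            links.append((s["id"], best["id"], round(best_score, 2)))
    return links


# Hypothetical records: the same person appears with a misspelled first name.
survey = [
    {"id": "S1", "name": "Katherine Johnson", "birth_year": 1950},
    {"id": "S2", "name": "Robert Smith", "birth_year": 1948},
]
admin = [
    {"id": "A1", "name": "Kathrine Johnson", "birth_year": 1950},
    {"id": "A2", "name": "Maria Garcia", "birth_year": 1948},
]

links = link_records(survey, admin)
print(links)
```

In this toy example the misspelled name still links S1 to A1, while S2 finds no partner because the only candidate with the same birth year has a very different name. Real linkage problems are harder: a fixed threshold can produce both false links and missed links, which is part of why errors introduced by linkage matter for the analysis that follows.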

With some estimates placing the amount of data that humans create at around 2.5 quintillion bytes every day, making use of that data is becoming vastly more challenging. Slawski, whose background is in both statistics and computer science, has already applied his methods to the problem of identifying systematic biases in the criminal justice system by comparing data generated at different stages of the criminal justice process. He is also planning to use them to help the National Institutes of Health’s National Institute on Aging better understand the healthcare outcomes of programs like Meals on Wheels.

“Often you don’t have all the information you need in a single file, so rather than recollect that information, which is costly, the idea is to use the information you have and link it,” Slawski said.

A statistician and a computer scientist, Slawski brings expertise in both fields to his work on making data analysis more accurate, with the aim of producing findings with greater validity while minimizing security and privacy risks.

“Both fields have very different perspectives on data analysis, but when you bring them together, they can be very complementary,” Slawski said. “It’s important to see the problem from both angles.”
