Feature Story | 28-Jul-2023

Preventing bias in machine learning through data-centric modeling

Texas A&M University

image: Through her research, Dr. Na Zou is looking to reduce bias in machine learning to prevent discrimination toward certain individuals or groups in areas like health care, public services and education.

Credit: Texas A&M Engineering

Machine learning can quickly and efficiently analyze large amounts of data to provide suggestions and help make decisions. Phones and computers, for example, expose us to machine learning technologies such as voice recognition, personalized shopping suggestions, targeted advertisements and email filtering.

Machine learning affects applications across diverse sectors of the economy, including health care, public services, education and employment. However, it also brings challenges related to bias in the data it uses, which can lead to discrimination against specific individuals or groups.

To combat this problem, Dr. Na Zou, an assistant professor in the Department of Engineering Technology and Industrial Distribution at Texas A&M University, aims to develop a data-centric fairness framework. To support her research, Zou received the National Science Foundation's Faculty Early Career Development Program (CAREER) Award.

She will focus on developing a framework that draws on different aspects of common data mining practices to eliminate or reduce bias, promote data quality and improve modeling processes for machine learning.

“Machine learning models are becoming pervasive in real-world applications and have been increasingly deployed in high-stakes decision-making processes, such as loan management, job applications and criminal justice,” Zou said. “Fair machine learning has the potential to reduce or eliminate bias from the decision-making process, avoid making unwarranted implicit associations or amplifying societal stereotypes about people.”

According to Zou, fairness in machine learning refers to the methods and algorithms used to address the tendency of machine learning models to inherit, or even amplify, the bias in the data they are trained on.
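One way to make the idea of inherited bias concrete is to compare how often a model gives a favorable prediction to different groups. The sketch below computes the demographic parity gap, a common fairness metric that is not named in this article and is not necessarily the measure Zou's project uses. The function names, group labels and toy predictions are all hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = favorable decision, 0 = unfavorable decision.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(predictions, groups)
gap = demographic_parity_gap(predictions, groups)
print(rates, gap)
```

A gap of zero would mean every group receives favorable predictions at the same rate. A large gap is a signal worth investigating, though on its own it does not prove that a model discriminates.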

“For example, in health care, fair machine learning can help reduce health disparities and improve health outcomes,” Zou said. “By avoiding biased decision making, medical diagnoses, treatment plans and resource allocations can be more equitable and effective for diverse patient populations.”

Additionally, mitigating bias can improve users' experiences with machine learning systems across various applications. For instance, fair algorithms can incorporate individual preferences in recommendation systems or personalized services without perpetuating stereotypes or excluding certain groups.

To develop unbiased machine learning technologies, Zou will investigate data-centric algorithms capable of systematically modifying datasets to improve model performance. She will also look at theories that facilitate fairness through improving data quality, while incorporating insights from previous research in implicit fairness modeling.
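As an illustration of what a data-centric intervention can look like, the sketch below applies reweighing, a well-known preprocessing technique that assigns each training example a weight so that group membership and outcome label become statistically independent in the weighted data. It stands in here as a generic example and is not described in the article as part of Zou's framework. The function name and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label).

    Under-represented (group, label) combinations get weights above 1,
    over-represented combinations get weights below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(group, groups, labels, weights):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w * y for g, y, w in zip(groups, labels, weights) if g == group)
    return positive / total


# Hypothetical toy data: group "a" has mostly positive labels, group "b" mostly negative.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate("a", groups, labels, weights)
rate_b = weighted_positive_rate("b", groups, labels, weights)
print(weights, rate_a, rate_b)
```

The dataset itself is unchanged; only the importance of each example shifts. A model trained with these weights sees both groups with the same weighted rate of positive labels.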

The challenge of developing a fairness framework lies in problems within the original data used in machine learning technologies. In some instances, the data may be of poor quality, with missing values, incorrect labels and anomalies. In addition, when the trained algorithms are deployed in real-world systems, their performance often deteriorates because of data distribution shifts, such as covariate shift or concept shift. Even though the data can be incomplete, it is still used to make consequential decisions in many fields.

“For example, the trained models on images from sketches and paintings may not achieve satisfactory performance when used in natural images or photos,” Zou said. “Thus, the data quality and distribution shift issues make detecting and mitigating models’ discriminative behavior much more difficult.”
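A distribution shift of this kind can sometimes be flagged by comparing a feature's values in the training data with the values seen after deployment. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distributions, as one simple per-feature check. It is a generic illustration, not a method attributed to Zou, and the feature values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples (0 = identical,
    1 = completely separated).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    # The largest gap between two step functions occurs at an observed value.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in points
    )


# Hypothetical values of a single image feature in two settings.
training_values = [1, 2, 3, 4]
deployment_values = [3, 4, 5, 6]

shift_score = ks_statistic(training_values, deployment_values)
print(shift_score)
```

A score near zero suggests the feature looks similar in both settings, while a score near one suggests the model is being asked to work on data unlike what it was trained on. A check like this looks only at the inputs, so it cannot detect a concept shift in which the relationship between inputs and labels changes.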

If the project is successful, Zou believes its outcome will lead to advances in facilitating fairness in computing. The project will produce effective and efficient algorithms to explore fair data characteristics from different perspectives and enhance generalizability and trust in the machine learning field. This research is expected to impact the broad utilization of machine learning algorithms in essential applications, enabling non-discriminatory decision-making processes and promoting a more transparent platform for future information systems.

“Receiving this award will help me achieve my short-term and long-term goals,” Zou said. “My short-term goal is to develop fair machine learning algorithms through mitigating fairness issues from computational challenges and broadening the impact through disseminating research outcomes and a comprehensive educational toolkit. The long-term goal is to extend the efforts to all aspects of society to deploy fairness-aware information systems and enhance society-level fair decision-making through intensive collaborations with industries.”

By Michelle Revels, Texas A&M Engineering
