Uncovering Responsible AI’s Biggest Challenge: Privacy and Fairness

January 13, 2025 · 4 min read
Graham Erickson, Staff Machine Learning Developer

Technical breakthroughs in recent years have produced ways to train machine learning models that counteract the influence of human bias and give fair predictions for all. Running through nearly all of this fairness and bias work, however, is a common assumption: that individual demographic data exists and can be accessed during the model training phase.

The assumptions necessary for effective debiasing often conflict with privacy and data protection principles. This creates a critical challenge in responsible AI (RAI) development and a concern for every AI developer.

Why Bias Analysis Matters

AI systems have become a common part of daily life. From AI-powered assistants integrated into widely used software platforms (e.g., the ChatGPT integration in Siri) to machine learning that optimizes e-commerce recommendations (e.g., Amazon's recommendation system), AI is a key element of many digital experiences. Organizations worldwide are investigating ways to use their data to improve operations, reduce costs, and increase revenue. AI offers new avenues for data use, enabling automated trend analysis and future predictions.

As with most emerging technologies, potential benefits come with potential risks. Companies developing AI systems need policies and processes to avoid risks and ensure AI has the intended positive impact on society. AltaML has developed seven key principles to guide the ethical development of AI, including a fairness principle ensuring AI systems do not discriminate based on protected characteristics.

AI systems risk exhibiting discriminatory behavior. Without intervention, AI training algorithms may learn and perpetuate biases present in their data, leading to discriminatory predictions. For example, the criminal recidivism risk tool COMPAS misclassified Black defendants as likely re-offenders at nearly twice the rate of white defendants. Similarly, Amazon's attempt to build a recruiting tool from historical hiring data inadvertently discriminated against female applicants, reflecting the historical gender imbalance in the tech industry and the ability of machine learning algorithms to exploit correlations between variables.

Fortunately, toolkits are available that allow developers to create AI systems while mitigating the risk of harmful discrimination. Techniques include statistical tests and metrics for identifying bias, visualizations for explaining its effects, and machine learning algorithms that augment training to produce fairer results. These techniques are generally straightforward to implement and complement approaches that machine learning developers already use.
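
To make this concrete, here is a minimal sketch of the kind of group-fairness metric such toolkits compute, written in plain pandas rather than any particular fairness library; the column names and data are purely hypothetical.

```python
import pandas as pd

# Hypothetical evaluation data: model predictions alongside a protected attribute.
df = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "prediction": [1,   0,   1,   1,   0,   0,   1,   0],   # 1 = favorable outcome
    "actual":     [1,   0,   1,   0,   1,   0,   1,   0],
})

# Demographic parity: compare the rate of favorable predictions per group.
selection_rates = df.groupby("group")["prediction"].mean()
parity_gap = selection_rates.max() - selection_rates.min()

# Equal opportunity: compare true positive rates per group.
tpr = df[df["actual"] == 1].groupby("group")["prediction"].mean()
tpr_gap = tpr.max() - tpr.min()

print(selection_rates)
print(f"Demographic parity gap: {parity_gap:.2f}")
print(f"Equal opportunity gap:  {tpr_gap:.2f}")
```

Note that even this simplest check requires the protected attribute (the group column), which is exactly the data that privacy rules make hard to obtain, as the next section discusses.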

Privacy Risks in Bias Analysis

On paper, derisking a use case for fairness seems simple. In practice, programming with fairness in mind is still quite difficult. Every step taken to effectively debias models requires access to variables that describe protected characteristics. For example, to assess and mitigate the risk of racial discrimination, developers must know the race of individuals in the datasets.

To understand where the challenge comes from, the principle of “Privacy and Data Protection” must be acknowledged. This principle calls for the protection of individual privacy throughout all stages of AI development and application. Use cases that make predictions about individuals and their characteristics ultimately use data about those individuals and can therefore present privacy risks.

There are various regulatory reasons why accessing demographics for debiasing is not always possible. The General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in the U.S. protect consumer privacy rights regarding digital products and services, setting the groundwork for global best practices. These frameworks mandate that consumer data be collected only for stated purposes, with clear consent and the ability for individuals to delete or remove sensitive information. Specific industries that collect sensitive data may have their own regulations and best practices. For example, in the U.S., the Health Insurance Portability and Accountability Act (HIPAA) restricts the movement and use of health care data, including demographics.

Government institutions, organizations, and corporations that develop AI systems with data collected internally or by a third party will find that accessing demographic information to debias an AI system is at odds with the privacy regulations and best practices meant to protect individuals. There is no established mechanism for accessing demographic information in a way that both respects the principle of privacy and allows AI systems to be protected against unfair outcomes. How does this apparent contradiction shape RAI as an emerging field, and how can we ethically advance it?

The friction between privacy and fairness leaves practitioners of RAI in a challenging position. When faced with a choice between upholding privacy and managing fairness risks, companies will likely fall into one of two camps:

  1. Companies that put RAI at the top of their operating mandate will likely avoid working on use cases that carry significant fairness and bias risks.
  2. Since privacy regulations are common and come with high fines, a company without an RAI policy is likely to work on and deploy AI systems without mitigating fairness and bias risks.

In both scenarios, individuals are left unprotected. To protect individuals without stifling innovation in AI, the world needs legal structures that enable the evaluation of fairness in AI systems:

  1. Laws banning discrimination against individuals by automated systems need to exist, similar to how discrimination based on protected characteristics between individuals is illegal in many countries.
  2. Systems need to exist that allow an AI system to be evaluated for fairness and discrimination.

The enablement of fairness assessments on AI systems requires an ecosystem of regulations, digital systems, citizen awareness, auditors, and capable practitioners. An ideal solution would have the following properties:

  1. Marginalized groups control their own data.
  2. Demographics are collected solely for the purpose of ensuring fairness in AI systems.
  3. Evaluation results are publicly available, ensuring transparency for any consumer of an AI system, whether a business or an individual.

Interim Techniques for Machine Learning Practitioners

Despite these challenges, there is still a path forward for AI practitioners to advance RAI. Although not fully established, a promising angle to explore involves location data and spatial aggregation. In fairness and bias analysis, ordinary variables can act as proxies for demographics. This is commonly treated as a risk, but proxy variables could potentially also serve as stand-ins for demographic data when assessing model fairness. One powerful proxy variable is location.

A high-level approach (sketched in code after the list):

  1. Determine a spatial granularity level in which location data is not personally identifying.
  2. Overlay model output error on a map.
  3. Observe non-uniformities in spatial distribution.
  4. Compare non-uniformities in the model's error distribution to known non-uniformities in demographic distributions (e.g., from census data).
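
Here is a rough sketch of steps 2 through 4, assuming location has already been aggregated to a coarse region code (step 1); the region labels, error values, and census group shares are illustrative assumptions, not real data.

```python
import pandas as pd

# Hypothetical prediction log. "region" is an aggregated area (e.g., a census
# tract or the first segment of a postal code) chosen coarse enough that it
# does not identify individuals (step 1).
preds = pd.DataFrame({
    "region": ["T5K", "T5K", "T5K", "T6X", "T6X", "T6X", "T5A", "T5A"],
    "error":  [0.10, 0.12, 0.09, 0.35, 0.40, 0.38, 0.20, 0.22],  # |prediction - actual|
})

# Steps 2-3: aggregate model error per region and look for non-uniformities.
error_by_region = preds.groupby("region")["error"].mean()

# Step 4: compare against known demographic distributions for the same regions,
# e.g., the share of a demographic group per region from public census data
# (values here are purely illustrative).
census = pd.Series({"T5K": 0.15, "T6X": 0.65, "T5A": 0.30}, name="group_share")

comparison = pd.concat([error_by_region, census], axis=1)
print(comparison)
print("Correlation between group share and model error:",
      round(comparison["error"].corr(comparison["group_share"]), 2))
```

A strong correlation between model error and a group's spatial distribution would not prove discrimination, but it would flag the model for a closer fairness review without ever collecting individual demographics.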

Conclusion

Privacy and fairness are two important principles in computer ethics, each carrying its own risks, and both are more relevant than ever with the proliferation of AI. Without prioritizing one principle over the other, developers of AI systems will find it difficult to proceed ethically with high-risk use cases. These rising tensions affect both individuals and companies, presenting an increasingly critical challenge for RAI as a whole to overcome. As a society, we need to build a regulatory and commercial ecosystem that can support the requirements of RAI at scale, while enabling and enforcing best practices around privacy and fairness.

