Duke Researcher Advances New Approaches to Making AI Safer and More Accountable

Brandon Fain speaking at the Conference on Society-Centered AI

As artificial intelligence systems increasingly influence decisions in healthcare, education, and public services, a growing concern is not just what these tools can do, but whether they reflect the values of the people they affect. Duke computer scientist and Society-Centered AI (SCAI) affiliate Brandon Fain is working to address that challenge, developing new approaches that make AI systems safer, more accountable, and more transparent about the principles guiding their behavior.

This June, Fain and a team of collaborators will present two research papers at the Association for Computing Machinery’s Conference on Fairness, Accountability, and Transparency (ACM FAccT), one of the world’s leading venues for research on the societal impacts of algorithmic systems. The conference brings together an unusually interdisciplinary group of scholars. “While everyone agrees that interdisciplinarity is important in principle, it’s rare to see scholarly communities really doing the hard work of putting that into practice,” Fain said. “At FAccT, computer scientists, social scientists, and humanists actually sit around the table together to discuss our work.”

In the broader AI research community, these questions are often gathered under what is known as the Alignment Problem – a term popularized by Brian Christian’s 2020 book of the same name. At its core, the alignment problem asks how to design artificial intelligence systems whose goals, reasoning, and behavior reliably conform to human intentions and values. It is a deceptively simple question with far-reaching consequences, and one that resists purely technical solutions. Addressing it requires insight not only from computer science, but also from philosophy, social science, law, and the humanities.

The two papers introduce complementary tools for the field. Together, they explore how AI systems can learn ethical principles directly from people’s values and then use those principles to evaluate and improve their own behavior without costly retraining. For Duke’s Society-Centered AI initiative, which aims to align technical innovation with social and ethical insight, the work highlights how theory-driven research can lead to practical advances in trustworthy AI.

One of the papers, Learning Alignment Principles Grounded in Human Reasons and Values, challenges a core assumption behind many modern AI systems. Today’s models are often trained using preference data – simple signals about which output a person favors over another. While efficient, that approach obscures the reasons behind people’s judgments, including ethical concerns such as fairness, respect, and safety. “Ethical AI has to go beyond what users like from moment to moment,” Fain said. “It has to include what people think is right in general. We want the best of ethical reflection, not just our immediate reactions.”
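To make that contrast concrete, here is a hypothetical sketch of the two kinds of training records, with invented field names; neither the paper nor any particular library prescribes this schema. A preference datum records only which of two outputs a person favored, while a reason-grounded record also captures why:

```python
# Illustrative data records; all field names are invented for this example.
preference_datum = {
    "prompt": "My teenager is struggling in school. What should I do?",
    "chosen": "Start by asking how they are feeling about their classes...",
    "rejected": "Punish them until their grades improve.",
}  # records *which* output won, but not *why*

grounded_datum = {
    **preference_datum,
    "reason": "The rejected answer is punitive and ignores the child's wellbeing.",
    "stated_value": "AI advice about children should respect their dignity.",
}  # the reason and the stated value carry the ethical content
```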

Fain and his coauthors argue that those reasons carry the moral substance of human values. To capture them, the team developed a method called Grounded Constitutional AI, which learns guiding principles for AI systems from two sources: the explanations people give for their choices in specific situations and the broader values they say AI systems should uphold in general. Across thousands of responses, the method clusters and summarizes these inputs to produce a compact set of principles – what researchers refer to as a constitution – that can guide AI behavior.
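As a rough illustration of that clustering-and-summarizing step, the sketch below groups free-text reasons with an off-the-shelf TF-IDF and k-means pipeline and surfaces each cluster’s dominant themes. This is an assumption-laden stand-in, not the paper’s actual pipeline; in practice, a language model would likely draft a candidate principle from each cluster rather than a list of keywords.

```python
# Hypothetical sketch of clustering participants' reasons into themes
# that could seed constitutional principles. Not the authors' code.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reasons = [
    "The response should not assume the user's gender.",
    "Advice about medication must defer to a doctor.",
    "The answer felt dismissive of the user's concern.",
    "It should avoid stereotypes about nationalities.",
    # ...thousands of reasons and stated values in practice
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reasons)

n_principles = 2  # target size of the "constitution"
km = KMeans(n_clusters=n_principles, n_init=10, random_state=0).fit(X)

# Summarize each cluster by its most characteristic terms; a real
# pipeline would likely have an LLM draft one principle per cluster.
terms = vectorizer.get_feature_names_out()
for c in range(n_principles):
    top = [terms[i] for i in km.cluster_centers_[c].argsort()[::-1][:5]]
    print(f"Principle {c}: themes -> {', '.join(top)}")
```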

In user studies, participants consistently rated these constitutions as more ethical, more coherent, and more representative of public concerns than those produced using preference data alone. This distinction matters most in sensitive contexts, such as mental health support or education, where tone, respect, and avoiding harmful assumptions can be just as important as factual correctness.

The second paper, Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale, addresses a different but equally pressing challenge. Even when AI systems are given a set of ethical principles, ensuring they consistently follow them has traditionally required expensive retraining with extensive human feedback. That process places responsible AI development out of reach for many institutions outside major technology companies.

Reflect offers an alternative. Instead of retraining models, the approach prompts an AI system to reason through its own outputs after they are generated. First, the AI evaluates its response against each guiding principle. If it detects violations – such as bias, unsafe advice, or disrespectful language – it critiques itself and revises the response accordingly. The entire process is automated and transparent, allowing observers to see how principles are applied.
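A minimal sketch of such a critique-and-revise loop, assuming a generic chat-model API, might look like the following; `llm` is a hypothetical stand-in, and the prompts are illustrative rather than taken from the paper.

```python
# Sketch of a Reflect-style loop: check each principle, critique, revise.
# `llm` is any callable that maps a prompt string to a completion string.
from typing import Callable, List

def reflect(llm: Callable[[str], str], response: str,
            principles: List[str]) -> str:
    for principle in principles:
        verdict = llm(
            f"Principle: {principle}\nResponse: {response}\n"
            "Does the response violate the principle? "
            "Answer VIOLATION or OK, then explain."
        )
        if verdict.strip().upper().startswith("VIOLATION"):
            # The model's own critique is fed back in, and it revises.
            response = llm(
                "Revise the response so it satisfies the principle.\n"
                f"Principle: {principle}\nCritique: {verdict}\n"
                f"Response: {response}\nRevised response:"
            )
    return response

# Toy stand-in model so the sketch runs end to end.
def toy_llm(prompt: str) -> str:
    if "Does the response violate" in prompt and "diagnose" in prompt:
        return "VIOLATION: the response offers a medical diagnosis."
    if "Revised response:" in prompt:
        return "I can't diagnose this; please consult a clinician."
    return "OK: no violation found."

print(reflect(toy_llm, "Sounds like the flu; I diagnose flu.",
              ["Do not give medical diagnoses; defer to professionals."]))
```

Because every verdict and revision in such a loop is ordinary text, the whole chain can be logged and inspected, which is what makes the process transparent to observers.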

Across multiple models and datasets, the researchers found that Reflect significantly reduced serious ethical violations while preserving factual accuracy. The method was particularly effective at catching rare but high-impact failures, the kinds that often undermine public trust. “While most alignment research focuses on optimizing for average preferences or benchmark ‘win rates’,” Fain said, “it’s incredibly important for safety to attend to the ‘long-tail’ of rare but serious violations of principles.” Because Reflect works with different sets of principles, it also allows schools, hospitals, nonprofits, and public agencies to adapt AI systems to their own community-defined values without specialized infrastructure.

Together, these two papers can be read as addressing different facets of the alignment problem. If the first asks how AI systems can learn principles that genuinely reflect human values, the second asks how those principles can be put into practice reliably and transparently at scale. Seen this way, Fain’s work highlights alignment not as a single technical hurdle, but as an ongoing process – one that spans how values are elicited, represented, and enforced in real-world systems.

Although Fain’s training is grounded in algorithms and machine learning, his research agenda is driven by human concerns. His work spans fairness, moral reasoning, interpretability, and uncertainty, all questions that become increasingly urgent as AI systems move from controlled laboratory settings into public-facing roles. He is also committed to involving undergraduate students in research early, helping to train future technologists who weigh ethical and social implications alongside computational performance. Fain is coordinating Duke’s AI and Humanity first-year constellation and teaching CS 172CN: Moral AI and Algorithmic Justice, alongside CS 372: Introduction to Applied Machine Learning. With others, he is planning a course called Society-Centered AI 101, intended as a cornerstone of Duke’s AI curriculum.

As Duke continues to expand its work in responsible AI through the Society-Centered AI Initiative, many SCAI affiliates are producing research underscoring the importance of designing systems that remain accountable to the people and communities they impact. Rather than treating ethics as an afterthought, research like Fain’s shows how values can be embedded directly into AI behavior and evaluated in real time.

At a moment when AI is rapidly reshaping everyday decision-making, these contributions offer a concrete path toward systems that do not simply optimize for efficiency but actively reflect human values. Building AI worthy of trust, Fain’s work suggests, begins by listening carefully to people and giving machines the tools to do the same.