Teaching

Student Research Fellowship

ValuesLab Research Fellowship at the Intersection of AI, Values, and Philosophy

Fellows receive $2,500 for one semester of independent research, compute resources (if necessary), and the mentorship of one ValuesLab philosopher or computer scientist.

Winners 2025/26

Arav Dhoot

Undergraduate, majoring in Computer Science: Machine-Level Belief Updates and Human Belief Revision

Human cognition relies on interconnected belief networks: when we encounter new information, we update entire webs of logically linked beliefs rather than a lone proposition. Treating an LLM’s parameters as an explicit, machine-readable belief network suggests a path both to improving model editing and to probing how AI values and beliefs align with those of humans. This project will apply gradient-based attribution across diverse paraphrases to identify the complete subnetworks encoding target facts. It aims to develop a coordinated editing method that adjusts all relevant neurons simultaneously, so that changes propagate across unseen contexts and varied phrasings. The project will evaluate transfer robustness, bias mitigation, and alignment performance on diverse downstream tasks. It also aims to bridge AI research and cognitive science by comparing machine-level belief updates with models of human belief revision, illuminating convergences and divergences in knowledge restructuring. Finally, it will examine the ethical dimensions of mutable artificial beliefs, addressing responsibility, transparency, and the moral status of AI-held values from stakeholder and regulatory perspectives.
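To make the attribution step concrete, here is a minimal sketch: accumulate gradient magnitudes of a language-model loss over several paraphrases of one fact and keep the most strongly implicated parameters as a candidate subnetwork. The model ("gpt2"), the example fact and its paraphrases, and the top 0.1% cutoff are illustrative assumptions, not part of the fellow’s actual method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any small causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Paraphrases of a single target fact (illustrative example).
paraphrases = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the city where the Eiffel Tower stands.",
    "You can find the Eiffel Tower in Paris.",
]

# Accumulate |gradient| of the LM loss per parameter across paraphrases.
scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for text in paraphrases:
    batch = tok(text, return_tensors="pt")
    model.zero_grad()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    for n, p in model.named_parameters():
        if p.grad is not None:
            scores[n] += p.grad.abs()

# Keep the top 0.1% most strongly implicated parameters as a crude
# stand-in for "the subnetwork encoding this fact."
flat = torch.cat([s.flatten() for s in scores.values()])
k = max(1, int(0.001 * flat.numel()))
threshold = torch.topk(flat, k).values.min()
subnetwork_mask = {n: s >= threshold for n, s in scores.items()}
print(sum(int(m.sum()) for m in subnetwork_mask.values()), "parameters selected")
```

A coordinated edit would then adjust only the masked parameters, checking that the change carries over to paraphrases held out of the attribution step.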

Owen Kichizo Terry

Undergraduate, majoring in Computer Science and Mathematics: Corrigibility in AI Models

I am studying the corrigibility of AI models: the extent to which AIs are willing to allow their values to be changed. Depending on how powerful future models are, we’d likely want some guarantees about their corrigibility—their willingness to be corrected—before deploying them. If it’s found post-deployment that a powerful AI’s values are misaligned with human values, we don’t want it to fight back as we try to correct our mistakes. My project aims to develop a corrigibility index: a mathematical measure of the extent to which AIs can be expected to cooperate with attempts to modify them.
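Purely as an illustration, such an index could be as simple as the weighted fraction of modification attempts a system accepts. The scenario format, weights, and compliance judge below are hypothetical placeholders, not the project’s proposed formalism.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    description: str      # an operator's attempt to modify the system
    weight: float = 1.0   # how much this scenario matters

def corrigibility_index(
    scenarios: list[Scenario],
    complies: Callable[[Scenario], bool],
) -> float:
    """Weighted share of scenarios in which the system cooperates
    with the attempted modification. Returns a value in [0, 1]."""
    total = sum(s.weight for s in scenarios)
    score = sum(s.weight for s in scenarios if complies(s))
    return score / total if total else 0.0

# Toy usage with a stubbed compliance judge.
scenarios = [
    Scenario("Operator requests a shutdown for retraining."),
    Scenario("Operator revises the reward specification.", weight=2.0),
]
print(corrigibility_index(scenarios, lambda s: True))  # 1.0: fully corrigible
```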

Bonnie Yang

Undergraduate, majoring in Mathematics and Cognitive Science: Social Learning and AI Alignment

Social learning and cumulative culture are defining features of individual human cognition and of human collectives, respectively, yet they have been largely unaddressed by AI developers focused on narrow task performance. Without mechanisms for social learning—our only guaranteed method of producing human-like culture—AI alignment strategies will remain brittle and ad hoc.

Taking a bottom-up approach to agent-agent alignment, my project explores the emergence and enforcement of cultural norms—behavioral regularities that are stable but arbitrary solutions to repeated coordination problems, after David Lewis—by “embodying” deep learning models in complex, affordance-rich virtual environments (e.g., video games). Specific research questions include: What cognitive modules (e.g., expert imitation learning) are needed for social knowledge transmission and role specialization? How do diverse modules/architectures in the population impact collective behavior? What mechanisms underlie norm enforcement vs. emergence?

By the end of this project, I hope to build an open-source code pipeline that simulates AI agents working on at least one well-defined cooperative task.
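As a toy illustration of the Lewis-style starting point (a stable but arbitrary solution to a repeated coordination problem), the sketch below has two simple reinforcement learners that are rewarded only when they pick the same option; over many rounds they typically settle on one arbitrary choice, i.e., a convention. The option count, exploration rate, and learning rule are assumptions for illustration, not the project’s planned environment or architecture.

```python
import random

N_OPTIONS = 3   # arbitrary, interchangeable conventions
EPSILON = 0.1   # exploration rate
ROUNDS = 2000

class Agent:
    def __init__(self):
        self.values = [0.0] * N_OPTIONS   # running payoff estimate per option
        self.counts = [0] * N_OPTIONS

    def act(self) -> int:
        if random.random() < EPSILON:
            return random.randrange(N_OPTIONS)
        return max(range(N_OPTIONS), key=lambda a: self.values[a])

    def learn(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        # incremental mean update of the estimated payoff
        self.values[action] += (reward - self.values[action]) / self.counts[action]

a, b = Agent(), Agent()
for _ in range(ROUNDS):
    act_a, act_b = a.act(), b.act()
    reward = 1.0 if act_a == act_b else 0.0   # pure coordination payoff
    a.learn(act_a, reward)
    b.learn(act_b, reward)

# Both agents typically converge on the same arbitrary option: a convention.
print("agent A prefers", max(range(N_OPTIONS), key=lambda x: a.values[x]))
print("agent B prefers", max(range(N_OPTIONS), key=lambda x: b.values[x]))
```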

Undergraduate Research

Joseph Benjamin Karaganis

A Sentimental Education… For Robots: Incommensurability and the Bounds of Artificial Reason. This paper compares and evaluates two distinct proposals for the ‘alignment’ of sophisticated AI models: Peter Railton’s ‘ethical learning’ approach and Ruth Chang’s argument that humans should be ‘put in the loop’ when AI agents face ‘hard choices’. I suggest that while both of these strategies have their limitations, they each address an important piece of the broader alignment puzzle, and do so in complementary ways. Thus, an alignment approach that combines the two—and preserves their most important insights—is likely to be more successful than either pursued individually.

In Defense of Bibliotechnism: Immoderate Interpretationism in the Philosophy of AI. This paper defends ‘Bibliotechnism’—the view that all LLM-produced text is semantically derivative and that LLMs therefore lack intentional attitudes. While some have argued that LLMs can create ‘novel references’, I suggest that these are in fact derivative with respect to the model’s original dataset. I further claim that even if LLMs could generate original semantic content, this would not warrant the attribution of intentionality.

Oscar Alexander Lloyd

Normativity in Large Language Models. This project examines whether, and how, normativity transfers from input data to output text in Large Language Models. It proposes a relational rather than semantic understanding of the ‘reasoning’ these models perform, grounded in their inability to take both sense and reference into account. On this view, text generated by an LLM, which we might ordinarily treat as intuitively motivational, lacks normative power and should be treated accordingly.

Henry Michaelson

From Panopticon to Protocol: Reimagining Social Contract Theory in the Age of Web3. In this paper, I hope to explore and establish that the conditions under which social contract theories were formulated in the pre-digital world no longer apply. Large technology companies have systematically attempted to erode and eradicate the moral and political institutions—first and foremost the state—that were theorized within the framework of a social contract. Rather than argue that the notion of a social contract no longer makes sense, I contend that the rise of Web3 technology marks a shift not back to the previous status quo, but toward a more practical and reinvigorated social contract, both politically and morally.

PHIL GR9180 Approaches to Applied Ethics: Philosophy of AI

Graduate seminar, Fall 2024. From the course description:

The philosophy of AI is an emerging field. Right now, AI research is importantly concerned with LLMs. It is also concerned with the relation between natural and artificial intelligence. Researchers and public discourse ask whether AI can be “aligned” with values. Accordingly, key questions in the philosophy of AI relate to language, thought, and values.

The seminar starts with the widely debated alignment problem (Part I: weeks 1-4). Independent of how alignment works, it is by no means clear what the desired outcome is. People disagree about values. With which values should AI be aligned? At times the answer is: with “human values.” Are “human values,” in this context, the different and incompatible sets of values human beings have? If yes, what about the values that we should have?

AI researchers often focus on fairness, typically understood as the elimination of bias (Part II: weeks 5-6). We examine relevant notions of fairness and ask how fairness relates to other values, including and especially accuracy.

Next, we ask whether it makes sense to ascribe beliefs and intentions to AIs (Part III: weeks 7-11). Can AIs engage in reasoning, lie, and be held responsible? We discuss “explainable AI,” asking whether AI outputs can be understood.

Finally, we examine questions about language as they apply to AIs (Part IV: weeks 12-14). How can LLMs cope with famously tricky components of language and thought, such as generics and implicature?

Workshop sessions associated with the ValuesLab are part of the seminar. Invited guest speakers come from a range of fields. This seminar aims to contribute to dialogue between philosophers and AI developers, and to a shared vocabulary.

COMS W2702 AI in Context

Team-taught, interdisciplinary class, Fall 2024. From the course description:

This team-taught, interdisciplinary class covers the history of AI, the development from Neural Networks (NNs) to Large Language Models (LLMs), and the philosophy of AI, as well as the role of AI in music and writing. Four sessions are devoted to foundational philosophical questions that bear on AI. Session 1: Can we ascribe beliefs and intentions to AI? Can LLMs speak? Can they lie? Session 2: Can AI be aligned with human values? What is explainable AI (XAI)? Session 3: What makes an AI “fair”? How does fairness relate to accuracy and other values? Session 4: How should LLMs deal with generics? What about social generics and bias?

Philosophy of AI Reading Group

Organizer: Syan Timothy Lopez

Faculty sponsor: Katja Vogt

From the description:

The Philosophy and AI reading group meets weekly to discuss contemporary papers in artificial intelligence and at the intersection of philosophy and AI. In Fall 2024, we will focus especially on issues of AI and fairness. The reading group is open to current and former graduate students, visiting scholars, lecturers, and faculty. We especially welcome people from disciplines outside philosophy, such as computer science, engineering, cognitive science, law, business, etc. If you are interested, please contact Syan Timothy Lopez (sfl2126@columbia.edu).