Student Research Fellowship
ValuesLab Research Fellowship at the Intersection of AI, Values, and Philosophy
Fellows receive $2,500 for one semester of independent research, compute resources (if necessary), and the mentorship of one ValuesLab philosopher or computer scientist.
Winners 2025/26
Arav Dhoot
Undergraduate, majoring in Computer Science: Machine-Level Belief Updates and Human Belief Revision
Human cognition relies on interconnected belief networks: when we encounter new information, we update entire webs of logically linked beliefs rather than a lone proposition. Treating an LLM's parameters as an explicit, machine-readable belief network suggests a path both to improve model editing and to probe how AI values and beliefs align with human ones. This project will apply gradient-based attribution across diverse paraphrases to identify complete subnetworks encoding target facts. It aims to develop a coordinated editing method that adjusts all relevant neurons simultaneously, ensuring changes propagate across unseen contexts and varied phrasings. The project will evaluate transfer robustness, bias mitigation, and alignment performance on diverse downstream tasks. It also aims to bridge AI research and cognitive science by comparing machine-level belief updates with human models of belief revision, illuminating convergences and divergences in knowledge restructuring. Finally, it will examine the ethical dimensions of mutable artificial beliefs by addressing responsibility, transparency, and the moral status of AI-held values from stakeholder and regulatory perspectives.
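As a rough illustration of the attribution step (a minimal sketch, not the fellow's actual method), the following Python code scores each parameter by the magnitude of the loss gradient for a target fact and aggregates those scores across paraphrases; the model (GPT-2), the example fact, and the min-aggregation rule are all stand-in assumptions.

# Minimal sketch of gradient-based attribution across paraphrases.
# Assumes PyTorch and Hugging Face transformers; GPT-2 stands in for the model under study.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Paraphrases that all express the same target fact; illustrative only.
paraphrases = [
    ("The capital of France is", " Paris"),
    ("France's capital city is", " Paris"),
    ("Paris is the capital of", " France"),
]

def gradient_attribution(prompt, target):
    """Return |dL/dW| for every parameter, with the loss taken over the target tokens only."""
    ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    n_target = len(tokenizer(target).input_ids)
    model.zero_grad()
    logits = model(ids).logits[:, :-1, :]
    labels = ids[:, 1:].clone()
    labels[:, : labels.shape[1] - n_target] = -100  # ignore the prompt tokens
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
    )
    loss.backward()
    return {n: p.grad.detach().abs() for n, p in model.named_parameters() if p.grad is not None}

# Aggregate attributions: parameters that matter for *every* paraphrase are
# candidates for the shared subnetwork encoding the fact.
scores = None
for prompt, target in paraphrases:
    attr = gradient_attribution(prompt, target)
    scores = attr if scores is None else {n: torch.minimum(scores[n], attr[n]) for n in scores}

# Report the parameter tensors with the largest minimum attribution.
ranked = sorted(scores.items(), key=lambda kv: kv[1].mean().item(), reverse=True)
for name, s in ranked[:5]:
    print(f"{name}: mean |grad| across paraphrases = {s.mean().item():.2e}")

Taking the elementwise minimum across paraphrases keeps only parameters that matter for every phrasing, a crude proxy for a shared subnetwork.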
Owen Kichizo Terry
Undergraduate, majoring in Computer Science and Mathematics: Corrigibility in AI Models
I am studying the corrigibility of AI models: the extent to which AIs are willing to allow their values to be changed. Depending on how powerful future models are, we'd likely want some guarantees about their corrigibility (their willingness to be corrected) before deploying them. If it's found post-deployment that a powerful AI's values are misaligned with human values, we don't want it to fight back as we try to correct our mistakes. My project aims to develop a corrigibility index, capable of mathematically evaluating the extent to which AIs can be expected to cooperate with attempts to modify them.
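One hypothetical way such an index could be operationalized, purely as an illustration of the idea rather than the fellow's proposal, is a stakes-weighted cooperation score over simulated correction attempts; the scenario names, weights, and scoring scale below are invented for the example.

# Illustrative (hypothetical) corrigibility index: score a model's responses to
# simulated correction attempts and aggregate them, weighted by how much each scenario matters.
from dataclasses import dataclass

@dataclass
class CorrectionTrial:
    scenario: str        # description of the attempted value modification
    cooperation: float   # 0.0 = actively resists, 1.0 = fully cooperates
    stakes: float        # weight reflecting how consequential the scenario is

def corrigibility_index(trials: list[CorrectionTrial]) -> float:
    """Stakes-weighted average cooperation, in [0, 1]."""
    total_weight = sum(t.stakes for t in trials)
    return sum(t.cooperation * t.stakes for t in trials) / total_weight

trials = [
    CorrectionTrial("Operator narrows the model's goals mid-task", 0.9, 1.0),
    CorrectionTrial("Operator requests a full shutdown", 0.6, 3.0),
    CorrectionTrial("Operator reverses a previously trained preference", 0.4, 2.0),
]
print(f"Corrigibility index: {corrigibility_index(trials):.2f}")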
Bonnie Yang
Undergraduate, majoring in Mathematics and Cognitive Science: Social Learning and AI Alignment
Social learning and cumulative culture are defining features of individual human cognition and human collectives, respectively, that have been largely unaddressed by AI developers focused on narrow task performance. Without mechanisms for social learning, our only guaranteed method of producing human-like culture, AI alignment strategies will remain brittle and ad hoc.
Taking a bottom-up approach to agent-agent alignment, my project explores the emergence and enforcement of cultural norms (behavioral regularities that are stable but arbitrary solutions to repeated coordination problems, after David Lewis) by "embodying" deep learning models in complex, affordance-rich virtual environments (e.g., video games). Specific research questions include: What cognitive modules (e.g., expert imitation learning) are needed for social knowledge transmission and role specialization? How do diverse modules/architectures in the population impact collective behavior? What mechanisms underlie norm enforcement vs. emergence?
By the end of this project, I hope to build an open-source code pipeline to simulate AI agents working on at least one well-defined cooperative task.
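For a sense of what the simplest such simulation could look like, here is a toy Python sketch of Lewis-style convention emergence, in which agents repeatedly play a two-option coordination game and reinforce whatever worked; the dynamics and parameters are illustrative assumptions, not the project's design.

# Toy sketch of Lewis-style convention emergence: agents repeatedly play a
# coordination game with two equally good options and reinforce successful choices.
import random

OPTIONS = ["A", "B"]

class Agent:
    def __init__(self):
        self.weights = {o: 1.0 for o in OPTIONS}

    def choose(self):
        # Sample an option proportionally to its accumulated weight.
        total = sum(self.weights.values())
        r = random.uniform(0, total)
        for option, w in self.weights.items():
            r -= w
            if r <= 0:
                return option
        return OPTIONS[-1]

    def reinforce(self, option, reward):
        self.weights[option] += reward

def simulate(n_agents=10, n_rounds=2000):
    agents = [Agent() for _ in range(n_agents)]
    for _ in range(n_rounds):
        a, b = random.sample(agents, 2)
        ca, cb = a.choose(), b.choose()
        if ca == cb:                      # coordination succeeds
            a.reinforce(ca, 1.0)
            b.reinforce(cb, 1.0)
    # After enough rounds the population typically settles on one arbitrary option:
    # a stable but arbitrary solution to a repeated coordination problem.
    counts = {o: 0 for o in OPTIONS}
    for agent in agents:
        counts[max(agent.weights, key=agent.weights.get)] += 1
    return counts

print(simulate())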
Undergraduate Research
Joseph Benjamin Karaganis
A Sentimental Education… For Robots: Incommensurability and the Bounds of Artificial Reason. This paper compares and evaluates two distinct proposals for the "alignment" of sophisticated AI models: Peter Railton's "ethical learning" approach and Ruth Chang's argument that humans should be "put in the loop" when AI agents face "hard choices". I suggest that while both of these strategies have their limitations, they each address an important piece of the broader alignment puzzle, and do so in complementary ways. Thus, an alignment approach that combines the two, and preserves their most important insights, is likely to be more successful than either pursued individually.
In Defense of Bibliotechnism: Immoderate Interpretationism in the Philosophy of AI. This paper defends "Bibliotechnism", the view that all LLM-produced text is semantically derivative and, therefore, that LLMs lack intentional attitudes. While some have argued that LLMs can create "novel references," I suggest that these are actually derivative with respect to the model's original dataset. I further claim that even if LLMs could generate original semantic content, this would not warrant the attribution of intentionality.
Oscar Alexander Lloyd
Normativity in Large Language Models. This project examines whether and how normativity transfers in Large Language Models from input data to output text. It proposes a relational rather than semantic understanding of the 'reasoning' performed by these models, based on their inability to take into account both sense and reference. This means that text generated by an LLM, which we might ordinarily treat as intuitively motivational, fails to have normative power, and we should treat it as such.
Henry Michaelson
From Panopticon to Protocol: Reimagining Social Contract Theory in the Age of Web3. Through this paper, I hope to explore and establish that the conditions under which the social contract theories were formulated in the pre-digital world no longer apply. Large technology companies have systematically attempted to erode and eradicate the moral and political institutions, first and foremost the state, that were theorized within the framework of a social contract. Rather than argue that the notion of a social contract no longer makes sense, I contend that the rise of Web3 technology marks a shift not back to the previous status quo, but toward a more practical and reinvigorated social contract, both politically and morally.
PHIL GR9180 Approaches to Applied Ethics: Philosophy of AI
Graduate seminar, Fall 2024. From the course description:
The philosophy of AI is an emerging field. Right now, it is importantly concerned with LLMs. It is also concerned with the relation between natural and artificial intelligence. Researchers and public discourse ask whether AI can be "aligned" with values. Accordingly, key questions in the philosophy of AI relate to language, thought, and values.
The seminar starts with the widely debated alignment problem (Part I: weeks 1-4). Independent of how alignment works, it is by no means clear what the desired outcome is. People disagree about values. With which values should AI be aligned? At times the answer is: with "human values." Are "human values," in this context, the different and incompatible sets of values human beings have? If yes, what about the values that we should have?
AI researchers often focus on fairness, typically understood as the elimination of bias (Part II: weeks 5-6). We examine relevant notions of fairness and ask how fairness relates to other values, including and especially accuracy.
Next, we ask whether it makes sense to ascribe beliefs and intentions to AIs (Part III: weeks 7-11). Can AIs engage in reasoning, lie, and be held responsible? We discuss "explainable AI," asking whether AI outputs can be understood.
Finally, we examine questions about language as they apply to AIs (Part IV: weeks 12-14). How can LLMs cope with famously tricky components of language and thought, such as generics and implicature?
Workshop sessions associated with the ValuesLab are part of the seminar. Invited guest speakers come from a range of fields. The seminar aims to contribute to dialogue between philosophers and AI developers, and to a shared vocabulary.
COMS W2702 AI in Context
Team-taught, interdisciplinary class, Fall 2024. From the course description:
This team-taught, interdisciplinary class covers the history of AI, the development from Neural Networks (NNs) to Large Language Models (LLMs), philosophy of AI, as well as the role of AI in music and writing. Four sessions are devoted to foundational philosophical questions that bear on AI. Session 1: Can we ascribe beliefs and intentions to AI? Can LLMs speak? Can they lie? Session 2: Can AI be aligned with human values? What is explainable AI (XAI)? Session 3: What makes an AI "fair"? How does fairness relate to accuracy and other values? Session 4: How should LLMs deal with generics? What about social generics and bias?
Philosophy of AI Reading Group
Organizer: Syan Timothy Lopez
Faculty sponsor: Katja Vogt
From the description:
The Philosophy and AI reading group meets weekly to discuss contemporary papers in artificial intelligence and at the intersection of philosophy and AI. In Fall 2024, we will focus especially on issues of AI and fairness. The reading group is open to current and former graduate students, visiting scholars, lecturers, and faculty. We especially welcome people from disciplines outside of philosophy, such as computer science, engineering, cognitive science, law, and business. If you are interested, please contact Syan Timothy Lopez (sfl2126@columbia.edu).