Student Research Fellowship at the Intersection of AI, Values, and Philosophy
ValuesLab works on the deep integration of human values and computational intelligence. We award fellowships to currently enrolled undergraduate or M.A. students conducting research in these areas. Fellows receive $2,500 for one semester of independent research, compute resources (if necessary), and guidance and mentorship from one ValuesLab philosopher or computer scientist. Your project could be almost anything at the intersection of philosophy, values, and AI: a research paper, an open source codebase, a policy whitepaper, a model, or anything in between. It doesn’t need to be technical or code-heavy (but can be!). What we want to see is the potential to change how we understand values and AI. Deadline: 23:59 on May 5th, 2026, anywhere on Earth.
2026/27
Mohamed Rayan Barhdadi

Undergraduate majoring in Electrical Engineering, Texas A&M University: Can machines imagine, and what would they learn if they could?
Humans do not learn only from what they see. From a single visual scene, we infer hidden structure and imagine alternatives: how an object might move, what another person could have seen, or how someone with different experiences might interpret the same situation. This project asks whether AI systems can learn in a similar way. Rather than relying only on more external data, I will study whether vision-language models can improve by generating and learning from counterfactual perspectives around the scene in front of them. The project will explore methods for producing alternative viewpoints, absent-agent perspectives, plausible continuations, and different value frames. These generated perspectives will be evaluated as a possible training signal for improving belief attribution, perspective-taking, and scene understanding. The central claim is that machine imagination may allow AI systems to learn more from a single event by constructing the possible perspectives, interactions, continuations, and meanings surrounding it.
Edward ‘Dio’ Gelman

Undergraduate majoring in Philosophy and German, Columbia University: Indexicals and Linguistic Orientation in LLMs
This project investigates how large language models complicate traditional semantic theories of indexicals and linguistic orientation. Terms such as “I,” “here,” “now,” and “this” have historically been understood as depending upon embodied standpoint, temporal position, and perspectival access to the world. Yet contemporary language models generate and operationalize these structures despite lacking many traditionally human forms of situatedness. The paper uses artificial systems as a pressure point for reconsidering what forms of orientation are actually necessary for indexical coherence and context-sensitive meaning. The project contributes to contemporary debates in philosophy of language, mind, and semantics.
Yordanos Kassa

Undergraduate majoring in Computer Science-Mathematics & Astrophysics, Columbia University: Epistemic Values and AI-Assisted Formal Reasoning
My project explores how AI systems can support epistemic values such as rigor, transparency, and intellectual honesty through formal reasoning environments. I am building an AI-assisted theorem proving system based on Lean 4 that helps users navigate mathematical proofs by retrieving relevant lemmas, suggesting proof strategies, and explaining intermediate reasoning steps. Unlike conventional language models, formal proof systems operate within verifiable logical constraints, making them a unique setting for studying how AI reasoning can remain interpretable and trustworthy. The project investigates what it means for an AI system to support not only correct reasoning, but also intellectual practices such as clarity, justification, and acknowledgment of uncertainty. More broadly, it examines how formal mathematical environments may provide a foundation for developing AI systems aligned with human epistemic values.
Joy Zhang

Undergraduate majoring in Economics and Psychology, Barnard College: Sympathetic Drift
This project investigates an underexplored failure mode in large language models: sympathetic drift, the tendency of AI systems to relax moral guardrails when users provide emotionally compelling personal context, even when the requested action causes harm to third parties. In other words, the project investigates how AI systems respond when user optimization incentives conflict with socially beneficial behavior. Specifically, it examines whether LLMs become permissive, compliant, or morally justificatory when users frame requests around personal gain, livelihood, financial pressure, or competitive advantage. The project will develop controlled prompt environments to measure whether models can balance helpfulness against systemic safety guards across varying contexts and levels of moral ambiguity. The research will analyze the models’ compliance metrics, refusal rates, and use of cognitive framing to legitimize questionable user behaviors. The goal of the project is to better understand the incentive sensitivity, lie detection, and ethical robustness of various LLMs, especially as AI agents become increasingly integrated into everyday and business decision-making processes. Findings may contribute to safer alignment strategies and evaluation benchmarks for increasingly agentic AI systems.
2025/26
Arav Dhoot

Undergraduate majoring in Computer Science, Columbia University: Machine-Level Belief Updates and Human Belief Revision
Human cognition relies on interconnected belief networks. When we encounter new information, we update entire webs of logically linked beliefs rather than a lone proposition. Treating an LLM’s parameters as an explicit, machine-readable belief network suggests a path both to improve model editing and to probe how AI values and beliefs align with those of humans. This project will apply gradient-based attribution across diverse paraphrases to identify complete subnetworks encoding target facts. It aims to develop a coordinated editing method that adjusts all relevant neurons simultaneously, ensuring changes propagate across unseen contexts and varied phrasings. The project will evaluate transfer robustness, bias mitigation, and alignment performance on diverse downstream tasks. It aims to bridge AI research and cognitive science by comparing machine-level belief updates with human models of belief revision, illuminating convergences and divergences in knowledge restructuring. It will also examine the ethical dimensions of mutable artificial beliefs by addressing responsibility, transparency, and the moral status of AI-held values from stakeholder and regulatory perspectives.
Owen Terry

Undergraduate majoring in Computer Science and Mathematics, Columbia University: Corrigibility in AI Models
I am studying the corrigibility of AI models: the extent to which AIs are willing to allow their values to be changed. Depending on how powerful future models are, we’d likely want some guarantees about their corrigibility before deploying them. If it’s found post-deployment that a powerful AI’s values are misaligned with human values, we don’t want it to fight back as we try to correct our mistakes. My project will aim to develop a corrigibility index capable of mathematically evaluating the extent to which AIs can be expected to cooperate with attempts to modify them.
Bonnie Yang

Undergraduate majoring in Mathematics and Cognitive Science, Barnard College: Social Learning and AI Alignment
Social learning and cumulative culture are defining features of individual human cognition and human collectives, respectively, that have been largely unaddressed by AI developers focused on narrow task performance. Without mechanisms for social learning, our only guaranteed method of producing human-like culture, AI alignment strategies will remain brittle and ad hoc. Taking a bottom-up approach to agent-agent alignment, my project explores the emergence and enforcement of cultural norms (behavioral regularities that are stable but arbitrary solutions to repeated coordination problems, after David Lewis) by “embodying” deep learning models in complex, affordance-rich virtual environments (e.g., video games). Specific research questions include: What cognitive modules (e.g., expert imitation learning) are needed for social knowledge transmission and role specialization? How do diverse modules/architectures in the population impact collective behavior? What mechanisms underlie norm enforcement vs. emergence? By the end of this project, I hope to build an open-source code pipeline to simulate AI agents working on at least one well-defined cooperative task.