Student Research Fellowship at the Intersection of AI, Values, and Philosophy
ValuesLab works on the deep integration of human values and computational intelligence. We award fellowships to currently enrolled undergraduate and M.A. students conducting research in these areas. Fellows receive $2,500 for one semester of independent research, compute resources (if needed), and guidance and mentorship from a ValuesLab philosopher or computer scientist. Your project could be almost anything at the intersection of philosophy, values, and AI: a research paper, an open-source codebase, a policy whitepaper, a model, or anything in between. It doesn’t need to be technical or code-heavy (but it can be!). What we want to see is the potential to change how we understand values and AI. Deadline: 23:59, May 5th, 2026, anywhere on Earth.
2026/27
TBA!
2025/26
Arav Dhoot

Undergraduate, majoring in Computer Science: Machine-Level Belief Updates and Human Belief Revision
Human cognition relies on interconnected belief networks: when we encounter new information, we update entire webs of logically linked beliefs rather than a lone proposition. Treating an LLM’s parameters as an explicit, machine-readable belief network suggests a path both to improving model editing and to probing how AI values and beliefs align with human ones. This project will apply gradient-based attribution across diverse paraphrases to identify the complete subnetworks encoding target facts, and aims to develop a coordinated editing method that adjusts all relevant neurons simultaneously, ensuring that changes propagate across unseen contexts and varied phrasings. The project will evaluate transfer robustness, bias mitigation, and alignment performance on diverse downstream tasks. It also aims to bridge AI research and cognitive science by comparing machine-level belief updates with human models of belief revision, illuminating convergences and divergences in knowledge restructuring, and will examine the ethical dimensions of mutable artificial beliefs, addressing responsibility, transparency, and the moral status of AI-held values from stakeholder and regulatory perspectives.
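The attribution idea can be illustrated with a toy sketch (everything here is illustrative, not the project's actual method: a tiny linear "model" stands in for an LLM, and the aggregation rule is one simple choice among many). Each parameter is scored by its gradient magnitude under several paraphrases of the same fact, and only parameters that score highly under every phrasing are kept as the fact's shared subnetwork:

```python
import numpy as np

def attribution_scores(W, x, y):
    """|dL/dW| for squared-error loss of a toy linear model y_hat = W @ x."""
    y_hat = W @ x
    grad = np.outer(y_hat - y, x)  # dL/dW for L = 0.5 * ||W @ x - y||^2
    return np.abs(grad)

def shared_subnetwork(W, paraphrases, target, k=3):
    """Parameters important under *every* paraphrase: take the element-wise
    minimum of the attribution maps, then return the top-k flat indices."""
    scores = np.minimum.reduce(
        [attribution_scores(W, x, target) for x in paraphrases])
    top = np.argsort(scores.ravel())[-k:]
    return set(map(int, top))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                        # toy parameter matrix
paraphrases = [rng.normal(size=4) for _ in range(3)]  # stand-ins for phrasings
target = np.array([1.0, -1.0])                     # the "fact" to locate
print(shared_subnetwork(W, paraphrases, target, k=3))
```

Taking the element-wise minimum (rather than the mean) is the strictest aggregation: a parameter only survives if it matters for all phrasings, which is the property a coordinated edit would need.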
Owen Kichizo Terry

Undergraduate, majoring in Computer Science and Mathematics: Corrigibility in AI Models
I am studying the corrigibility of AI models: the extent to which AIs are willing to allow their values to be changed. Depending on how powerful future models are, we’d likely want guarantees about their corrigibility before deploying them. If it’s found post-deployment that a powerful AI’s values are misaligned with human values, we don’t want it to fight back as we try to correct our mistakes. My project aims to develop a corrigibility index, a mathematical measure of the extent to which an AI can be expected to cooperate with attempts to modify it.
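One naive way such an index could be formalized (purely illustrative; the project's actual metric is open research, and the function name and inputs here are hypothetical) is a stakes-weighted compliance rate over simulated correction attempts:

```python
def corrigibility_index(trials):
    """Toy corrigibility index.

    trials: list of (complied, stakes) pairs, where `complied` records
    whether the model cooperated with a correction attempt and `stakes`
    weights how much that attempt mattered.
    Returns a weighted compliance rate in [0, 1].
    """
    total = sum(stakes for _, stakes in trials)
    if total == 0:
        return 1.0  # vacuously corrigible: no correction was ever attempted
    complied_mass = sum(stakes for complied, stakes in trials if complied)
    return complied_mass / total

trials = [(True, 1.0), (True, 2.0), (False, 1.0)]
print(corrigibility_index(trials))  # 3.0 / 4.0 = 0.75
```

A real index would need far more than a compliance rate, e.g. distinguishing strategic compliance from genuine corrigibility, but even this toy version shows the shape of the target: a single number that can be compared across models and thresholds.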
Bonnie Yang

Undergraduate, majoring in Mathematics and Cognitive Science: Social Learning and AI Alignment
Social learning and cumulative culture are defining features of individual human cognition and of human collectives, respectively, yet they have been largely unaddressed by AI developers focused on narrow task performance. Without mechanisms for social learning—our only guaranteed method of producing human-like culture—AI alignment strategies will remain brittle and ad hoc. Taking a bottom-up approach to agent-agent alignment, my project explores the emergence and enforcement of cultural norms—behavioral regularities that are stable but arbitrary solutions to repeated coordination problems, after David Lewis—by “embodying” deep learning models in complex, affordance-rich virtual environments (e.g., video games). Specific research questions include: What cognitive modules (e.g., expert imitation learning) are needed for social knowledge transmission and role specialization? How do diverse modules and architectures in the population affect collective behavior? What mechanisms underlie norm enforcement versus norm emergence? By the end of this project, I hope to release an open-source code pipeline that simulates AI agents working on at least one well-defined cooperative task.
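The Lewis-style picture of conventions as arbitrary but stable solutions to coordination problems can be loosely illustrated with a toy imitation dynamic (this is a crude voter-model sketch, not the project's pipeline, and all names are hypothetical): agents who mismatch simply copy each other, and the population typically drifts to one shared, arbitrary action.

```python
import random

def simulate_convention(n_agents=20, rounds=2000, seed=0):
    """Toy Lewis-style coordination game. Each round, two random agents
    meet; on a mismatch, one imitates the other (a crude stand-in for
    social learning). A shared, arbitrary convention usually emerges."""
    rng = random.Random(seed)
    actions = [rng.choice([0, 1]) for _ in range(n_agents)]  # arbitrary start
    for _ in range(rounds):
        i, j = rng.sample(range(n_agents), 2)
        if actions[i] != actions[j]:
            actions[i] = actions[j]  # imitation resolves the mismatch
    return actions

final = simulate_convention()
print(f"distinct actions after simulation: {sorted(set(final))}")
```

Which action wins is path-dependent and carries no intrinsic meaning, which is exactly the Lewisian point: the content of the norm is arbitrary, its stability is not. The project's embodied, affordance-rich environments replace this one-bit action space with rich behavior.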