Projects
Philosophy and AI
In recent years, any number of universities have built bridges between computer science and ethics. Students whose future careers involve designing AIs, the thought is, should learn some basic ideas in ethics. Research done at the ValuesLab questions this widely held view. In a slogan, AI needs philosophy, not “only” ethics. Why? Ethics certainly has a role to play in the training of future AI designers. Many of the key questions, however, are studied elsewhere. Here are some examples. Can AIs think? Does it make sense to speak as if AIs hold beliefs, make inferences, engage in reasoning, have intentions, and so on? The relevant notions here are studied in the philosophy of mind. What makes AIs explainable? Understanding, explanation, and justification are topics in epistemology. How do LLMs work, and how can their performance be improved? This involves any number of themes from the philosophy of language. What makes an AI model unfair? Here it matters how we conceive of causation and counterfactuals, key notions in metaphysics and the philosophy of science. The list goes on. The upshot is that core questions in major subfields of philosophy bear directly on challenges in AI. Instead of “embedded ethicists,” we argue, AI courses need embedded philosophers.
Explainable AI and Ancient Values
Contrary to modern moral philosophy, ancient Greek ethics argues that values related to knowledge and understanding—in Greek, epistêmê—are fundamental to human life. Our project revives this conviction and argues that it speaks to key concerns in AI. AIs decide or help decide who gets an interview for a job, whose loan application is approved, what a patient’s medical diagnosis is and what their best treatment options are, and so forth. Much of the research in the ethics of AI is about the fairness of these decisions. We argue that research should also address epistemic values such as understanding and explainability. We aim to contribute to so-called explainable AI (XAI), which appreciates that it is a basic feature of the human mind to ask and expect answers to why-questions.
Alignment and Confucian Ethics
This project explores the resources of Confucian ethics for AI development, specifically with a view to value-alignment. It is widely assumed that ethics should inform AI design. Which ethics are we talking about? Researchers often invoke consequentialism, Kantian ethics, and Aristotle. Thus AI ethics seems to find itself squarely in the Western tradition. By contrast, we ask how Confucian philosophy may contribute to the development of AIs. Confucian ethics offers a framework that attends to (i) roles and relationships, (ii) constraints on what is sayable/doable for a polite person, and (iii) specified situations. For example, a son would not say such-and-such to a father at such-and-such an occasion. It is our hypothesis that this structure of ethical theory is helpful for AI ethics. For example, LLMs need to be trained on what not to say. Some of what counts as unsayable, offensive, and so on, is addressee- and context-dependent. We also consider AI models that are used in medicine, education, and so on. In these contexts, people speak as doctor to patient, as teacher to student, and so on. Presumably, AIs used in such domains should be informed by roles and relationships, conventions of the sayable/doable, and situations. In other words, Confucian ethics may provide a structure that can be modeled in AIs.
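To make the three-part structure concrete, here is a minimal sketch of how roles, relationships, and situations could be represented as data. Everything in it is a hypothetical illustration: the names `Context`, `UNSAYABLE`, and `is_sayable`, and the placeholder labels such as "remark_x", are assumptions and do not describe an existing system or any actual Confucian rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Who speaks to whom, and on what occasion (all labels are placeholders)."""
    speaker_role: str
    addressee_role: str
    situation: str


# Hypothetical constraint table: for a given context, the kinds of remark
# that count as unsayable. The entries stand in for "such-and-such".
UNSAYABLE = {
    Context("son", "father", "occasion_1"): {"remark_x"},
    Context("doctor", "patient", "consultation"): {"remark_y"},
}


def is_sayable(remark: str, ctx: Context) -> bool:
    """A remark is sayable unless the table forbids it in this exact context."""
    return remark not in UNSAYABLE.get(ctx, set())


# The same remark is constrained in one context and unconstrained in another.
print(is_sayable("remark_x", Context("son", "father", "occasion_1")))
print(is_sayable("remark_x", Context("friend", "friend", "occasion_1")))
```

The point of the sketch is only that sayability is a function of the remark together with role, addressee, and situation, not of the remark alone. A lookup table is the simplest possible stand-in for whatever a trained model would actually learn.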
Videos
Philosophy and AI: An Introduction
Alignment
Lecture Slides
Do AIs have beliefs? Do they have intentions?
AI and Value Alignment
AI and Fairness
Papers
“Measure Realism”
in progress, co-authored with Jens Haas
“Do LLMs have Beliefs? It Depends”
in progress
“Generics and Inference”
in progress
Questions
Agency
— Are current models built as if the AI pursues its ends? If yes, does this speak in favor of a radical shift, toward models oriented toward human final ends?
— If AIs can have values or posited ends, should there be a built-in, strict dominance of human values to AI values?
— Should AIs try to emulate the way in which human decision-making is fundamentally concerned with sustaining human life and guided by what agents take to be well-lived human lives?
— Who is responsible for decisions that the AI makes? Is the AI itself responsible, or are its creators responsible for the AI’s decisions?
— Is AI as a technology “value neutral,” simply reflective of the values of its creators and the data it is trained on? Alternatively, is it imbued with a bent toward, or away from, particular values?
Ethical Values
— Are there values, for example, related to the survival of humankind, that neither human beings nor intelligent machines should override?
— What notion(s) of fairness do AI researchers employ?
— When researchers describe AIs as fair or just, do they invoke agential or systemic notions? In other words, should we think of AIs as agents in the world, who can have virtues such as justice or fairness? Should we think of them as components of the social environments that constrain human action?
— Can the special weight of moral considerations in human reasoning be simulated by AIs?
— Ethics is concerned with how human beings should live. Does this mean that AIs should ask “what should a human agent do?” (as opposed to “what should one do?”)?
— Would it be appropriate for human beings to defer ethical decision-making to AIs? Does this manner of decision-making threaten human autonomy?
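The question of which notion(s) of fairness researchers employ matters because common statistical notions can come apart. The sketch below compares two widely used group-fairness measures, demographic parity (equal selection rates across groups) and equal opportunity (equal true-positive rates across groups), on made-up toy data. The data, the group labels, and the function names are assumptions chosen purely for illustration.

```python
def selection_rate(decisions):
    """Share of people in a group who receive the positive decision (1)."""
    return sum(decisions) / len(decisions)


def true_positive_rate(decisions, qualified):
    """Share of qualified people (qualified == 1) who receive the positive decision."""
    hits = [d for d, q in zip(decisions, qualified) if q == 1]
    return sum(hits) / len(hits)


# Made-up toy data: 1 = invited to interview / actually qualified, 0 = not.
group_a_decisions = [1, 1, 0, 0]
group_a_qualified = [1, 1, 0, 0]
group_b_decisions = [1, 1, 0, 0]
group_b_qualified = [1, 1, 1, 1]

# Demographic parity gap: difference in selection rates between groups.
parity_gap = abs(selection_rate(group_a_decisions) - selection_rate(group_b_decisions))

# Equal opportunity gap: difference in true-positive rates between groups.
opportunity_gap = abs(
    true_positive_rate(group_a_decisions, group_a_qualified)
    - true_positive_rate(group_b_decisions, group_b_qualified)
)

print(parity_gap, opportunity_gap)
```

On this toy data both groups are selected at the same rate, so the parity gap is zero, yet qualified members of group B are selected only half as often as qualified members of group A. Whether the decisions count as fair therefore depends on which notion one adopts.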
Truth and Other Epistemic Values
— What is the role of epistemic norms, for example, norms that request attention to evidence, careful thinking, etc., in AIs?
— Is sensitivity to value an additional, separable dimension of AIs, to be added to existing systems? Alternatively, are “ethical abilities” integrated dimensions of the “thinking abilities” of AIs, such that they improve along with them?
— What is an LLM’s relationship to the truth or falsity of its outputs? What does it mean for an LLM to “tell the truth,” “lie,” “hallucinate,” etc.? Can AIs have virtues such as honesty?
— How can LLMs distinguish between domains where responses to prompts should draw on expertise, and domains such as ethics where it is not immediately obvious what constitutes expertise? In the former, experts tend to agree; in the latter, even experts disagree.
— How do we assess what constitutes good thinking in human beings? Should AIs emulate excellent human thinking, or do they come with their own standards of excellence?
Credences and Risk
— Suppose that AIs should be designed such that they assign probabilistically coherent credences to propositions or to surrogates for propositions. What ought to be the credence thresholds for action or belief reports?
— What should guide AI designers with regard to credence thresholds? Risk aversion?
— Should AIs be designed primarily to avoid assigning high credence to falsehoods, or should they primarily aim to assign high credence to truths?
— Can we afford to design AIs to make mistakes from time to time, or should all high credence assignments result from extremely good epistemic positions?
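One way to make the questions about credence thresholds and risk aversion concrete is the standard expected-cost rule from decision theory. Assume that acting on a falsehood costs `cost_false_act`, that failing to act on a truth costs `cost_missed_act`, and that correct choices cost nothing; then acting minimizes expected cost exactly when the credence is at least `cost_false_act / (cost_false_act + cost_missed_act)`. The sketch below is a simplified illustration under these assumptions, with hypothetical function names and numbers, not a proposal for how thresholds ought to be set.

```python
def is_coherent(credence_p: float, credence_not_p: float, tol: float = 1e-9) -> bool:
    """Minimal coherence check: both credences lie in [0, 1] and sum to 1."""
    in_range = 0.0 <= credence_p <= 1.0 and 0.0 <= credence_not_p <= 1.0
    return in_range and abs(credence_p + credence_not_p - 1.0) <= tol


def action_threshold(cost_false_act: float, cost_missed_act: float) -> float:
    """Credence at which acting and not acting have equal expected cost."""
    if cost_false_act < 0 or cost_missed_act < 0:
        raise ValueError("costs must be non-negative")
    if cost_false_act + cost_missed_act == 0:
        raise ValueError("at least one cost must be positive")
    return cost_false_act / (cost_false_act + cost_missed_act)


def should_act(credence: float, cost_false_act: float, cost_missed_act: float) -> bool:
    """Act (or report belief) only if the credence reaches the threshold."""
    return credence >= action_threshold(cost_false_act, cost_missed_act)


# A risk-averse designer who treats acting on a falsehood as nine times worse
# than a missed truth ends up with a high threshold; equal costs give 0.5.
print(action_threshold(9.0, 1.0))
print(should_act(0.8, 9.0, 1.0), should_act(0.8, 1.0, 1.0))
```

On this picture, risk aversion is not a separate ingredient but shows up in the ratio of the two error costs: the worse it is to act on a falsehood, the higher the credence required before acting.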
Mental States
— We don’t know whether AIs will ever have mental states such as intentions and beliefs. Lying arguably involves both: an intention to deceive and saying something one believes to be false. Is it a mere metaphor to describe AI outputs in terms of truth-telling versus lying?
— Should we, instead of asking whether AIs can have beliefs and intentions, ask whether we must come up with broader notions of belief and intention, which can encompass human states and AI states?
— An AI system may be said to have “information.” Does it “believe” the things it can provide as information? Does it “know” them? What are the functional analogues to mental states such as belief and knowledge?
— How do questions of interpretability bear on ethics? For example, are ethical questions ones where it is especially important to not only have the answer, but also to understand how the answer was generated?
— Should AI models try to emulate the roles of emotion and desiderative/aversive attitudes in human decision-making? If yes, how?
Flawed Thinking
— Should value integration start from the premise that intelligent machines ought to help us become better thinkers? How does this relate to the frequently asked question of whether intelligent machines are, or will soon become, better thinkers than we are?
— Via the corpora of text, images, etc., that LLMs ingest, they inherit flaws of human thinking, e.g., jumping to conclusions, fallacies. Which dimensions of human reasoning do we want to reproduce? Which dimensions of human reasoning can be improved with the help of machines?
— In human beings, informal fallacies are often treated as “shortcuts” or “fast track” thinking, saving time and mental energy in resource-limited environments. Are we aiming for AIs without any such “shortcuts”? If yes, this would constitute a major difference between human and machine reasoning.
— Does an AI possess “inherent knowledge,” prior to ingesting data? Does the algorithm itself qualify as such?
Pluralism and Disagreement
— How can computational intelligence recognize value pluralism, disagreement, and historical changes in evaluative outlooks?
— Is there a role for the use of different outputs from different LLMs? Do LLMs that are tailored to particular viewpoints generate “echo chamber” problems?
— How can AIs recognize that human evaluative outlooks tend to be fragmented and inconsistent? Should they seek to correct these inconsistencies?
— What is the effect of adding an extensive curriculum in ethics, including works that defend a range of approaches, to the training of an AI model?