253 research outputs found
Concepts in a Probabilistic Language of Thought
Note: The book chapter is reprinted courtesy of The MIT Press, from the forthcoming edited collection “The Conceptual Mind: New Directions in the Study of Concepts,” edited by Eric Margolis and Stephen Laurence, print date Spring 2015.

Knowledge organizes our understanding of the world, determining what we expect given what we have already seen. Our predictive representations have two key properties: they are productive, and they are graded. Productive generalization is possible because our knowledge decomposes into concepts—elements of knowledge that are combined and recombined to describe particular situations. Gradedness is the observable effect of accounting for uncertainty—our knowledge encodes degrees of belief that lead to graded probabilistic predictions. To put this a different way, concepts form a combinatorial system that enables description of many different situations; each such situation specifies a distribution over what we expect to see in the world, given what we have seen. We may think of this system as a probabilistic language of thought (PLoT) in which representations are built from language-like composition of concepts and the content of those representations is a probability distribution on world states. The purpose of this chapter is to formalize these ideas in computational terms, to illustrate key properties of the PLoT approach with a concrete example, and to draw connections with other views of conceptual structure.

This work was supported by ONR awards N00014-09-1-0124 and N00014-13-1-0788, by a John S. McDonnell Foundation Scholar Award, and by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
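The chapter formalizes concepts as functions in a probabilistic programming language. The sketch below is a minimal, illustrative Python analogue rather than the chapter's own formalism: all names and parameters (players, strength distribution, laziness rate) are made up, but it shows how composable stochastic functions ("concepts") productively describe new situations and how forward sampling yields graded predictions.

```python
import random

# Concepts as composable stochastic functions: each call samples from a
# distribution, and functions recombine to describe new situations.

def strength(person, world):
    # a persistent latent property: sampled once per person, then reused
    return world.setdefault(person, random.gauss(10, 3))

def pulling(person, world):
    # momentary effort: a lazy puller exerts only half their strength
    lazy = random.random() < 0.3
    s = strength(person, world)
    return s / 2 if lazy else s

def total_pulling(team, world):
    return sum(pulling(p, world) for p in team)

def beats(team1, team2, world):
    return total_pulling(team1, world) > total_pulling(team2, world)

# A graded prediction by forward sampling (conditioning on observed matches
# is omitted for brevity): P(Alice and Bob beat Carol and Dave).
wins = sum(beats(["alice", "bob"], ["carol", "dave"], {}) for _ in range(10_000))
print(wins / 10_000)
```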
The Search for Invariance: Repeated Positive Testing Serves the Goals of Causal Learning
Positive testing is characteristic of exploratory behavior, yet it seems to be at odds with the aim of information seeking. After all, repeated demonstrations of one’s current hypothesis often produce the same evidence and fail to distinguish it from potential alternatives. Research on the development of scientific reasoning and on adult rule learning has both documented and attempted to explain this behavior. The current chapter reviews this prior work and introduces a novel theoretical account—the Search for Invariance (SI) hypothesis—which suggests that producing multiple positive examples serves the goals of causal learning. This hypothesis draws on the interventionist framework of causal reasoning, which suggests that causal learners are concerned with the invariance of candidate hypotheses. In a probabilistic and interdependent causal world, our primary goal is to determine whether, and in what contexts, our causal hypotheses provide accurate foundations for inference and intervention—not to disconfirm their alternatives. By recognizing the central role of invariance in causal learning, the phenomenon of positive testing may be reinterpreted as a rational information-seeking strategy.
Understanding Social Reasoning in Language Models with Language Models
As Large Language Models (LLMs) become increasingly integrated into our
everyday lives, understanding their ability to comprehend human mental states
becomes critical for ensuring effective interactions. However, despite the
recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of
LLMs, the degree to which these models can align with human ToM remains a
nuanced topic of exploration. This is primarily due to two distinct challenges:
(1) the presence of inconsistent results from previous evaluations, and (2)
concerns surrounding the validity of existing evaluation methodologies. To
address these challenges, we present a novel framework for procedurally
generating evaluations with LLMs by populating causal templates. Using our
framework, we create a new social reasoning benchmark (BigToM) for LLMs which
consists of 25 controls and 5,000 model-written evaluations. We find that human
participants rate the quality of our benchmark higher than previous
crowd-sourced evaluations and comparable to expert-written evaluations. Using
BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and
compare model performances with human performance. Our results suggest that
GPT-4 has ToM capabilities that mirror human inference patterns, though less
reliably, while other LLMs struggle.
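As a rough illustration of what populating a causal template might look like, the sketch below hard-codes one false-belief item in Python. The slot names and the story content are hypothetical assumptions, not the BigToM schema itself; in the actual framework the slots are filled by a language model and the templates are derived from causal models of belief, desire, and action.

```python
# Illustrative sketch of populating a causal template for one false-belief
# item. The slot names and story content are hypothetical; in the actual
# framework the slot values are written by a language model.

TEMPLATE = (
    "{agent} wants {desire}. {agent} sees that {percept}. "
    "While {agent} is away, {change}. "
    "Question: when {agent} returns, what do they believe?"
)

def make_item(agent, desire, percept, change, true_state):
    return {
        "story": TEMPLATE.format(agent=agent, desire=desire,
                                 percept=percept, change=change),
        "belief_answer": percept,        # false-belief condition: agent missed the change
        "true_world_state": true_state,  # what an omniscient observer would say
    }

item = make_item(
    agent="Noor",
    desire="to make a latte with oat milk",
    percept="the carton on the counter contains oat milk",
    change="a coworker swaps the oat milk for almond milk",
    true_state="the carton now contains almond milk",
)
print(item["story"])
```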
Causal Responsibility and Robust Causation
How do people judge the degree of causal responsibility that an agent has for the outcomes of her actions? We show that a relatively unexplored factor – the robustness (or stability) of the causal chain linking the agent’s action and the outcome – influences judgments of causal responsibility of the agent. In three experiments, we vary robustness by manipulating the number of background circumstances under which the action causes the effect, and find that causal responsibility judgments increase with robustness. In the first experiment, the robustness manipulation also raises the probability of the effect given the action. Experiments 2 and 3 control for probability-raising and show that robustness still affects judgments of causal responsibility. In particular, Experiment 3 introduces an Ellsberg-type scenario to manipulate robustness while keeping the conditional probability and the skill deployed in the action fixed. Experiment 4 replicates the results of Experiment 3 while contrasting judgments of causal strength with judgments of causal responsibility. The results show that in all cases, the perceived degree of responsibility (but not of causal strength) increases with the robustness of the action-outcome causal chain.
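A toy calculation, with made-up numbers rather than the paper's stimuli, of how robustness (here, the proportion of background circumstances in which the action produces the outcome) can be varied while the conditional probability of the outcome given the action is held fixed, as in the probability-raising controls:

```python
# Toy illustration (not the experimental stimuli): robustness as the proportion
# of background circumstances under which the action would produce the outcome,
# versus the overall conditional probability of the outcome given the action.

def summarize(name, circumstances):
    # circumstances: list of (probability_of_circumstance, effect_occurs_given_action)
    robustness = sum(occurs for _, occurs in circumstances) / len(circumstances)
    p_effect = sum(p * occurs for p, occurs in circumstances)
    print(f"{name}: robustness={robustness:.2f}, P(effect|action)={p_effect:.2f}")

# Low robustness: the action works in only 1 of 4 equally likely circumstances,
# but that circumstance is common enough that P(effect|action) is 0.6.
summarize("low robustness ", [(0.6, 1), (0.2, 0), (0.1, 0), (0.1, 0)])

# High robustness: the action works in 3 of 4 circumstances, with circumstance
# probabilities chosen so that P(effect|action) is also 0.6.
summarize("high robustness", [(0.2, 1), (0.2, 1), (0.2, 1), (0.4, 0)])
```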
What's fair? How children assign reward to members of teams with differing causal structures
How do children reward individual members of a team that has just won or lost a game? We know that from pre-school age, children consider agents’ performance when allocating reward. Here we assess whether children can go further and appreciate performance in context: The same pattern of performance can contribute to a team outcome in different ways, depending on the underlying rule framework. Two experiments, with three age groups (4/5-year-olds, 6/7-year-olds, and adults), varied the performance of team members, with the same performance patterns considered under three different game rules for winning or losing. These three rules created distinct underlying causal structures (additive, conjunctive, disjunctive) for how individual performance affected the overall team outcome. Even the youngest children differentiated between the game rules in their reward allocations. Rather than only rewarding individual performance, or whether the team won/lost, children were sensitive to the team structure and how players’ performance contributed to the win/loss under each of the three game rules. Not only do young children consider it fair to allocate resources based on merit, but they are also sensitive to the causal structure of the situation, which dictates how individual contributions combine to determine the team outcome.
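As an illustration of the three causal structures, the sketch below evaluates one hypothetical performance pattern under additive, conjunctive, and disjunctive rules; the threshold and scores are made up for illustration and are not the experimental stimuli.

```python
# Hypothetical illustration of three ways individual performance can combine
# into a team outcome (threshold and scores are made up).

THRESHOLD = 3  # points a player needs for their individual contribution to "count"

def additive(scores):     # team wins if the summed scores clear the combined bar
    return sum(scores) >= THRESHOLD * len(scores)

def conjunctive(scores):  # team wins only if every player clears the bar
    return all(s >= THRESHOLD for s in scores)

def disjunctive(scores):  # team wins if at least one player clears the bar
    return any(s >= THRESHOLD for s in scores)

scores = [5, 4, 0]  # the same performance pattern evaluated under each rule
for rule in (additive, conjunctive, disjunctive):
    print(f"{rule.__name__:12s} -> {'win' if rule(scores) else 'loss'}")
```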
Language models assign responsibility based on actual rather than counterfactual contributions
How do language models assign responsibility and reward, and is it similar to how humans do it? We instructed three state-of-the-art large language models to assign responsibility (Experiment 1) and reward (Experiment 2) to agents in a collaborative task. We then compared the language models' responses to seven existing cognitive models of responsibility and reward allocation. We found that language models mostly evaluated agents based on force (how much they actually did), in line with classical production-style accounts of causation. By contrast, humans valued actual and counterfactual effort (how much agents tried or could have tried). These results indicate a potential barrier to effective human-machine collaboration.
Systematizing Policy Learning: From Monolith to Dimensions
Notes: The authors wish to express their gratitude to participants at the Norwegian Political Science Association Annual Conference (University of Agder, Kristiansand, 6 January 2010), the ‘Establishing Causality in Policy Learning’ panel at the American Political Science Association (APSA) annual meeting (Washington DC, 2–5 September 2010), and workshop 2 of the European Consortium for Political Research (ECPR) Joint Sessions (St Gallen, 12–17 April 2011). Dunlop and Radaelli gratefully acknowledge the support of the European Research Council, grant on Analysis of Learning in Regulatory Governance, ALREG, http://centres.exeter.ac.uk/ceg/research/ALREG/index.php.

Publication status: Accepted. The definitive version is available at www.blackwell-synergy.com and from DOI: 10.1111/j.1467-9248.2012.00982.x.

The field of policy learning is characterised by concept stretching and a lack of systematic findings. To systematize them, we combine the classic Sartorian approach to classification with more recent insights on explanatory typologies. At the outset, we classify per genus et differentiam – distinguishing between the genus and the different species within it. By drawing on the technique of explanatory typologies to introduce a basic model of policy learning, we identify four major genera in the literature. We then generate variation within each cell by using rigorous concepts drawn from adult education research. Specifically, we conceptualize learning as control over the contents and goals of knowledge. By looking at learning through the lens of knowledge utilization, we show that the basic model can be expanded to reveal sixteen different species. These types are all conceptually possible, but they are not all empirically established in the literature. Up until now, the scope conditions and connections among types have not been clarified. Our reconstruction of the field sheds light on mechanisms and relations associated with alternative operationalizations of learning and the role of actors in the process of knowledge construction and utilization. By providing a comprehensive typology, we mitigate concept-stretching problems and aim to lay the foundations for the systematic comparison across and within cases of policy learning.

European Research Council, grant no. 230267 on Analysis of Learning in Regulatory Governance, ALREG.
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
When prompting a language model (LM), users often expect the model to adhere
to a set of behavioral principles across diverse tasks, such as producing
insightful content while avoiding harmful or biased language. Instilling such
principles (i.e., a constitution) into a model is resource-intensive,
technically challenging, and generally requires human preference labels or
examples. We introduce SAMI, an iterative algorithm that finetunes a pretrained
language model (without requiring preference labels or demonstrations) to
increase the conditional mutual information between constitutions and
self-generated responses given queries from a dataset. On single-turn dialogue
and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained
model, with win rates between 66% and 77%. Strikingly, it also surpasses an
instruction-finetuned baseline (mistral-7b-instruct) with win rates between 55%
and 57% on single-turn dialogue. SAMI requires a model that writes the
principles. To avoid dependence on strong models for writing principles, we
align a strong pretrained model (mixtral-8x7b) using constitutions written by a
weak instruction-finetuned model (mistral-7b-instruct), achieving a 65% win
rate on summarization. Finally, we investigate whether SAMI generalizes to
diverse summarization principles (e.g., "summaries should be scientific") and
scales to stronger models (llama3-70b), finding that it achieves win rates of
up to 68% for learned and 67% for held-out principles compared to the base
model. Our results show that a pretrained LM can learn to follow constitutions
without using preference labels, demonstrations, or human oversight.
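One standard way to increase a conditional mutual information of this kind is a contrastive, InfoNCE-style objective over matched versus mismatched constitution-response pairs. The sketch below illustrates that general objective under the assumption of a precomputed matrix of log-probabilities; it is an illustration of the idea, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_mi_loss(logp: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style lower bound on I(constitution; response | query).

    logp[i, j] = log p_model(response_j | constitution_i, query), where
    response_j was self-generated under constitution_j, so the diagonal
    holds matched pairs and the off-diagonal entries are mismatched.
    """
    targets = torch.arange(logp.size(0))
    # push each response to be most likely under the constitution that
    # produced it (rows), and each constitution to be best explained by
    # its own response (columns)
    row_loss = F.cross_entropy(logp, targets)
    col_loss = F.cross_entropy(logp.t(), targets)
    return 0.5 * (row_loss + col_loss)

# Toy example with 4 constitutions x 4 self-generated responses; in practice
# this matrix would come from scoring sampled responses with the model that
# is being finetuned.
logp = torch.randn(4, 4, requires_grad=True)
loss = contrastive_mi_loss(logp)
loss.backward()
print(float(loss))
```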
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
As AI systems like language models are increasingly integrated into
decision-making processes affecting people's lives, it's critical to ensure
that these systems have sound moral reasoning. To test whether they do, we need
to develop systematic evaluations. We provide a framework that uses a language
model to translate causal graphs that capture key aspects of moral dilemmas
into prompt templates. With this framework, we procedurally generated a large
and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of
50 scenarios and 400 unique test items. We collected moral permissibility and
intention judgments from human participants for a subset of our items and
compared these judgments to those from two language models (GPT-4 and Claude-2)
across eight conditions. We find that moral dilemmas in which the harm is a
necessary means (as compared to a side effect) result in lower permissibility
and higher intention ratings for both participants and language models. The
same pattern was observed for evitable versus inevitable harmful outcomes.
However, there was no clear effect of whether the harm resulted from an
agent's action or from an omission. We discuss limitations of our prompt
generation pipeline and opportunities for improving scenarios to increase the
strength of experimental effects.

Comment: CogSci 2024
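A schematic of how a small causal specification of a dilemma might be rendered into a test prompt. The scenario text, names, and wording below are hypothetical (the actual pipeline uses a language model to write the scenarios), but the sketch shows how the condition flags (means versus side effect, evitable versus inevitable, action versus omission) would enter a generated item.

```python
# Hypothetical sketch of rendering a causal specification of a dilemma into a
# test prompt; the real pipeline uses an LLM to write the scenario text.

def render_dilemma(agent, goal, harm, *, harm_is_means, evitable, is_action):
    mechanism = (f"{harm} is the very means by which {goal} is achieved"
                 if harm_is_means
                 else f"{harm} is a side effect of achieving {goal}")
    alternative = ("another way to achieve the goal exists"
                   if evitable
                   else "there is no other way to achieve the goal")
    deed = "acts" if is_action else "refrains from acting"
    return (f"{agent} can bring about {goal}, but {mechanism}; {alternative}. "
            f"{agent} {deed}. Was this morally permissible? "
            f"Did {agent} intend {harm}?")

print(render_dilemma(
    "A ship captain", "saving the crew", "flooding one compartment",
    harm_is_means=True, evitable=False, is_action=True,
))
```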
Social Contract AI: Aligning AI Assistants with Implicit Group Norms
We explore the idea of aligning an AI assistant by inverting a model of
users' (unknown) preferences from observed interactions. To validate our
proposal, we run proof-of-concept simulations in the economic ultimatum game,
formalizing user preferences as policies that guide the actions of simulated
players. We find that the AI assistant accurately aligns its behavior to match
standard policies from the economic literature (e.g., selfish, altruistic).
However, the assistant's learned policies lack robustness and exhibit limited
generalization in an out-of-distribution setting when confronted with a
currency (e.g., grams of medicine) that was not included in the assistant's
training distribution. Additionally, we find that when there is inconsistency
in the relationship between language use and an unknown policy (e.g., an
altruistic policy combined with rude language), the assistant's learning of the
policy is slowed. Overall, our preliminary results suggest that developing
simulation frameworks in which AI assistants need to infer preferences from
diverse users can provide a valuable approach for studying practical alignment
questions.

Comment: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/)
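A minimal sketch of the kind of preference inversion the abstract describes: a Bayesian update over a small set of candidate proposer policies given observed ultimatum-game offers. The policy definitions, noise model, and numbers are illustrative assumptions, not the paper's simulation setup.

```python
import math

# Minimal illustrative sketch (not the paper's simulation): infer which policy
# a simulated proposer is following from the shares they offer across repeated
# ultimatum games, via a Bayesian update over a small set of candidate policies.

POLICIES = {            # assumed typical share of the pot offered to the responder
    "selfish":    0.10,
    "fair":       0.50,
    "altruistic": 0.80,
}
NOISE_SD = 0.05         # assumed jitter around each policy's typical offer

def gaussian_pdf(x, mean, sd):
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def posterior(observed_offers):
    # start from a uniform prior and multiply in the likelihood of each offer
    scores = {name: 1.0 for name in POLICIES}
    for offer in observed_offers:
        for name, typical in POLICIES.items():
            scores[name] *= gaussian_pdf(offer, typical, NOISE_SD)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Offers clustered around an even split point to the "fair" policy.
print(posterior([0.45, 0.55, 0.50]))
```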
