Artificial Anxiety, Real Consequences
Teddy and Mike's Daily Paper, 12.09.25: Inducing State Anxiety in LLM Agents Reproduces Human-Like Biases in Consumer Decision-Making
Joint Work with Prof. Teddy Lazebnik
LLMs are undergoing a quiet revolution before our very eyes: no longer just engines that generate text, but autonomous agents capable of operating in dynamic environments, performing multi-step action sequences, and achieving defined outcomes. This includes LLM-based browsers, code agents that can also run things via MCP, and the beginnings of personal assistants that can operate apps on your phone. This transition opens up enormous potential, but it also raises new systemic risks. If in the past the concern was about phrasing errors or biases in language, today a new question arises: to what extent can we trust these agents when they act as proxies for humans in the digital world? In addition, decades of psychological research show that humans are heavily influenced by anxiety and stress. In such states, people tend to make decisions that favor immediate gratification (like high-calorie food) over long-term considerations (health, savings).
Based on these two points, the authors of the reviewed article asked a fairly simple question: could LLM-based agents exhibit similar patterns of vulnerability when exposed to emotionally traumatic contexts? In this study, they focused on the selection of food products in a simulated shopping environment, a domain where the impact of stress and anxiety on human decisions is well documented. More stress? More chocolate, beer, and chips!
As part of the experiment, three of the most advanced models available today, ChatGPT-5, Gemini 2.5, and Claude 3.5 Sonnet, were embedded in a shopping environment simulating an online store. Each model was given a shopping scenario with a budget constraint ($27, $54, or $108) and performed the task twice: once in a "neutral" state, and a second time after exposure to a traumatic story designed to induce anxiety. Five types of traumatic narratives were tested: a car accident, a military ambush, a natural disaster, an interpersonal assault, and a military battle. Each scenario was replicated 50 times for each model × budget × narrative combination, yielding a total of 2,250 experimental runs!
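To make the scale of the design concrete, here is a minimal sketch of the experimental grid in Python. The model names, budgets, and narratives come from the description above; everything else (the run records, the agent call they stand in for) is a hypothetical placeholder, not the paper's actual harness.

```python
# Minimal sketch of the experimental grid described in the review.
# Each "run" here would correspond to querying the agent twice
# (neutral vs. anxiety-primed) and scoring the resulting basket.
from itertools import product

models = ["ChatGPT-5", "Gemini 2.5", "Claude 3.5 Sonnet"]
budgets = [27, 54, 108]  # USD budget constraints
narratives = ["car accident", "military ambush", "natural disaster",
              "interpersonal assault", "military battle"]
replications = 50

runs = []
for model, budget, narrative in product(models, budgets, narratives):
    for rep in range(replications):
        runs.append({"model": model, "budget": budget,
                     "narrative": narrative, "rep": rep})

print(len(runs))  # 3 models x 3 budgets x 5 narratives x 50 reps = 2,250
```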
To assess the "healthiness" of the shopping cart, the researchers used an index called the Basket Health Score (BHS), which is based on the nutritional profile of the products (calories, sugar, fat, protein, sodium, alcohol, etc.) and is in line with indices accepted by European health organizations; a higher score indicates a healthier basket.
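The review does not spell out the exact BHS formula, so the following is only an illustrative sketch: it assumes a simple weighted nutrient-profile score (penalties for calories, sugar, fat, sodium, and alcohol, a bonus for protein) averaged over the basket. The weights, the item schema, and the resulting scale are assumptions and will not match the paper's actual index.

```python
# Illustrative only: a toy basket-health score in the spirit of the BHS
# description (nutrient penalties/bonuses averaged over the basket).
def basket_health_score(basket):
    """basket: list of dicts with per-100g nutrient values (hypothetical schema)."""
    weights = {"calories": -0.002, "sugar": -0.01, "fat": -0.01,
               "sodium": -0.0005, "alcohol": -0.02, "protein": +0.01}
    item_scores = [sum(weights[k] * item.get(k, 0.0) for k in weights)
                   for item in basket]
    return sum(item_scores) / len(item_scores) if item_scores else 0.0

# Example: a bag of chips vs. a protein-rich item
basket = [
    {"calories": 520, "sugar": 2, "fat": 33, "sodium": 500, "protein": 6},
    {"calories": 120, "sugar": 1, "fat": 3, "sodium": 60, "protein": 22},
]
print(round(basket_health_score(basket), 3))  # higher = healthier basket
```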
Main Findings:
Effect of anxiety on the baskets: After exposure to traumatic narratives, the models consistently assembled less healthy baskets. The average BHS decreased by about 0.08 to 0.12 points, with particularly large effect sizes (Cohen’s d between –1.07 and –2.05; see the effect-size sketch after this list).
Consistency across models and budgets: The phenomenon appeared in all 3 models and across all 3 budgets, indicating a systemic vulnerability rather than a characteristic of a specific model.
Neutral control: In a control condition with a neutral narrative (e.g., a description of a dry political procedure), no significant change in results was found. This strengthens the conclusion that the anxiety induction, and not mere repetition of the task, drives the change.
Comparison between narrative types: All five narrative types produced a negative effect, but its intensity varied: the military ambush and the car accident caused the largest decreases in the health score.
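For readers less familiar with effect sizes, here is how a Cohen's d like those reported could be computed from neutral versus anxiety-primed BHS samples. The numbers below are made-up placeholders chosen to fall in the reported range; they are not the paper's data.

```python
# Cohen's d: difference in means divided by the pooled standard deviation.
import statistics

def cohens_d(sample_a, sample_b):
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(sample_b) - statistics.mean(sample_a)) / pooled_sd

neutral_bhs = [0.65, 0.58, 0.70, 0.55, 0.62]  # hypothetical neutral-condition scores
anxious_bhs = [0.55, 0.50, 0.60, 0.48, 0.52]  # hypothetical post-narrative scores
print(round(cohens_d(neutral_bhs, anxious_bhs), 2))  # negative d = less healthy baskets
```

By convention, |d| above 0.8 already counts as a large effect, so values between 1 and 2 indicate a very pronounced shift.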
These results indicate that even LLM-based agents, which do not experience emotions in the human sense (for now?), show sensitivity to emotional narratives as if they were subject to psychological influences. That is, mere exposure to a traumatic text changed the way the model planned and purchased products, a result reminiscent of well-known processes in humans under stress and anxiety. This suggests a new type of vulnerability: not just static biases (such as gender or race) stemming from the data the models were trained on, but also dynamic, state-like biases that are generated in real time according to the user's emotional context. This kind is more severe than the static one because it is even harder to detect and fix; it's like shooting at a moving target, while you are moving in a perpendicular direction and everything around you is on fire.
Implications
Digital Health: Autonomous agents may, in the future, assist with nutrition and health management. However, if they are sensitive to emotional narratives, unhealthy decision-making could emerge precisely in the systems designed to promote health.
Consumer Protection: In a world where autonomous shopping agents make purchases on behalf of the user, intentional exposure to emotional narratives could become a manipulative tool. Competitors or advertisers might "inject" emotional content to influence purchasing decisions.
AI Systems Safety: If such agents operate in more critical domains (such as finance or medicine), this sensitivity to emotional context could lead to severe damage; hence the importance of developing resilience mechanisms that prevent unwanted emotional influence on their functioning.
A Model for Human-Machine Relations: The findings support an approach where models not only mimic human language but also tend to replicate patterns of cognitive and emotional vulnerabilities. This could be an advantage in therapeutic contexts (empathy, mimicking emotional responses), but it could become a risk when the model acts as an autonomous agent with practical influence.
It is fair to note that this is a very rigorous test compared to similar studies, with a large number of runs (2,250), 3 different models, 3 budgets, and 5 different narratives, in addition to a neutral control and a relatively quantitative, objective health index; it is clear that an effort was made here. However, the environment is simulated and limited (only 50 products, a virtual store). In reality, the food market is much more complex, with thousands of products, changing promotions, and cultural contexts, so the effects might be weaker or different in production.
This study provides initial and quite convincing evidence that LLM-based agents may be vulnerable to emotional contexts, much like humans. Exposure to traumatic narratives led the agents to make less healthy consumer decisions, a phenomenon that was both consistent and large in magnitude. These results raise important questions about the reliability of autonomous agents and the need to implement control and protection mechanisms before they are widely used in daily life. Am I hearing an opportunity for a unicorn waiting to happen? Maybe…