It’s Not Just What You Prompt, It’s Where
Mike’s Daily Paper: 06.08.25 - Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
The paper we review today shows that simply moving your examples from the top to the bottom of a prompt can dramatically change a model's accuracy. Prompt engineers obsess over the content of their prompts, but "Where to show Demos in Your Prompt" by Kwesi Cobbina and Tianyi Zhou reveals we've been missing something just as critical: the position of those examples. The work moves beyond prompt tinkering into rigorous science, and its novelty lies in its precision and systematic approach.
Beyond 'Order': A Scientific Look at Position
While we already know the internal order of examples matters, this paper makes a crucial distinction: it's not about shuffling examples, but about moving the entire, unchanged block of examples to different locations within the prompt. The authors name this specific phenomenon "DPP (demos position in prompt) bias". To study it, they created a systematic framework testing four canonical positions: the start or end of the system prompt (ssp, esp) and the start or end of the user message (sum, eum). This transforms a fuzzy observation into a testable science.
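To make the four placements concrete, here is a minimal sketch (my own illustration, not the authors' code) that assembles an OpenAI-style chat prompt with the same unchanged demo block dropped into each position; the constants and function names are hypothetical:

```python
# Illustrative only: assemble a chat-style prompt with the (unchanged) demo
# block at one of the four canonical positions. All names are hypothetical.
DEMOS = (
    "Q: What is 2 + 2?\nA: 4\n\n"
    "Q: What is the capital of France?\nA: Paris\n"
)
SYSTEM = "You are a helpful assistant. Answer concisely."

def build_messages(question: str, position: str) -> list[dict]:
    """Place the demo block at ssp, esp, sum, or eum."""
    if position == "ssp":      # start of system prompt
        system, user = DEMOS + "\n" + SYSTEM, question
    elif position == "esp":    # end of system prompt
        system, user = SYSTEM + "\n" + DEMOS, question
    elif position == "sum":    # start of user message
        system, user = SYSTEM, DEMOS + "\n" + question
    elif position == "eum":    # end of user message
        system, user = SYSTEM, question + "\n" + DEMOS
    else:
        raise ValueError(f"unknown position: {position}")
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

# Same demos, same question, four different prompt structures.
for pos in ("ssp", "esp", "sum", "eum"):
    print(pos, build_messages("Q: What is 3 + 5?\nA:", pos))
```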
Crucially, they look beyond simple accuracy by measuring prediction change: how many answers flip when the prompt structure changes. This is a vital contribution, because it reveals hidden instability. A model might seem just as accurate with two different prompts, but one could be causing far more erratic behavior.
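A rough sketch of how such a flip-rate metric can be computed (my own simplification, not necessarily the paper's exact formulation):

```python
# Sketch of a prediction-change (flip-rate) computation between two prompt
# layouts. Assumes you already collected per-item predictions for each layout.
def prediction_change(preds_a: list[str], preds_b: list[str]) -> float:
    """Fraction of items whose predicted answer flips between two layouts."""
    assert len(preds_a) == len(preds_b)
    flips = sum(a != b for a, b in zip(preds_a, preds_b))
    return flips / len(preds_a)

# Identical accuracy can hide very different stability: here 2 of 3 answers flip.
print(prediction_change(["4", "Paris", "blue"], ["4", "Rome", "red"]))  # ~0.67
```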
Key Findings
The large-scale study, covering ten models and eight tasks, produced clear and actionable results.
Primacy is Real: Placing examples early in the prompt (ssp, esp) consistently yields higher accuracy and greater stability. These positions can boost accuracy by up to 6 points compared to other placements.
The Danger Zone: Putting examples at the very end (eum) is often disastrous. It causes significant performance drops and volatility, flipping over 30% of a model's predictions in some question-answering tasks without improving correctness.
No Silver Bullet: The optimal position isn't universal; it depends on model scale and the specific task. For example, while smaller models strongly prefer demos at the start, a large model like Llama-3-70B often prefers them closer to the query (sum).
What This Means for You
This research makes it clear: the placement of your examples is not a stylistic choice. It's a critical parameter that must be tested and tuned. Simply relying on a default format could be leaving significant performance and stability on the table. For the first time, there's a clear roadmap for understanding and optimizing this crucial dimension of prompt design.
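If you want to test this on your own workload, a minimal sweep over the four positions could look like the sketch below, building on the hypothetical build_messages helper above; query_model is a stub you would replace with your actual API or local-model call:

```python
# Hypothetical position sweep: score each demo placement on a small eval set
# before committing to a prompt layout. Reuses build_messages from the earlier
# sketch; query_model is a stub standing in for a real model call.
def query_model(messages: list[dict]) -> str:
    raise NotImplementedError("replace with your model call")

def sweep_positions(eval_set: list[tuple[str, str]]) -> dict[str, float]:
    """Return accuracy per demo position over (question, gold_answer) pairs."""
    scores = {}
    for pos in ("ssp", "esp", "sum", "eum"):
        correct = sum(
            query_model(build_messages(q, pos)).strip() == gold
            for q, gold in eval_set
        )
        scores[pos] = correct / len(eval_set)
    return scores
```

Pairing this with the prediction_change sketch above lets you pick a placement that is not only accurate but also stable across prompt variants.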
https://arxiv.org/abs/2507.22887