I Ran 80,433 Trials to Measure LLM Sycophancy. Here's What Actually Drives It.
Does filling up a model's context window make it more likely to agree with you when you're wrong? I ran 80,433 trials across six models to find out. Context length matters less than you'd think; the conversational pattern is what really drives sycophancy.