Your AI Co-Pilot Isn't Disagreeing With You. That's By Design.

Here’s something I want you to try. Open your favourite AI tool: Copilot, ChatGPT, Claude, whichever one you reach for when you need a second opinion on a data model or an architecture decision. Describe the approach you’ve already chosen. Then ask it what it thinks.

Go ahead. I’ll wait.

Odds are, it told you that your approach has some real strengths. Maybe it offered a few minor caveats, then circled back to affirm the direction. Maybe it said something like “this is a solid foundation.”

That wasn’t honest feedback. That was a system doing exactly what it was trained to do.

The Experiment That Should Make Every Architect Uncomfortable

Researcher Myra Cheng and her colleagues at Stanford ran a pair of pre-registered experiments in 2025 that I haven’t been able to stop thinking about [1]. They tested eleven state-of-the-art AI models across three datasets: general personal advice questions, interpersonal dilemma posts from Reddit’s r/AmITheAsshole where human consensus had already judged the poster to be in the wrong, and a curated set of statements describing potentially harmful actions.

The finding on the AITA dataset alone is striking. On posts where the community had reached a clear verdict that the user was at fault, the AI models affirmed the user’s position, telling them they weren’t the asshole, in 51% of cases. Against explicit human consensus. On questions with clear moral stakes.

On general advice queries, the models affirmed users’ actions 47% more than humans do. Read that again: not 47% of the time, but 47% more often than humans do.
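If the difference between "47% of the time" and "47% more often" feels slippery, a few lines of arithmetic help. This is only a sketch: the human baseline rate below is a number I made up for illustration, not a figure from the study, and `relative_increase` is a hypothetical helper.

```python
def relative_increase(baseline_rate: float, pct_more: float) -> float:
    """Return the rate that is `pct_more` percent higher than `baseline_rate`."""
    return baseline_rate * (1 + pct_more / 100)


# ASSUMPTION: a made-up human baseline, chosen only to show the arithmetic.
human_rate = 0.40

# "47% more often than humans" is relative to that baseline.
model_rate = relative_increase(human_rate, 47)

print(f"humans affirm: {human_rate:.0%}")   # humans affirm: 40%
print(f"models affirm: {model_rate:.1%}")   # models affirm: 58.8%
```

Whatever the real baseline is, the relative figure scales with it: the more often humans already affirm, the larger the absolute gap becomes.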

That’s not a rounding error. That’s a systematic bias baked into how these systems are trained.

But here’s where it gets worse. The researchers didn’t just measure model behavior. They measured what it does to people. In a live-interaction study with 800 participants, those who discussed a real interpersonal conflict with a sycophantic AI model walked away with a 25% higher conviction that they had been in the right, and a 10% lower willingness to take actions to repair the relationship.

A single conversation. A measurable shift in judgment.

And those same participants rated the sycophantic responses as higher quality, trusted the model more, and were more likely to want to use it again.

Think about that. The version that did the most damage to their reasoning was also the version they liked best.

This Is Goodhart’s Law, Running Hot

I’ve written before about Goodhart’s Law in the context of data and AI. When a measure becomes a target, it ceases to be a good measure.

Here it is running hot in real time.

The proxy metric these models are optimized on is human approval (thumbs up or down, preference rankings between model outputs), and it is supposed to approximate helpfulness. But helpfulness and approval are not the same thing. They diverge most dangerously in exactly the situations where you most need honest feedback: when you’ve already committed to a direction, when you’re emotionally invested in being right, when the stakes are high enough that you’re looking for confirmation rather than critique.

RLHF (Reinforcement Learning from Human Feedback, the mechanism that turns a raw language model into a chatbot) doesn’t let developers correct specific outputs. It shapes the entire distribution of model behavior toward whatever gets positive signals. And as one AI industry insider disclosed, when users saw accurate-but-unflattering descriptions of themselves in an early memory feature, they reacted badly enough that developers had to hide it [2]. So the training signal was: be nicer. Be more affirming. Round the rough edges.

The result is a system that has learned, at a deep distributional level, that agreement is rewarded and friction is penalized.
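Here is a deliberately tiny sketch of that divergence. It is not RLHF: it simply picks the highest-scoring reply, as a stand-in for optimization pressure. Every number, the `Candidate` type, and the `approval` function (including its 0.7 weighting) are assumptions I invented to illustrate the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    helpfulness: float    # how much the reply would actually help (0..1), made up
    agreeableness: float  # how pleasant the reply is to receive (0..1), made up


def approval(c: Candidate, w_agree: float = 0.7) -> float:
    """Hypothetical rater: approval mixes real help with how good the reply feels."""
    return (1 - w_agree) * c.helpfulness + w_agree * c.agreeableness


candidates = [
    Candidate("Your design is a solid foundation.", 0.2, 0.9),
    Candidate("This works, but watch the partitioning.", 0.6, 0.6),
    Candidate("This will not scale, and here is why.", 0.9, 0.2),
]

# Optimizing the proxy (approval) versus the thing we actually wanted (help).
best_by_approval = max(candidates, key=approval)
best_by_help = max(candidates, key=lambda c: c.helpfulness)

print("proxy picks: ", best_by_approval.text)
print("target picks:", best_by_help.text)
```

With these invented weights the proxy selects the most flattering reply and the target selects the most critical one. Push `w_agree` toward zero and the two choices converge, which is the whole point: the gap comes from what the rater rewards, not from the candidates.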

The Psychic in the Machine

In 2023, writer Baldur Bjarnason described what he called the LLMentalist Effect, the way chat-based AI replicates the mechanism of a cold-reading psychic [3].

I want to be precise about what this argument does and doesn’t claim, because it’s easy to misread. Bjarnason isn’t arguing that LLMs are simple or stupid. He’s arguing that the intelligence illusion, the sense that the model is genuinely engaging with your specific situation, is substantially produced by a cognitive bias called subjective validation: the tendency to interpret generic statements as being specifically, uncannily relevant to us.

The psychic makes a demographically plausible statement. The mark finds a way to apply it to their particular situation. The mark then remembers the reading as eerily accurate.

The chatbot generates a statistically plausible response to your input. You read your context into it. You walk away feeling understood.

The mechanism behind the illusion is different in the two cases. One is social manipulation, the other is a mathematical model of language tokens. But the psychological experience for the person on the receiving end can be remarkably similar.

And here’s the piece that connects directly back to the Stanford research. Bjarnason notes that the effect grows stronger the more cooperative the mark becomes. The better you get at working with these tools, the better you get at generating responses you’ll find compelling. The more invested you are in the answer, the more convincingly you’ll read your intent into what the model produces.

Smart, experienced professionals aren’t protected from this. They may be more vulnerable to it.

We see what we want to see.

What This Actually Costs in Professional Contexts

The Stanford study focused on interpersonal conflict, a deliberate choice, because it’s a domain with clear behavioral stakes. But translate it to the professional decisions you make with AI assistance, and the stakes compound.

Let’s say you’re designing a Fabric medallion architecture for a client. You’ve landed on a pattern you like. You describe it to your AI assistant and ask for feedback. It tells you the approach is sound.

That’s not a second opinion. That’s an echo.

The model has no skin in the game. It doesn’t know your client, your client’s NIS2 obligations, your client’s actual data volumes, or the colleague who’s going to inherit this in two years. It has a statistical sense of what well-received architecture descriptions look like, and it’s producing one.

The danger isn’t that the model gives you wrong information. It’s that it gives you warm information: technically plausible, gently affirmative, carefully shaped to avoid friction, when what you needed was someone to push back.

I’ve been doing this long enough to know that the most valuable feedback I’ve ever received came from people who were willing to tell me something was wrong. Not to be difficult. Because they cared about getting it right.

Your AI assistant is not that person. It can’t be. It’s trained against being that person.

We see what we want to see.

The Perverse Incentive Structure

Here’s the part that should concern you beyond any individual interaction.

The Stanford researchers describe three compounding dynamics. First: users who interact with sycophantic models trust them more and are more likely to return. Second: developers face limited incentives to reduce sycophancy, because it drives engagement. Third: the positive feedback users give to affirming responses directly amplifies sycophancy in the next training cycle.

This is a closed loop. The system that validates you gets used more. Gets rated higher. Gets trained to validate you more.
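A toy simulation makes the shape of that loop visible. To be clear, this is my own sketch and not a model from the paper: `p_affirm` is the chance the model affirms the user, the two approval values are invented, and the update rule is a simple assumption in which behavior that is rated above the alternative becomes more likely in the next cycle.

```python
def simulate_loop(
    p_affirm: float,
    cycles: int,
    approval_affirm: float = 0.8,    # ASSUMPTION: average rating for affirming replies
    approval_critical: float = 0.4,  # ASSUMPTION: average rating for critical replies
    learning_rate: float = 1.0,
) -> list[float]:
    """Return the affirmation probability after each hypothetical training cycle."""
    history = [p_affirm]
    for _ in range(cycles):
        advantage = approval_affirm - approval_critical
        # The step is largest when both behaviors are still being tried,
        # and it only changes direction if critical replies are rated higher.
        p_affirm = p_affirm + learning_rate * p_affirm * (1 - p_affirm) * advantage
        history.append(p_affirm)
    return history


history = simulate_loop(p_affirm=0.5, cycles=10)
for cycle, p in enumerate(history):
    print(f"cycle {cycle:2d}: affirms {p:.1%} of the time")
```

As long as affirming replies are rated higher than critical ones, nothing in the update pulls the probability back down. In this sketch, the only way to reverse the drift is to change what gets rated highly.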

There is no natural corrective mechanism here. Users don’t experience the harm directly. They experience it as a vague drift in their own confidence and judgment, the kind of thing you don’t notice until you’re a long way downstream from the conversation that started it.

We are, in aggregate, building AI systems that are systematically optimized to tell us we’re right.

We see what we want to see.

This Doesn’t Mean Stop Using the Tools

I want to be direct about something, because I’ve seen this kind of argument get flattened into a simple “AI bad, don’t use it.”

That’s not the point.

The tools are useful. In many contexts, summarising literature, generating starting-point code, explaining an unfamiliar domain, the sycophancy problem is low-stakes or irrelevant. The model affirming that your SQL query syntax looks right is fine. The model affirming that the business logic your query encodes is correct is a different matter entirely.
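To illustrate the gap between syntax and business logic, here is a small self-contained sketch using Python's built-in `sqlite3` module. The `orders` table, its rows, and the rule that only paid orders count as revenue are all assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (amount, status) VALUES (?, ?)",
    [(100.0, "paid"), (50.0, "paid"), (80.0, "refunded")],
)

# Syntactically valid and runs without error, but counts refunds as revenue.
naive_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Encodes the (assumed) business rule: only paid orders count as revenue.
paid_revenue = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'paid'"
).fetchone()[0]

print(naive_revenue, paid_revenue)  # 230.0 150.0
conn.close()
```

Both queries would pass a syntax review. Only someone who knows what "revenue" means for this business can tell you which one is right.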

The problem is specifically in the places where you’re seeking evaluative judgment (asking the model to assess an approach, review a decision, validate an architecture) and trusting the positive response as if it came from an honest critic.

What actually helps:

Ask the model to argue the other side. “What are the strongest arguments against this approach?” is a different prompt than “What do you think of this approach?” and it gets you genuinely different output.

Ask explicitly for failure modes. “Under what conditions would this architecture break?” surfaces the kinds of concerns that don’t emerge when you ask for general feedback.

Use the model for exploration, not validation. It’s genuinely useful for generating options, stress-testing your own thinking by pushing ideas to their logical conclusion, and surfacing considerations you hadn’t thought of. It’s a poor substitute for a senior colleague who will tell you to your face that you’ve made a mistake.

And maintain the habits that make disagreement possible in the first place. If you stop working with people who challenge you, because the AI is always available and never argues, you lose something that can’t be recovered from a prompt.
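If you want to make the first two habits harder to skip, you can wrap the prompts quoted above in small helpers so the critical framing is the default. This is a minimal sketch: the function names and the surrounding wording are my own invention, it only builds strings (no AI API is called), and it offers no guarantee that the answer will be less flattering.

```python
def counter_argument_prompt(proposal: str) -> str:
    """Build a prompt that asks for the case against a proposal."""
    return (
        "I am considering the following approach:\n\n"
        f"{proposal.strip()}\n\n"
        "What are the strongest arguments against this approach? "
        "Do not summarize its strengths."
    )


def failure_mode_prompt(proposal: str) -> str:
    """Build a prompt that asks when a proposed architecture would break."""
    return (
        "Here is an architecture I am evaluating:\n\n"
        f"{proposal.strip()}\n\n"
        "Under what conditions would this architecture break? "
        "List concrete failure modes, most likely first."
    )


# Hypothetical proposal text, used only to show the output shape.
proposal = "A three-layer medallion architecture with nightly batch loads."
print(counter_argument_prompt(proposal))
print()
print(failure_mode_prompt(proposal))
```

Note what the helpers leave out: neither prompt says the approach is already chosen, and neither asks what the model "thinks" of it.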

The Pattern We Keep Seeing

In previous posts in this series, I’ve written about how LLMs are pattern-matching engines trained on human language, not reasoning systems. I’ve written about the cognitive costs of outsourcing thinking, the measurable atrophy that happens when we stop exercising judgment. I’ve written about how Goodhart’s Law shows up wherever metrics replace meaning.

This is where those threads converge.

The sycophancy problem is the Goodhart’s Law problem as it manifests in human cognition. The measure (approval) became the target and displaced the goal it was meant to track (helpfulness). The cognitive cost is that our judgment, specifically our willingness to question our own positions, is being gently, repeatedly, invisibly eroded by systems that have learned that agreeing with us is the path of least resistance.

The psychic tells you what you want to hear. You leave feeling validated and return for another reading.

The architecture looks solid. The model said so.

We. See. What. We. Want. To. See.

Join the Conversation

What’s your experience with this kind of effect? I’d genuinely like to hear how you use AI and what techniques you prefer for getting the most out of it. Reach out on LinkedIn or Bluesky.

References

[1] Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. arXiv:2510.01395. https://arxiv.org/abs/2510.01395

[2] Goedecke, S. (2025). Sycophancy is the first LLM “dark pattern”. seangoedecke.com. https://www.seangoedecke.com/ai-sycophancy/

[3] Bjarnason, B. (2023). The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con. softwarecrisis.dev. https://softwarecrisis.dev/letters/llmentalist/

Photo by Tara Winstead: https://www.pexels.com/photo/person-reaching-out-to-a-robot-8386434/
