Sarah Choudhary is CEO of ICE Innovations and an executive advisor with expertise in quantum AI, ethical technology and sustainable innovation.

I build AI systems for a living, which means I spend a lot of time watching other people fall in love with them.
The demo is always the same. An agent takes a messy request, thinks out loud, clicks through three tools and hands back something that looks like magic. Somebody starts clapping, not necessarily loudly. Then a voice says the word that has launched a thousand bad decisions: "agentic." Within a quarter, there is a press release, a reorg and a roadmap built on a thing that worked once, on a stage, under perfect conditions.
Buying The Demo Instead Of The System
We keep arguing about whether AI agents are ready, but that framing misses the story. The story is that a specific kind of leader, the one who needs to look like the most forward company in the room, has learned to buy the demo instead of the system.
The habit is common enough now that Gartner gave it a name. They call it agent-washing: rebranding chatbots, assistants and old automation as autonomous agents without the capability underneath. Gartner estimates that of the thousands of vendors selling agentic AI, only about 130 are the real thing. The firm also predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, not because the technology failed, but because of escalating costs, unclear value and missing controls.
The incentive was never really about the technology. It is about optics. An agentic announcement signals that you are ahead. Trimming headcount around it signals that you are disciplined. Both land well on an earnings call, but neither tells you whether the system survives contact with a real workflow.
Waiting For The Bill
Reliability does not add up; it multiplies. An agent that is 90% reliable on a single step sounds excellent. String ten of those steps into a real workflow and, if each step can fail independently, your odds of a clean end-to-end run fall to roughly 35% (0.9 multiplied by itself ten times). Drop each step to a still-respectable 80% and ten steps leave you at about 11%.
Real work is not one step. It is dozens of them, chained together, each one inheriting the mistakes of the one before it. The demo shows you the single step. Production hands you the chain. Nobody applauds the chain.
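The arithmetic behind that chain is easy to check. What follows is a minimal sketch, not a model of any real agent: the function name `end_to_end_success` is made up for illustration, and it assumes every step has the same reliability and fails independently of the others, which real workflows rarely guarantee.

```python
def end_to_end_success(per_step_reliability: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumption (for illustration only): each step succeeds with the same
    probability and independently of the others, so the probabilities multiply.
    """
    if not 0.0 <= per_step_reliability <= 1.0:
        raise ValueError("per_step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_reliability ** steps


if __name__ == "__main__":
    # The two cases from the text: 90% and 80% per step, ten steps each.
    for reliability in (0.9, 0.8):
        chance = end_to_end_success(reliability, 10)
        print(f"{reliability:.0%} per step, 10 steps: {chance:.1%} clean runs")
```

Under those assumptions the ten-step chain comes out just under 35% at 90% per step and just under 11% at 80% per step. If errors compound, with one bad step making the next more likely to fail, the real figure would be lower still.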
The independent numbers back this up. Stanford's 2026 AI Index found that the best agents now complete about 66% of real computer tasks, up from just 12% a year earlier. That leap is genuinely impressive, and it deserves to be celebrated. It also means today's strongest agents still fail roughly one in three times at the kind of multi-step work companies are eagerly handing them.
One in three. You would not staff a team you expected to be wrong on every third task, and then remove the experienced people who used to catch the errors.
Then the bill arrives, and it is larger than a wasted project budget. When an autonomous system makes a decision that harms someone, whether by denying a loan, misrouting a medical request or leaking data the company was trusted to protect, the obvious question is who is responsible. "The agent did it" is not an answer a regulator accepts. Under the EU AI Act, fines for the most serious violations reach 35 million euros or 7% of global annual turnover, whichever is higher, and the rules reach any company whose AI touches a person in Europe, no matter where that company is based.
The deeper problem is that you often cannot see inside the thing you deployed. The same Stanford report notes that the most resource-intensive AI systems now disclose less about how they were built than they used to, not more. So, when an agent acts in a way nobody intended, the team that rolled it out frequently cannot reconstruct why. You cannot defend a decision you cannot explain, and you cannot explain a decision made inside a box nobody can open.
Are You Buying An Agent Or A Demo?
None of this is an argument against agents. I am building them, and I believe in where they are going. It is an argument against theater. Before the press release, ask three unglamorous questions.
• What is the full-workflow success rate, not the flattering per-task one?
• Who owns the decision, by name, when the agent gets it wrong?
• Can we reconstruct why it acted, clearly enough to explain it to a regulator or an angry customer?
If a vendor cannot answer those, you are not buying an agent. You are buying a demo with a maintenance contract.
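The first question can be made concrete with very little tooling. The sketch below assumes you already log, for every run, whether each step succeeded. The `Run` record and both function names are hypothetical, and the sample data is invented purely to show how far the two numbers can drift apart.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One logged end-to-end run: True or False for each step, in order.

    Hypothetical record shape, for illustration only.
    """
    step_results: list[bool]


def per_step_rate(runs: list[Run]) -> float:
    """Share of individual steps that succeeded (the flattering number)."""
    total = sum(len(run.step_results) for run in runs)
    passed = sum(sum(run.step_results) for run in runs)
    return passed / total if total else 0.0


def full_workflow_rate(runs: list[Run]) -> float:
    """Share of runs in which every step succeeded (the number that matters)."""
    if not runs:
        return 0.0
    # A run with no recorded steps is counted as a failure, not a success.
    clean = sum(1 for run in runs if run.step_results and all(run.step_results))
    return clean / len(runs)


if __name__ == "__main__":
    # Invented sample: four runs of four steps each.
    sample = [
        Run([True, True, True, True]),
        Run([True, True, False, True]),
        Run([True, True, True, False]),
        Run([True, False, True, True]),
    ]
    print(f"per-step success:      {per_step_rate(sample):.0%}")
    print(f"full-workflow success: {full_workflow_rate(sample):.0%}")
```

In this made-up sample, 13 of 16 steps pass, yet only one run in four finishes clean. A vendor quoting the first figure and a buyer living with the second are both telling the truth.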
Final Thoughts
I tell the people who build with me that the applause is the easy part. The cost shows up later: in the failed runs, in the decision nobody can explain and in the regulator who does not care how good the demo looked.
So, clap if you want. Just be the person who can still explain what your agent did once the room has emptied and the bill has arrived. Almost nobody in this market can. That is the whole game.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
