AI agents in production: what the data shows beyond the hype

Gartner predicts that 40% of enterprise applications will include AI agents by the end of 2026, up from less than 5% in 2025. The agent software market is projected to grow from US$ 86 billion in 2025 to US$ 206 billion in 2026. The numbers are impressive. Gartner also adds, with less prominence, that more than 40% of agentic AI projects will be canceled by the end of 2027 due to excessive cost, unclear business value, and insufficient risk controls. The two data points coexist because we are at the peak of inflated expectations.

What an AI agent is in 2026

The technical definition has consolidated: an AI agent is a system that perceives the environment, decides actions based on objectives and memory, executes tools (APIs, browser, terminal, database), and iterates until completing a multi-step task — without human intervention at each step.

In practice, a typical agentic workflow in 2026 makes 10 to 20 LLM calls per user task. Each call may invoke tools that generate more context for the next call. An agent that resolves a code bug may read the repository, run the tests, identify the error, propose a fix, apply it, re-run the tests, and check for regressions, all autonomously.
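The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any framework's API: `scripted_decide` is a hypothetical, hard-coded stand-in for the LLM call, and the tool names `run_tests` and `apply_fix` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentState:
    goal: str
    # Memory: one (tool name, result) pair per executed step.
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_agent(
    state: AgentState,
    decide: Callable[[AgentState], Optional[str]],
    tools: Dict[str, Callable[[AgentState], str]],
    max_steps: int = 20,
) -> AgentState:
    """Repeat decide -> execute tool -> record until decide returns None."""
    for _ in range(max_steps):
        action = decide(state)          # stand-in for an LLM call
        if action is None:              # the policy considers the task done
            return state
        result = tools[action](state)   # execute the chosen tool
        state.history.append((action, result))
    raise RuntimeError("step budget exhausted before the task completed")


# Hypothetical tools: the tests fail until a fix has been applied.
TOOLS = {
    "run_tests": lambda s: "pass" if any(t == "apply_fix" for t, _ in s.history) else "fail",
    "apply_fix": lambda s: "patched",
}


def scripted_decide(state: AgentState) -> Optional[str]:
    """Hard-coded policy that mimics the bug-fix sequence in the text."""
    if not state.history:
        return "run_tests"
    last_tool, last_result = state.history[-1]
    if last_tool == "run_tests" and last_result == "fail":
        return "apply_fix"
    if last_tool == "apply_fix":
        return "run_tests"
    return None  # tests passed: stop


final = run_agent(AgentState(goal="fix failing test"), scripted_decide, TOOLS)
print([tool for tool, _ in final.history])  # ['run_tests', 'apply_fix', 'run_tests']
```

The `max_steps` budget is a design choice worth keeping in any real loop: without it, a policy that never returns `None` keeps calling tools indefinitely.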

Capabilities confirmed in production

Three classes of tasks showed reproducible results in real production:

Clinical documentation automation: Microsoft DAX Copilot (Nuance) is the most documented case. It captures conversations in the consulting room and generates drafts of clinical notes. The reported results are an average reduction of 7 minutes per consultation and 50% less time spent on documentation, with more than 10 million clinical encounters captured. It works because the task is well defined, the output is auditable by the physician, and an error is not critical (the physician reviews the note before signing).

Autonomous bug resolution: SWE-bench Verified measures exactly this: resolution of real issues from GitHub repositories, evaluated by whether the tests pass after the modification. GPT-5.3 Codex scored 83%, and Claude Opus 4.5 reached 80.9%. Code agents are in production at companies such as GitHub (Copilot Workspace) and Cursor.

Structured data analysis: This covers agents that query a database via SQL, read spreadsheets via API, and generate reports with interpretation. Operating cost is low when the context is well delimited and the tools have deterministic outputs.

Where agents fail

Gartner's report on cancellations points to three main causes:

Error propagation: In multi-step workflows, an error in step 3 corrupts steps 4 to 10. Unlike human-written code, which has explicit verification layers, LLM agents do not detect when an earlier premise was wrong; they fail silently. The result is extensive work that confidently arrives at an incorrect conclusion.
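One way to limit this kind of propagation is to validate each step's output before the next step consumes it. The following is a minimal sketch, assuming each step can be paired with a cheap check; the steps and checks shown are hypothetical examples, not taken from any specific system.

```python
from typing import Any, Callable, List, Tuple

# Each step is (name, run, check): run transforms the value, check validates it.
Step = Tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


class StepCheckFailed(Exception):
    """Raised when a step's output fails its validation check."""


def run_pipeline(steps: List[Step], value: Any) -> Any:
    """Run steps in order and stop at the first output that fails its check."""
    for name, run, check in steps:
        value = run(value)
        if not check(value):
            # Halt here instead of handing a bad premise to later steps.
            raise StepCheckFailed(f"step '{name}' produced an invalid result: {value!r}")
    return value


# Hypothetical two-step workflow: parse an amount, then convert it to cents.
steps: List[Step] = [
    ("parse_amount", lambda s: float(s), lambda x: x >= 0),
    ("to_cents", lambda x: round(x * 100), lambda c: isinstance(c, int)),
]

print(run_pipeline(steps, "12.50"))  # 1250

try:
    run_pipeline(steps, "-3")
except StepCheckFailed as exc:
    print(exc)  # the failure is reported at 'parse_amount', before 'to_cents' runs
```

The checks here are deterministic predicates. When the only available check is another model call, the same structure applies, but the check itself can be wrong.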

Long-context cost: Each LLM call in an agentic workflow carries the history of what happened before. If the first call carried about 1K tokens of context, an agent that has made 15 calls and accumulated 300K tokens pays roughly 300 times more for its latest call than for its first. The total cost of a complex task can be 50 to 100 times the cost expected by whoever planned the system.
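The arithmetic behind this effect is easy to reproduce. The sketch below assumes that the full history is resent on every call, that cost is proportional to input tokens, and that each call adds a fixed amount of context. The numbers are illustrative, not the ones quoted above, and the function name is made up for the example.

```python
from typing import List


def context_per_call(base_tokens: int, added_per_call: int, n_calls: int) -> List[int]:
    """Input tokens sent on each call when the whole history is resent every time."""
    return [base_tokens + i * added_per_call for i in range(n_calls)]


sizes = context_per_call(base_tokens=2_000, added_per_call=4_000, n_calls=10)

last_vs_first = sizes[-1] / sizes[0]      # 38_000 / 2_000
planned_total = sizes[0] * len(sizes)     # planner assumes every call looks like the first
actual_total = sum(sizes)                 # what is actually billed as input
overrun = actual_total / planned_total

print(f"last call is {last_vs_first:.0f}x the first; total is {overrun:.0f}x the plan")
# last call is 19x the first; total is 10x the plan
```

Under these assumptions the total grows with the square of the number of calls, which is why an estimate based on the first call alone understates the bill so badly.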

Unpredictable edge behaviors: Agents follow instructions in prototypical cases, but exhibit unexpected behaviors in edge cases — unusual inputs, tool failures, ambiguous responses from external APIs. The space of edge cases is enormous, and testing exhaustively is impractical.

The deployment model that works

The empirical observation of successful implementations converges on a few patterns:

Tasks with fast and verifiable feedback perform better. The code agent works because the tests pass or fail: the feedback is binary and immediate. Financial analysis agents, which have no objective check on their output, are more prone to hallucinations.

Strict scope reduces failures. The best agents in production are highly specialized: an agent that only triages support emails, or only generates reports from structured data. "General" agents that do "anything" consistently perform worse.
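In code, strict scope often comes down to an allowlist on the tools an agent may call. This is a minimal sketch of that idea, assuming tools are plain Python callables kept in a dictionary; `scoped_dispatcher`, `OutOfScopeError`, and the tool names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


class OutOfScopeError(Exception):
    """Raised when the agent requests a tool outside its allowlist."""


def scoped_dispatcher(
    tools: Dict[str, Callable[..., Any]], allowed: Iterable[str]
) -> Callable[..., Any]:
    """Return a dispatch function that only executes allowlisted tools."""
    allowed_set = frozenset(allowed)
    missing = allowed_set.difference(tools)
    if missing:
        raise ValueError(f"allowlist names unknown tools: {sorted(missing)}")

    def dispatch(name: str, *args: Any) -> Any:
        if name not in allowed_set:
            raise OutOfScopeError(f"tool '{name}' is outside this agent's scope")
        return tools[name](*args)

    return dispatch


# Hypothetical tool registry shared by several agents.
ALL_TOOLS = {
    "classify_email": lambda text: "billing" if "invoice" in text.lower() else "general",
    "send_refund": lambda order_id: f"refunded {order_id}",
}

# The triage agent can classify emails but cannot issue refunds.
triage = scoped_dispatcher(ALL_TOOLS, allowed=["classify_email"])
print(triage("classify_email", "Question about my invoice"))  # billing

try:
    triage("send_refund", "A-1001")
except OutOfScopeError as exc:
    print(exc)
```

Enforcing the scope in the dispatcher rather than in the prompt means an out-of-scope request fails loudly regardless of what the model decides.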

Human in the loop for irreversible decisions. The emerging pattern is not full autonomy but human-in-the-loop: the agent executes the analysis and proposes the action, and the human approves it before execution. This captures most of the productivity gain while maintaining control over the consequences.
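The approval step can be sketched as a gate between proposal and execution. This is an illustration under assumptions: the `ProposedAction` type, the `irreversible` flag, and the `approve` callback are hypothetical, and in a real system the callback would block on a human decision (a review UI, a chat prompt) rather than return immediately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    description: str
    irreversible: bool
    execute: Callable[[], str]


def run_with_approval(
    action: ProposedAction, approve: Callable[[ProposedAction], bool]
) -> str:
    """Run reversible actions directly; ask for approval before irreversible ones."""
    if action.irreversible and not approve(action):
        return "rejected: " + action.description
    return action.execute()


# Hypothetical actions proposed by a support agent.
draft = ProposedAction("draft reply to customer", False, lambda: "draft saved")
refund = ProposedAction("refund order A-1001", True, lambda: "refund issued")


def deny_all(action: ProposedAction) -> bool:
    """Stand-in reviewer that rejects every request."""
    return False


print(run_with_approval(draft, deny_all))   # draft saved
print(run_with_approval(refund, deny_all))  # rejected: refund order A-1001
```

Only the irreversible action waits on the reviewer, which is how the pattern keeps most of the speed while leaving the consequential decisions with a person.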

The agentic cycle of 2026 is still in the phase where the hype outpaces the execution. But the validated use cases (clinical documentation, bug resolution, data analysis) show that the real value exists. The question is not whether agents work, but in which specific contexts, with which safeguards, and at what real cost.
