OpenAI's 2025 State of Enterprise AI report found a 6x productivity gap between the highest-intensity AI users and the median employee using identical tools within the same organizations. Most enterprises see that gap and fund more tool training. The gap does not close. MIT Sloan researchers ran a controlled experiment with 1,900 participants and found that when users upgraded to a more advanced model, half the performance gain came from changes to their prompts, not from the model itself. The performance difference between your best and average AI users is not in the tool. It is in what each employee brings to the interaction.
The conventional response is more training. The ROI data on formal AI training programs is real: structured programs deliver measurable proficiency gains, and trained employees outperform self-taught workers by a significant margin on standard AI tasks. But proficiency with a tool is not the same as the capability that determines what you ask of it. The binding constraint on AI output quality is neither the model nor the interface. It is inquiry quality: a function of domain expertise and critical thinking that proficiency training does not build.
The Inquiry Gap
Call it the Inquiry Gap: the distance between an employee's ability to operate an AI tool and their ability to ask it something worth answering.
The gap is invisible in adoption metrics. An employee who uses AI every day to draft documents, summarize reports, and pull data from research databases can look like a power user by any activity-based measure. But if that employee cannot evaluate outputs against domain knowledge, construct a meaningful follow-up question, or recognize when the AI's answer is competent but wrong, the activity metric is measuring the wrong thing entirely. The outputs are only as good as the question. The question is only as good as the person asking it.
HBS researchers studying 758 BCG consultants documented this mechanism directly. For tasks inside AI's capability range, consultants using the tool significantly outperformed those working without it. For a complex task outside that range, one where AI generated plausible but incorrect analysis, consultants using AI were 19 percentage points less likely to produce correct solutions than their unassisted peers. They accepted the tool's confident wrong answer because they lacked the domain knowledge to form an independent expectation of what the right answer should look like. The tool's confidence became the error. Domain expertise is what makes that confidence checkable.
The Inquiry Gap is not visible in a productivity dashboard. It surfaces in review meetings when a team can defend an AI-assisted recommendation but cannot explain why the framing was correct, or when an analysis produces a confident answer to the wrong question. It shows up when employees accept outputs that a domain expert would immediately flag, because they lack the independent frame to check the answer against. The gap does not appear in adoption metrics. It appears in the quality of the decisions that all the measured activity produces.
The Atrophy Problem
Here is where the analysis turns uncomfortable.
A Microsoft Research study of 319 knowledge workers, documenting 936 first-hand task examples, found that higher confidence in AI correlates directly with reduced effort in critical thinking. Knowledge workers report shifting cognitive effort away from analyzing problems and toward verifying AI outputs. That shift is not neutral. Critical thinking capacity develops through use. The more employees rely on AI to perform the analytical work, the less they practice the judgment that produces good questions in the first place.
The specific finding matters here. It is not that employees become intellectually passive. It is that confident AI users consciously invest less in their own problem analysis because they trust the tool to handle it. That trust is often warranted for routine tasks. The problem is that it generalizes. The employee who has stopped forming their own analysis before prompting is the same employee who will not detect when the AI's answer is wrong, because they have no independent frame to check it against.
Gartner forecasts that 50% of organizations will require AI-free skills assessments by 2026, specifically because AI use is eroding critical thinking at a measurable rate. That is a planning assumption, not a theoretical concern. Organizations running enterprise AI programs are already observing the atrophy: employees who can prompt their way to a recommendation but cannot reconstruct the reasoning behind it, or evaluate whether the reasoning was sound.
This is the atrophy spiral. The tool is most valuable to employees who can think well enough to use it well. Those employees use it more frequently and trust it more. That reliance reduces their independent practice of the analytical judgment that determines output quality. The employees who benefit most from AI are the ones it is most systematically disarming.
Fortune, reporting in December 2025 on executive surveys, found that leaders identify strategic and critical-thinking shortcomings as the top AI-readiness problem in their organizations. The executives correctly diagnose the problem, then fund tool proficiency programs as the solution. That gap between diagnosis and investment is where the organizational risk accumulates.
Where This Argument Gets Complicated
The counterargument is worth taking seriously: prompt engineering is a teachable skill, and better training design can go well beyond interface navigation. Programs that teach structured problem decomposition, output validation, and iterative refinement are not the same as programs that teach employees which buttons to push. Formal AI training programs deliver real productivity gains, and the case for investing in them is not wrong.
The strongest version of this objection points to the AI-native generalist: the employee who performs well across domains by combining adequate subject knowledge with superior habits of iteration, validation, and problem decomposition. These employees exist, and in some AI-assisted settings, they outperform narrow specialists. But the HBS Jagged Frontier research is specific about the mechanism: when employees lack domain expertise to independently evaluate AI output, their performance collapses on tasks outside AI's capability range. Knowing which constraints matter in a clinical workflow is different from knowing which constraints matter in a financial model. The training that builds iteration habits does not supply the domain judgment that determines whether the framing was sound.
The HBS evidence is explicit: without underlying domain knowledge, consultants using AI performed substantially worse than those working without it, because they accepted confident wrong answers they had no basis to question. A well-designed training program can improve how employees structure questions. It cannot provide the subject-matter knowledge that determines whether the question was the right one. The specific mechanism: an employee cannot recognize that an AI answer is wrong without holding an independent expectation of what the right answer should look like. That expectation is what domain expertise provides. Training can build the habit of checking. It cannot build the knowledge that makes the check meaningful.
The Microsoft Research finding adds a constraint that even the best-designed training cannot fully resolve. The atrophy mechanism is a function of reliance, not of training quality. The more employees delegate analytical work to AI, regardless of how well the training program was built, the less they practice independent reasoning. Better training can slow the rate of atrophy. It does not eliminate the mechanism that produces it.
Implications for Leaders
The Inquiry Gap is not primarily a training problem. It is a measurement problem that has created the conditions for a training problem.
What does your AI success metric actually measure? Most enterprise AI programs track adoption rate, seats deployed, tasks completed, and time saved. None of those metrics distinguishes between a proficient user generating plausible outputs and a domain expert generating correct ones. If your program's performance data cannot tell you which employees are producing better work rather than more work, you are measuring activity rather than output quality. The diagnostic question for this specific gap: Does your organization have any mechanism to assess whether AI-assisted decisions in high-stakes domains are as reliable as decisions produced by domain experts working without AI?
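To make the distinction concrete, here is a minimal sketch in Python of what such a mechanism could look like: the same usage data scored two ways, once as an activity metric and once against a blind domain-expert review. The record fields, the scoring scheme, and every number below are illustrative assumptions, not data from any study cited here.

```python
# Illustrative sketch: the same AI-usage records scored two ways.
# All fields and numbers are hypothetical assumptions for illustration.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class DecisionRecord:
    employee: str
    prompts_issued: int    # what an adoption dashboard counts
    expert_approved: bool  # blind review by a domain expert working without AI

records = [
    DecisionRecord("analyst_a", prompts_issued=42, expert_approved=False),
    DecisionRecord("analyst_a", prompts_issued=38, expert_approved=False),
    DecisionRecord("analyst_b", prompts_issued=7,  expert_approved=True),
    DecisionRecord("analyst_b", prompts_issued=9,  expert_approved=True),
]

activity = defaultdict(int)   # activity metric: prompts per employee
approved = defaultdict(int)   # quality metric: expert-verified decisions
decisions = defaultdict(int)

for r in records:
    activity[r.employee] += r.prompts_issued
    decisions[r.employee] += 1
    approved[r.employee] += int(r.expert_approved)

for e in sorted(decisions):
    rate = approved[e] / decisions[e]
    print(f"{e}: {activity[e]} prompts, {rate:.0%} expert-verified correct")

# analyst_a tops the adoption dashboard; analyst_b produces the reliable work.
# An activity metric alone cannot tell these two employees apart.
```

The design point is the blind expert baseline: without an independent standard of correctness, the audit collapses back into an activity count.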
Are your AI training programs building inquiry capability or tool proficiency? The specific test: Does the training require employees to construct the question from their own domain knowledge before prompting, or does it teach them to prompt their way around the need for domain knowledge? Programs built around prompt templates and interface walkthroughs are building tool proficiency. Programs that require employees to define the problem independently first, evaluate AI outputs against domain knowledge, and articulate precisely why an answer is wrong are building inquiry capability. The two approaches look similar from the outside. Their effects on the underlying capability diverge over 18 months.
What is your organization actively doing to protect and develop the domain expertise and critical thinking that make inquiry quality possible? If employees are using AI to perform the knowledge work they used to do manually, they are not building the expertise that determines the quality of AI output. Organizations that have automated entry-level analytical work, reduced structured independent problem-solving, or removed apprenticeship-style knowledge development have removed the training ground for the capability their AI investment most requires. The strategy question is not whether to invest in AI. It is whether you are simultaneously investing in the human judgment that determines what AI produces.
The Bottom Line
The 6x productivity gap between the highest-intensity AI users and the median employee with access to the same tools is not a proficiency story. It is an inquiry story. The employees generating the best AI outputs know more, think more carefully, and ask harder questions. They are not more familiar with the interface. They bring more substance to the interaction. Most organizations are funding programs to close the proficiency gap while the critical thinking and domain expertise that determine output quality degrade in the background, measurably and on a timeline documented by Gartner, Microsoft Research, and a growing body of executive survey data. Executives are naming the AI skills gap correctly: it is a thinking gap. The training programs trying to close it are often the mechanism by which it widens.
Sources
OpenAI — "The State of Enterprise AI," 2025. https://openai.com/index/the-state-of-enterprise-ai-2025-report/
MIT Sloan Management Review — "Study: Generative AI Results Depend on User Prompts as Much as Models," 2025. https://mitsloan.mit.edu/ideas-made-to-matter/study-generative-ai-results-depend-user-prompts-much-models
Harvard Business School — "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality," Dell'Acqua, McFowland, Mollick et al., 2024. https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf
Microsoft Research — "The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects from a Survey of Knowledge Workers," 2025. https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/
Gartner — "Gartner Unveils Top Predictions for IT Organizations and Users in 2026 and Beyond," October 2025. https://www.gartner.com/en/newsroom/press-releases/2025-10-21-gartner-unveils-top-predictions-for-it-organizations-and-users-in-2026-and-beyond
Fortune — "The AI Skills Gap Is Really a Critical Thinking Gap," December 12, 2025. https://fortune.com/2025/12/12/ai-skills-gap-talent-executives-fear-risk-critical-strategic-thinking/
