
Why Context Engineering Is a Management Problem

TL;DR: Anthropic, LangChain, and most enterprise AI vendors describe context engineering as a technical discipline: prompts, retrieval, memory, tools, context windows. That description is correct, but it solves the wrong half of the problem. The harder half is organizational: which source wins when records conflict, which version of the policy is current, which exception is legitimate, which role can use which context, and who is accountable for the recommendation the system produces. Those are not engineering questions. They are management questions disguised as engineering ones, and most AI pilots stall because no one in the org is paid to answer them.


A demo never embarrasses anyone. A pilot usually does.

The same model, the same retrieval stack, the same prompt scaffolding. The demo answers a customer-renewal question crisply; six weeks later, the pilot gives a sales manager a recommendation that contradicts what finance has already agreed in a side email. The engineering team looks at the trace, finds nothing technically wrong, and concludes the model needs a better prompt or a bigger context window. Leadership concludes the technology isn’t there yet. Both are wrong. The model is fine. The retrieval is fine. The organization is what gave the system a contradiction and asked it to render judgment.

This is the pattern I keep seeing across executive-education classrooms and inside the firms those participants run. They come in expecting to debug their prompts. They leave realizing they have an organizational design problem they never named.

The Visible Category

The phrase “context engineering” was promoted into the public vocabulary by Anthropic’s applied team in late 2025, building on a quieter shift inside the developer community a year earlier. Their argument was that prompt engineering, the discipline of writing a single instruction well, was the wrong unit of analysis once agents started doing real work. The right unit was the broader state available to the model at the moment of generation: tokens, instructions, memories, examples, retrieved documents, tools, and conversation history. Anthropic’s framing was that context is a finite resource, that more is not better, and that the engineer’s job is to curate ruthlessly.

That framing is good. It moves attention from clever wording to the system condition that actually shapes model behaviour. LangChain, LlamaIndex, and most enterprise AI platforms built on the same shift. Retrieval-augmented generation became the dominant pattern. Memory systems, tool calls, evaluator loops, and agent state got their own product categories. There are now roles called Context Engineer on well-run AI teams, and the work they do is real.

Yet inside an enterprise, the work hits a wall that no amount of better tooling overcomes. The technical question is “what context will improve this model’s output?” The enterprise question is something stranger: “what context should this organization make available for this model, this user, this decision, this workflow, and this consequence?” That second question cannot be answered by a retrieval pipeline alone. It has to be answered by someone with authority.

The Hidden Category

Consider a sales manager deciding whether to discount a strategic account. A technically competent system retrieves the CRM history, the renewal forecast, support tickets, an unresolved compliance note, a legal clause from the master agreement, an internal margin model, and an email in which a senior account owner promised flexibility. All of it is semantically relevant. None of it is automatically usable.

Six questions sit between the retrieved context and a legitimate recommendation, and each one is a management question:

  • Which source wins when records conflict?
  • Which version of the policy is valid right now?
  • Who is authorized to use which context for this decision?
  • Which local exception matters, and which is shadow practice that should never have happened?
  • Who is accountable for the recommendation once the system produces it?
  • When should the system refuse to recommend, and escalate instead?

These are not edge cases. They are the substance of organizational life. Galbraith argued half a century ago that organizations exist because tasks are too uncertain to be handled by one person, and that structure is the mechanism by which uncertainty gets processed. Tushman and Nadler sharpened the point: organization design is information-processing design. LLMs do not replace that argument; they extend it. Earlier systems moved information from one desk to another and left the inference to a human. LLM-enabled systems perform part of the inference itself. The design problem now includes the conditions under which inference happens, not only the conditions under which information moves.

Retrieval-augmented generation fetches documents. Enterprise context engineering decides which documents count, when they count, for whom they count, and what action they can support.

That distinction is the one most engineering teams cannot make on their own, because making it requires deciding who in the organization is right when two functions disagree. That is a managerial act.
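To make that concrete, here is a minimal sketch of what an authority-allocation layer looks like once it is written down. It is illustrative only: the workflow name, source names, roles, and precedence order below are hypothetical, and in a real firm each one is a management decision that someone with cross-functional authority has to make before any engineer can encode it.

    from dataclasses import dataclass

    @dataclass
    class ContextPolicy:
        workflow: str                  # e.g. "customer_pricing"
        source_precedence: list[str]   # which source wins when records conflict
        allowed_roles: set[str]        # who may use this context for this decision
        accountable_role: str          # who owns the recommendation the system produces

    def resolve(policy: ContextPolicy, records: dict[str, dict], user_role: str) -> dict:
        """Pick the authoritative record, or refuse/escalate when authority is undefined."""
        if user_role not in policy.allowed_roles:
            return {"action": "refuse", "reason": f"{user_role} is not authorized"}
        for source in policy.source_precedence:   # the precedence order IS the decision
            if source in records:
                return {"action": "recommend", "basis": source,
                        "accountable": policy.accountable_role}
        # Nothing retrieved is covered by a ranked source: escalate, don't guess.
        return {"action": "escalate", "to": policy.accountable_role,
                "reason": "no ranked source covers this decision"}

    pricing = ContextPolicy(
        workflow="customer_pricing",
        source_precedence=["finance_margin_model", "master_agreement", "crm_history"],
        allowed_roles={"sales_manager", "finance_partner"},
        accountable_role="regional_sales_director",
    )

    # A side email promising flexibility is retrieved but unranked; CRM history is ranked.
    records = {"side_email": {"note": "promised flexibility"}, "crm_history": {"renewals": 3}}
    print(resolve(pricing, records, "sales_manager"))
    # -> {'action': 'recommend', 'basis': 'crm_history', 'accountable': 'regional_sales_director'}

Note what the sketch never does: it never weighs the side email, even though retrieval found it. Deciding that a senior account owner’s promise carries no authority in a pricing decision is exactly the kind of call no pipeline can make on its own.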

Coordination Costs Are the Real Bottleneck

Sangeet Choudary, in Reshuffle, argues that the binding constraint in modern knowledge work is no longer task execution. AI has collapsed the cost of execution. The binding constraint is coordination: the hidden friction that arises when people, information, and decisions need to be aligned across departments, systems, and roles. Most large organizations, he points out, still coordinate through meetings and document revisions because the knowledge involved is tacit and the information that does exist isn’t easily standardized.

Context engineering, framed as a technical problem, treats coordination as something to be solved by retrieval. Pull the right documents, the thinking goes, and the disagreement disappears. But the disagreement isn’t an information-retrieval failure. It’s a structural feature of how the firm actually runs. Sales believes one number. Finance believes another. Both are right inside their own functional logic. The LLM, asked to synthesize, will produce a fluent answer that hides the conflict rather than surfacing it.

There is a financial-analysis example that captures this exactly: junior analysts in a bank, working in silos, were feeding inputs into senior valuations without ever seeing the full pipeline. When their roles were redesigned to involve them in the whole valuation, the firm discovered that different software packages had been producing wildly different company valuations for years. The siloed structure had been hiding the inconsistency. No retrieval system would have found this. A reorganization did.

In an LLM-enabled enterprise, that pattern compounds. Silos no longer merely slow information flow. They contaminate inference, because the system pulls fragments from each silo and recombines them into a single coherent recommendation. The output sounds integrated even when the underlying organization is fragmented. This is what makes enterprise context engineering structurally different from RAG. RAG is a retrieval mechanism. Context engineering, done seriously, is an authority-allocation mechanism. The question “which document should the system trust” is the question “which function does this firm trust on this kind of decision,” and that question has an organizational chart attached to it.
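A small sketch shows what surfacing, rather than hiding, the conflict could look like in practice: before any fragments reach the model, compare the fields the silos disagree on and route the disagreement to a human instead of generating over it. The silo names, field names, and tolerance below are all illustrative assumptions.

    TOLERANCE = 0.02  # disagreement beyond 2% is a conflict, not noise

    def check_conflicts(fragments: dict[str, dict]) -> list[dict]:
        """Compare numeric fields across silos; report any that disagree."""
        conflicts = []
        all_fields = set().union(*(f.keys() for f in fragments.values()))
        for field in all_fields:
            values = {silo: frag[field] for silo, frag in fragments.items() if field in frag}
            if len(values) > 1:
                low, high = min(values.values()), max(values.values())
                if high and abs(high - low) / abs(high) > TOLERANCE:
                    conflicts.append({"field": field, "values": values})
        return conflicts

    fragments = {
        "sales_crm":     {"renewal_value": 1_200_000},
        "finance_model": {"renewal_value": 950_000},
    }

    for conflict in check_conflicts(fragments):
        # Do not synthesize. Route the disagreement to whoever owns the call.
        print(f"ESCALATE: {conflict['field']} disagrees across silos: {conflict['values']}")

The fluent alternative is worse: the model averages the two numbers, or quietly prefers whichever fragment ranked higher in retrieval, and the conflict disappears from view.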

What This Means for Roles

The companies treating this as a CIO problem are building better pipelines and getting nowhere. The companies treating it as a COO problem are the ones moving from demo to deployment. Increasingly, that means hiring a senior operator with cross-functional authority and calling them Head of Context or Head of AI Operations, or simply handing the mandate to a chief of staff.

The work that role does is unglamorous. It is mapping source precedence by workflow: in a customer-pricing decision, finance authority overrides sales notes; in a customer-renewal decision, the relationship history overrides the standard contract; in a refund above a threshold, the system must escalate to a named approver. It is naming the exceptions that the formal process has stopped acknowledging but the frontline relies on. It is making decision rights explicit for the first time, often after years of informal ambiguity. And it is rebuilding workflows so that the AI-supported recommendation reaches the person who is actually accountable, not the person closest to the screen.
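The artifact that work produces can be surprisingly small. Here is a hypothetical sketch of a precedence map, where every workflow name, source, threshold, and approver is illustrative and, in a real firm, contested:

    # Every entry below is a management decision, written down for the first time.
    CONTEXT_AUTHORITY = {
        "customer_pricing": {
            "precedence": ["finance_margin_model", "sales_notes"],  # finance overrides sales
            "accountable": "finance_partner",
        },
        "customer_renewal": {
            "precedence": ["relationship_history", "standard_contract"],
            "accountable": "account_owner",
        },
        "refund": {
            "precedence": ["refund_policy"],
            "escalation_threshold": 5_000,   # above this, a named approver decides
            "approver": "regional_controller",
        },
    }

    def route_refund(amount: float) -> str:
        rule = CONTEXT_AUTHORITY["refund"]
        if amount > rule["escalation_threshold"]:
            return f"escalate to {rule['approver']}"   # the system must not recommend
        return "system may recommend"

    print(route_refund(12_000))   # -> escalate to regional_controller

Writing the table is trivial. Getting finance and sales to agree that the first entry is true is the job.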

The pattern repeats often enough in the executive classrooms I teach in that I now expect it. A participant describes their organization’s AI pilot. They explain what the system does, what it gets wrong, what their engineering team is trying to fix. After ten minutes of listening, it is usually clear that the failure is not the model, the prompt, or the retrieval. The failure is that no one in the organization was paid to decide which function’s data is authoritative when sales and finance disagree on the same customer. The CIO had built a competent pipeline. The COO had never been asked to allocate context authority. That allocation is the work most pilots are missing, and it cannot be done by an engineering team alone. The engineering of context is downstream of the management of it.

The companies that will get value from enterprise AI in the next eighteen months will not be the ones with the best retrieval stack. They will be the ones whose leadership stopped pretending this was an engineering decision and started treating it as the organizational design problem it has always been.


About the author

Dr. Shiva Kakkar runs Gradeless, the AI venture that built Rehearsal — a mobile-first capsule learning platform delivering 15-minute interactive courses on management, business strategy, and AI for managers. He teaches Management Development Programs in leadership, organizational behavior, and AI strategy at XLRI Jamshedpur, IIM Ranchi, IIM Rohtak, and other top-tier Indian B-schools. Gradeless’s platforms are deployed across Jaipuria Institute of Management and the Seth M.R. Jaipuria K-12 schools network. Shiva writes on educational AI, organizational behavior, and the socio-economics of credentials at shivakakkar.com. Connect on LinkedIn.
