Dr. Shiva Kakkar

Why I Stopped Writing Prompts and Started Writing Skills

TL;DR: Most people interact with AI by writing a long instruction in a chat box and hoping. After spending six months building serious AI workflows (content pipelines, audit systems, a writing voice that sounds like me rather than like ChatGPT), I have stopped treating the prompt as the unit of work. The unit is now a skill: a small, file-shaped, version-controlled artifact the model reads at the exact right moment, layered into agents and pipelines that audit their own output before it leaves the workflow. This post is a tour of that stack at the level a non-engineer can use. The actual playbook stays in the workshop, but the architecture is worth seeing.


I opened a file the other day called prompts-2025-q4.md. It was a long instruction I had written nine months ago to get an AI to draft a session plan for a leadership program. The prompt was almost two thousand words. It was clever in places. It had a persona, a tone, a list of constraints, three examples, and a chain-of-thought scaffolding I had been pleased with at the time.

I tried it again, against a current model, on a similar task. The output was acceptable. It was also indistinguishable from what any minimally competent ChatGPT user would produce with a fraction of the effort.

That was the moment I understood that prompt engineering, as a practice, had been quietly made obsolete by what came next.

What Most People Are Still Doing

If you have used AI for serious work, you probably have a folder somewhere with prompts you have refined over months. The good ones are long. They have role descriptions, output formats, examples, edge cases, and “do not” lists. You copy them into a chat box, paste in your input, and hope the model holds the context together long enough to produce something usable.

This is the default mental model: prompt as instruction, AI as oracle. It is how almost every public AI literacy course is taught. It is what executives mean when they say their employees need “prompt engineering training.” It is also the version of AI use that gets disrupted the moment the model gets better, or the moment a colleague figures out a slightly cleaner phrasing, or the moment the actual problem changes shape.

A prompt is a one-shot instruction. It works for one task, in one session, with one model, until the conversation context fills up and the model forgets what you told it. The fragility is structural. It is not a thing better wording can fix.

The Layering That Replaced It

What replaced prompts in my workflow is a layered stack. Each layer does one thing the layer below could not. None of the layers is technically complex. The trick is knowing what belongs at which layer.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {
  'primaryColor': '#f5efe7',
  'primaryTextColor': '#2d2a26',
  'primaryBorderColor': '#d9cfc2',
  'lineColor': '#8b2635',
  'fontFamily': 'Georgia, Crimson Pro, serif',
  'fontSize': '15px'
}}}%%
flowchart TD
    A["<b>Prompt</b><br/><i>one-shot instruction<br/>in a chat box</i>"]
    B["<b>Skill</b><br/><i>a small file the model<br/>reads at the right moment</i>"]
    C["<b>Agent</b><br/><i>skill + tools + memory + identity</i>"]
    D["<b>Pipeline</b><br/><i>agents passing the baton<br/>through audit gates</i>"]

    A -->|"works for a single task"| B
    B -->|"works for a recurring task done your way"| C
    C -->|"works for an end-to-end job that needs judgment"| D

    style A fill:#ebe3d8,stroke:#d9cfc2,stroke-width:2px,color:#6b6258
    style B fill:#f5efe7,stroke:#8b2635,stroke-width:2px,color:#2d2a26
    style C fill:#f5ecd4,stroke:#b8860b,stroke-width:2px,color:#2d2a26
    style D fill:#8b2635,stroke:#6b1d29,stroke-width:3px,color:#f5efe7
```

A skill is the first move up the ladder. A skill is not a prompt. It is a file, usually markdown, that lives in a known location, describes a small, recurring kind of work, and gets loaded by the model only when the user asks for that kind of work. The trigger is intent: when I ask Claude to draft a particular kind of writing, the skill for that kind of writing wakes up. It carries its own voice rules, its own self-checks, its own references to other files. The chat itself stays clean.

The advantage is not just tidiness. A skill is version-controlled in a way a prompt in a chat box never is. I can change a single line in the skill file today, and every future invocation across every project inherits the change. A prompt is a frozen instruction. A skill is a living convention.
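To make the intent-trigger idea concrete, here is a minimal Python sketch. The skill names, file paths, and keyword matching are my own illustration, not the actual loading mechanism inside any assistant; the point is only that a skill is a file in a known location that wakes up when the request matches its purpose.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Skill:
    """A small markdown convention plus the intents that wake it up."""
    name: str
    path: Path           # where the markdown file lives
    triggers: tuple      # intent keywords that load it

# Hypothetical registry; names and paths are illustrative only.
SKILLS = [
    Skill("blog-voice", Path("skills/blog-voice.md"), ("blog", "post", "essay")),
    Skill("session-plan", Path("skills/session-plan.md"), ("session plan", "workshop")),
]

def match_skill(user_request: str):
    """Return the skill whose trigger matches the request, else None."""
    request = user_request.lower()
    for skill in SKILLS:
        if any(word in request for word in skill.triggers):
            return skill
    return None   # no skill fires; the chat stays clean
```

Changing one line in `skills/blog-voice.md` then changes every future invocation, which is the version-control advantage described above.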

An agent is a skill plus three things: tools (the model can actually do things, not just describe them), memory (it carries state across sessions), and identity (it knows who it is and who you are). The shift from skill to agent is what lets the AI go from “answers your question” to “takes the next action because that is what its role demands.”
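The three additions can be sketched as data. This is a toy model under my own naming, not any framework's real API: the agent holds its skill, a dictionary of callable tools, and a memory list that persists across calls.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Skill + tools + memory + identity, per the definition above."""
    identity: str                                   # who it is, who it serves
    skill: str                                      # the convention it follows
    tools: dict = field(default_factory=dict)       # named actions it can take
    memory: list = field(default_factory=list)      # state carried across sessions

    def act(self, task: str) -> str:
        self.memory.append(task)                    # remember what it was asked
        tool: Callable | None = self.tools.get("draft")
        if tool is not None:
            return tool(task)                       # actually do something
        return f"[{self.identity}] noted: {task}"   # fall back to describing
```

An agent with a `draft` tool acts; one without it can only describe, which is exactly the skill-versus-agent boundary.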

A pipeline is what happens when agents stop working alone. The output of one agent becomes the input of another, with explicit hand-off rules and audit gates between them. This is where AI work starts to look industrial rather than artisanal.
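The hand-off rule is simple enough to sketch in a few lines. In this illustration the “agents” are stand-in string functions and the gates are boolean checks; a real pipeline would put model calls in their place, but the control flow is the same: no stage's output reaches the next stage without passing its gate.

```python
def run_pipeline(stages, payload):
    """Pass work agent to agent; each audit gate must pass before hand-off."""
    for work, gate in stages:
        payload = work(payload)                  # this stage does its job
        if not gate(payload):                    # the gate between stages
            raise ValueError(f"audit gate failed after {work.__name__}")
    return payload

# Illustrative stages: real ones would be model calls, not string methods.
stages = [
    (str.strip, lambda s: len(s) > 0),           # draft gate: not empty
    (str.title, lambda s: s[0].isupper()),       # polish gate: capitalized
]
```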

The reason most enterprise AI pilots disappoint is that they are running prompts in a world that has moved on to pipelines. They are bringing chat-box thinking to a problem that needs assembly-line thinking.

What This Lets You Do

Three things become possible once you move off the prompt layer. None of them is glamorous. Each one is the difference between AI that performs in a demo and AI that produces work you would put your name on.

One: the model writes in your voice rather than in its own. Models default to a flat, capable, slightly committee-flavored register. Telling a model to “write like Hemingway” or “be conversational” does almost nothing useful. What works is to give the skill a voice corpus — a document that codifies the patterns you actually write in. Not “use short sentences” but a set of sentence-level tests the output has to survive, a list of phrases you would never say, a register guide, and a small set of validated past samples. The skill loads this document every time it writes, and the model holds itself to it. The improvement is dramatic and it does not regress, because the corpus is a file, not a prompt instruction the model will forget.
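A voice corpus is checkable precisely because it is made of tests rather than adjectives. Here is a deliberately tiny sketch; the kill-list phrases and the length limit are placeholders I invented, and a real corpus is a much longer document, but the shape of a sentence-level test looks like this:

```python
# Hypothetical voice-corpus rules; a real corpus is a longer document.
KILL_LIST = ("delve", "tapestry", "in today's fast-paced world")
MAX_WORDS_PER_SENTENCE = 28

def voice_failures(text: str) -> list:
    """Sentence-level tests a draft has to survive before it counts as done."""
    failures = []
    lowered = text.lower()
    for phrase in KILL_LIST:
        if phrase in lowered:
            failures.append(f"kill-list phrase: {phrase!r}")
    for sentence in text.split("."):
        if len(sentence.split()) > MAX_WORDS_PER_SENTENCE:
            failures.append("sentence over the length limit")
    return failures
```

An empty list means the draft survived; anything else names the exact rule it broke, which is what makes the check enforceable rather than aspirational.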

Two: the model audits itself before it returns work. A pipeline that produces writing also runs a separate audit pass against criteria you would otherwise check by hand. Did the output use any phrases on the kill list? Does the structure follow the seven-beat template? Are the institutional references placed close enough to the brand mentions? If any check fails, the pipeline does not return the work to me. It revises and re-checks. By the time I see the output, the obvious failures are already gone. The self-audit is what makes the whole thing trustworthy.
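The revise-and-re-check loop is the part worth seeing in code. This sketch assumes a list of named checks and a `revise` function (both stand-ins for model calls in a real pipeline); the invariant is that nothing leaves the loop until every check passes, and a bounded budget stops infinite revision.

```python
def audit_until_clean(draft, checks, revise, max_rounds=3):
    """Do not return work that fails a check; revise and re-check instead."""
    for _ in range(max_rounds):
        failures = [name for name, passes in checks if not passes(draft)]
        if not failures:
            return draft                     # only clean work leaves the loop
        draft = revise(draft, failures)      # hand the failures back to revision
    raise RuntimeError("revision budget exhausted; escalate to a human")
```

The `RuntimeError` branch matters: a self-audit that can loop forever is not trustworthy either, so persistent failures surface instead of silently shipping.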

Three: the model sources before it generates. The single biggest cause of AI output that “sounds like AI” is that the model is reaching into its training data when it should be reaching into your work. A serious skill has a sourcing hierarchy built into it. The model is told: before you write anything substantive, look in these places, in this order. Your own captured thinking comes first. Your reading library comes next. The public web comes last, and only when it adds something the earlier tiers cannot. The output stops being commentary and starts being authorship, because the raw material is yours.
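The sourcing hierarchy is a tiered fall-through. In this sketch the tiers are toy in-memory lists (real ones would be file searches or retrieval calls), but the ordering logic is the whole point: a later tier is consulted only when every earlier tier comes up empty.

```python
# Hypothetical corpora; in practice these are file or database searches.
NOTES = ["my note on audit gates"]
LIBRARY = ["a chapter on pipelines"]

def source_material(query, tiers):
    """Search the tiers in order; fall through only when a tier is empty."""
    for tier_name, search in tiers:
        hits = search(query)
        if hits:
            return tier_name, hits
    return "none", []

TIERS = [
    ("captured-thinking", lambda q: [n for n in NOTES if q in n]),
    ("reading-library",   lambda q: [b for b in LIBRARY if q in b]),
    ("public-web",        lambda q: [f"web result for {q}"]),   # last resort
]
```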

```mermaid
%%{init: {'theme':'base', 'themeVariables': {
  'primaryColor': '#f5efe7',
  'primaryTextColor': '#2d2a26',
  'primaryBorderColor': '#d9cfc2',
  'lineColor': '#6b6258',
  'fontFamily': 'Georgia, Crimson Pro, serif',
  'fontSize': '14px',
  'clusterBkg': '#faf6f0',
  'clusterBorder': '#d9cfc2'
}}}%%
flowchart LR
    subgraph SOURCES["<b>Sources</b> &nbsp;<i>(tiered)</i>"]
        direction TB
        S1["Your captured thinking"]
        S2["Your reading library"]
        S3["Field transcripts"]
        S4["Public web<br/><i>last resort</i>"]
    end

    subgraph SPINE["<b>Voice spine</b> &nbsp;<i>(a file, not a prompt)</i>"]
        direction TB
        V1["Sentence-level tests"]
        V2["Phrases you would never say"]
        V3["Register and pacing rules"]
    end

    SOURCES --> DRAFT["<b>Draft</b>"]
    SPINE --> DRAFT
    DRAFT --> AUDIT{"<b>Self-audit</b><br/>passes all checks?"}
    AUDIT -- "no" --> DRAFT
    AUDIT -- "yes" --> OUT["<b>Output</b>"]

    style S1 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style S2 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style S3 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style S4 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style V1 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style V2 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style V3 fill:#f5efe7,stroke:#d9cfc2,color:#2d2a26
    style DRAFT fill:#ebe3d8,stroke:#8b2635,stroke-width:2px,color:#2d2a26
    style AUDIT fill:#f5ecd4,stroke:#b8860b,stroke-width:2px,color:#2d2a26
    style OUT fill:#8b2635,stroke:#6b1d29,stroke-width:3px,color:#f5efe7
```

The architecture is not complicated. Anyone who can write structured markdown can build a version of it for their own work in a weekend. What is hard is the taste required to know what belongs in the voice spine, what belongs in the source hierarchy, and what the audit should actually check for. That part is not engineering. That is craft.

Why I Built This

My first serious attempt at a content-generation pipeline was a three-thousand-word prompt that tried to do everything in one pass: research the topic, draft the body, generate supporting examples, audit the voice. It worked the first time. It failed differently every time after that. I would read an output and not be able to tell whether the failure was the prompt, the model, the input, or my own attention. I rewrote the prompt eleven times before it occurred to me that the problem was not the prompt.

The fix was to split that one instruction into seven smaller skills, each owning one phase, each running its own self-audit before handing the work to the next. The system became more reliable the moment it became less monolithic. That architectural shift, from one big instruction to a pipeline of small, focused, self-auditing skills, is now the spine of every serious AI workflow I rely on. A monolithic prompt could never hold the variation. A pipeline of skills can. I did not set out to build this layer. I built it because nothing below it produced work I was willing to ship.

Anthropic's published documentation now treats Skills as a first-class primitive in its developer tooling, and that is the right move. The shift from prompt-as-primitive to skill-as-primitive is the first real change in how non-engineers will work with AI since the chat box appeared, and most of the working world has not noticed yet.

If you are someone whose job involves producing serious written work (strategy memos, board updates, teaching material, research briefs, client documents), the migration from prompts to skills is the next thing for you to learn. You do not need to write code. You need to write your conventions down once, store them somewhere your AI assistant can read, and stop typing the same instructions into a chat box every morning.

The chat box was a transitional interface. It is the typewriter of this era. The work is going to move up the stack whether or not you do, and the people who built their own skill layer first will look, two years from now, like they got there by accident.


About the author

Dr. Shiva Kakkar runs Gradeless, the AI venture that built Rehearsal — a mobile-first capsule learning platform delivering 15-minute interactive courses on management, business strategy, and AI for managers. He teaches Management Development Programs in leadership, organizational behavior, and AI strategy at XLRI Jamshedpur, IIM Ranchi, IIM Rohtak, and other top-tier Indian B-schools. Gradeless’s platforms are deployed across Jaipuria Institute of Management and the Seth M.R. Jaipuria K-12 schools network. Shiva writes on educational AI, organizational behavior, and the socio-economics of credentials at shivakakkar.com. Connect on LinkedIn.

Dr. Shiva Kakkar

PhD IIM-Ahmedabad · Ex-faculty XLRI · Head of Product, Rehearsal AI

Dr. Shiva has trained 2,000+ managers across India's top organizations (HDFC Bank, Infosys, CBDT) on GenAI adoption. Former full-time faculty at IIM Nagpur, XLRI Jamshedpur, and Goa Institute of Management. He founded Rehearsal AI — an interview prep platform used by 3,600+ candidates.



For program inquiries, mail: shivak@iima.ac.in