How to Write a SPEC That Actually Works (SDD, AI & Agents)
A practical guide to writing executable specs. Reduce ambiguity, decide early, and work effectively with AI and coding agents without creating legacy.

I don’t write specs because I enjoy documenting things. I write them because I got tired of misunderstandings, circular conversations, and having to reinterpret decisions once there’s already code, tests, and prompts in motion. This is the simplest way I’ve found to write specs you can actually execute—by humans and by AI agents—without bureaucracy and without noise.
This article is the natural follow‑up to the Complete Guide to Spec‑Driven Development (SDD) with AI. There I explained, at a high level, what SDD is and how it works. Here I’m going much more practical: how to write a spec that doesn’t fall apart the moment someone tries to use it.
Why Most Specs Don’t Work
I’d love to tell you I’ve been fighting bad specs for years, but that would be lying to myself (and to you). The reality is that until fairly recently, agents weren’t consistent enough for it to be worth writing “real” specs.
In 2026, though, that’s not the problem anymore. Specs fail for something more basic: they leave too much room for interpretation. They feel clear to the person who has everything in their head, but they become ambiguous as soon as another person—or an agent—tries to execute them.
Deep down, it’s the same thing programmers have dealt with forever when we’re handed vague, loosely defined documentation.
In my experience, when a spec goes wrong, you’ll usually see one or more of these symptoms:
- It mixes requirements with technical decisions
- It uses comfortable language, but it’s imprecise
- It assumes things are “obvious”
- It doesn’t clearly define where scope ends
- It leaves calculations/limits undefined (units, rounding, thresholds)
AI doesn’t fix this. It amplifies it. The agent won’t ask what you meant—it will decide on its own… and if you’re not careful, it will generate legacy from minute zero.
What I’m Trying to Achieve When I Write a SPEC
Over time, I stopped thinking of the spec as a document and started seeing it as a tool to make decisions earlier.
For me, a spec works when it passes a very simple test: at the end of development, I can say—without arguing—whether it was met or not.
When I can’t do that, the debate is never about the code. It always ends up being about something we assumed and never wrote down.
When that happens, there’s almost always a reason. Usually one of these three things wasn’t properly nailed down:
- The problem wasn’t scoped clearly
- The expected behavior wasn’t verifiable
- The exclusions weren’t explicit
The Structure I Use (and Why)
There’s no magic template. Every case needs its own care and context.
What I have learned is that some structures force you to think better—even when the idea is still a bit raw.
The one I use most often separates three things you really shouldn’t mix:
- what problem we’re solving and in what context,
- how the system should behave from the outside,
- and how we’ll know—without arguing—that it’s done right.
It’s no accident these layers map pretty well to Requirements, Design, and Acceptance Criteria. Not because you should follow them as dogma, but because they push you to make decisions in the right order.
Requirements: Turn the Idea into a Concrete Problem
This is the least glamorous part—and by far the most important.
Here I’m not trying to write beautifully. I’m trying to remove ambiguity.
When I start a spec, I force myself to make these things explicit:
- What exact problem we’re solving
- The real context where it will be used
- The goal we want to achieve
- And, above all, what’s out of scope
Over time I’ve learned the Not included section is the most valuable block. Not because it restricts you, but because it prevents others—humans or AI—from making decisions they were never supposed to make.
More than once, just filling out this section made me realize the original idea wasn’t as clear as it seemed.
Design: Describe Behavior Without Deciding Implementation
This is where many specs drift off course without anyone noticing.
The goal of this section isn’t to design the system internally. It’s to describe what must happen from the outside: what happens in the happy path, what happens when something fails, and which edge cases deserve attention.
If someone starts arguing about frameworks, patterns, or libraries while reading this section, it’s usually a sign the spec is mixing levels.
When I write it well, something interesting happens: different people can reach different technical solutions… and all of them can be valid.
Acceptance Criteria: Where the Debate Ends
If I had to keep only one part of the spec, it would be this.
Acceptance criteria are where language stops being interpretive and becomes testable. They don’t describe how to implement anything. They describe when something is correct and when it isn’t.
I always write them with one specific question in mind: could someone who wasn’t in the conversation decide whether this meets the spec?
That’s why I avoid “nice” adjectives and lean on simple structures like Given / When / Then. Not because it’s BDD, but because it forces precision.
When the criteria are well written, a lot of arguments simply disappear.
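To see what that precision buys you, here is a tiny, hypothetical example (the domain, function name, and rounding rule are all invented for illustration): a Given / When / Then criterion about rounding, written so that it maps one-to-one onto a test with an unambiguous oracle.

```python
# Hypothetical example: a Given/When/Then criterion turned into an executable check.
# The domain, names, and rounding rule are invented for illustration only.
from decimal import Decimal, ROUND_HALF_UP


def invoice_total(raw_total: str) -> Decimal:
    # The "then" clause made executable: totals are rounded half-up to 2 decimals.
    return Decimal(raw_total).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_total_is_rounded_half_up():
    # Given an order total of 100.005
    raw_total = "100.005"
    # When the invoice is generated
    total = invoice_total(raw_total)
    # Then the total is exactly 100.01 -- nothing left to interpret
    assert total == Decimal("100.01")
```

Nobody needs to have been in the original conversation to run that check and get a yes or a no.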
A Complete Example (Deliberately Scoped)
So that this doesn’t stay theoretical, I’m going to use a real spec I applied in a work project—rewritten and anonymized so I don’t expose internal details or sensitive decisions.
The original version was a solid technical specification: it explained the flow, architecture, and involved components well. The problem is that, as a SPEC in the SDD sense, it left too many decisions implicit.
This is the same idea, but written as a spec that governs decisions and can be executed without interpreting or inventing rules.
Important note on anonymization: the original spec used in the project defines—with contractual precision—the synchronization identity for responses, the minimum file model, and the certainty rule by format. In this public version, those details are deliberately omitted, but in a real executable spec they are mandatory and non‑negotiable.
⚠️ From here on, the SPEC begins
Spec — Importing and Processing Assessment Results (Designed to Work with Junie)
Problem to solve
We need to import assessment results from external systems and persist them consistently, ensuring that:
- results are correctly associated with the corresponding assessment,
- reimporting data does not create duplicates,
- and the structure and semantics of participants’ responses are preserved.
This process must allow safe, repeatable reimports without introducing inconsistencies into existing data.
Usage context
The system receives files generated by external platforms containing the results of an assessment. These files may vary in format, but they follow a coherent tabular structure.
The system is responsible for interpreting this data, transforming it into domain entities, and storing it as the internal source of truth.
Goal
Given a valid results file and the minimum assessment metadata:
- the system imports the data,
- creates or updates the corresponding assessment,
- and persists participants’ responses consistently.
The process must be idempotent and deterministic: importing the same file more than once must not create duplicates or inconsistencies.
Minimum contract (abstract):
- Inputs: results file + evaluation_external_id + minimum metadata (defined in an internal appendix).
- Outputs: persisted assessment with associated responses.
- Invariants: no duplicates, per‑participant order preserved, same final state after N reimports.
Scope
Included
- Receiving a results file together with basic assessment metadata.
- Detecting and using the appropriate parser based on the file format.
- Transforming data into domain entities.
- Persisting the assessment and its associated responses.
- Synchronizing responses on reimport.
Not included
- Semantic validation of the academic content of responses.
- Manual correction or later editing of results.
- Report generation or visualizations.
- Authentication or authorization management.
- Advanced handling of format errors beyond rejecting the file.
Relevant business rules
- An assessment is uniquely identified by its external identifier.
- Responses are associated with individual participants.
- Response identity: defined by a stable synchronization key (OMITTED DUE TO ANONYMIZATION).
- Some file rows represent certainty/confidence values associated with a response.
- Certainty rule: determined by a position‑based rule that depends on the format (OMITTED DUE TO ANONYMIZATION; see internal appendix on formats).
- The order of responses matters and must be preserved.
- Missing certainty values do not invalidate a response.
Expected behavior
Main flow
- The system receives a results file and assessment metadata.
- It selects a parser compatible with the file format.
- The file is processed row by row, transforming each row into a participant response.
- Responses are grouped by participant.
- An assessment entity is built containing all processed responses.
- The assessment is persisted:
  - if it doesn’t exist, it is created,
  - if it already exists, its responses are synchronized.
Reimport
- If the assessment already exists:
  - its metadata is updated,
  - new responses are added,
  - existing responses are updated,
  - and responses that no longer appear in the imported file are removed.
Atomicity
- If any row fails during parsing or transformation, the entire file is rejected and no data is persisted.
Acceptance criteria
- Given a valid file for a non‑existent assessment, when the import is processed, then a new assessment is created with all associated responses.
- Given a valid file for an existing assessment, when the import is processed, then reimporting the same file does not increase the total number of responses; per participant, the set of responses defined by the synchronization key is unique.
- Given a file in an unsupported format, when an import is attempted, then the process fails without persisting any data.
- Given a row that matches the format’s positional rule for certainty, when it is processed, then the response is marked as a certainty value.
- Given the same file imported N times, when it is processed repeatedly, then the final state is identical: same values, same per‑participant order, and the same total number of responses.
Constraints
- The process must be deterministic: the same input always produces the same output.
- The import must complete as an atomic operation.
- The system must be tolerant of retries.
- Handling of personal data must comply with applicable regulations.
Open notes
- The exact definition of supported formats is kept outside this spec.
- Detailed error handling will be documented in a separate spec if needed.
⚠️ End of the SPEC
Take a breath—we’re done with the example.
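Before we move on to Junie, one quick illustration (entirely outside the spec itself): a rough Python sketch of how the reimport synchronization described above could look in code. Every name here is hypothetical, the real synchronization key lives in the internal appendix the spec points to, and this is only one of many implementations that would satisfy the acceptance criteria.

```python
# Rough, hypothetical sketch of the reimport synchronization the spec describes:
# add new responses, update existing ones, remove those no longer in the file.
# All names (Response, sync_key, the fields) are placeholders, not the real model.
from dataclasses import dataclass


@dataclass
class Response:
    participant_id: str
    position: int            # per-participant order must be preserved
    value: str
    is_certainty: bool = False


def sync_key(response: Response) -> tuple:
    # Stand-in for the spec's (omitted) synchronization identity.
    return (response.participant_id, response.position)


def synchronize(existing: list[Response], imported: list[Response]) -> list[Response]:
    """Apply the spec's reimport rules and return the final state."""
    state = {sync_key(r): r for r in existing}
    imported_keys = {sync_key(r) for r in imported}

    # Responses that no longer appear in the imported file are removed.
    state = {k: r for k, r in state.items() if k in imported_keys}

    # New responses are added; existing ones are updated with the imported values.
    for r in imported:
        state[sync_key(r)] = r

    # Sorting by (participant, position) keeps per-participant order stable.
    return [state[k] for k in sorted(state)]


def test_reimport_is_idempotent():
    imported = [
        Response("p1", 1, "A"),
        Response("p1", 2, "B", is_certainty=True),
        Response("p2", 1, "C"),
    ]
    first = synchronize([], imported)
    second = synchronize(first, imported)
    # Same values, same order, same total number of responses.
    assert first == second
```

The point is not this particular shape (a SQL merge or an ORM-level sync would be just as valid); it's that the spec's rules are closed enough that any correct implementation ends up with the same observable behavior.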
How I Use This SPEC with Junie (Without Turning It into “Prompt Engineering”)
Once the spec is closed, I don’t start improvising prompts. With Junie this is even more obvious: the agent lives inside the IDE, sees the code, and naturally tends to fill in gaps if you don’t close them properly.
The spec is what prevents that. I’m not using it to inspire Junie—I’m using it to limit its decision space.
Instead of writing a creative prompt, I launch an explicit task, assuming Junie already has context about the project.
This is roughly the text I tend to use:
Task: implement import of assessment results
Source of truth:
- The SPEC "Importing and Processing Assessment Results" is the only valid functional reference.
- Do not infer rules that are not explicitly stated in the spec.
Context:
- You are working inside an existing project opened in the IDE.
- The goal is not to redesign the system, but to implement exactly the described behavior.
What I expect from you:
1. Implement the flow described in the spec without adding extra behavior.
2. Identify any ambiguity that prevents implementing an acceptance criterion.
3. Propose tests that validate each acceptance criterion in a verifiable way.
Constraints:
- Do not change business rules without explicit confirmation.
- Do not introduce new formats, shortcuts, or "improvements".
- Keep the behavior deterministic and repeatable.
Working approach:
- First summarize the spec into operational steps.
- Flag potential conflict points before touching code.
- Implement incrementally, validating each criterion.
The key difference here is that Junie already sees the code and the environment, so any ambiguity in the spec translates directly into implicit decisions inside the IDE.
⚠️ Watch out for the project’s global context
This spec defines what must happen and when it’s considered correct. It does not define how the project should be structured or the general development rules.
For that, in real projects, it’s important to maintain separate, stable artifacts (Agents.md, guidelines.md, test conventions, style rules, etc.) that make clear:
- how we expect the agent to work,
- which standards it must follow,
- and what it must not decide on its own.
That way, the spec stays focused on the problem, and Junie doesn’t have to infer global decisions from scattered code.
If Junie gets something wrong, it’s almost never because “the AI is bad.” It’s because the spec didn’t close a decision we assumed was obvious.
The Mistakes I See Over and Over
Some failures repeat so often they’ve almost been normalized.
Specs that read like documentation, specs that try to design the entire system, specs that never say what’s out of scope, or specs whose criteria are so vague nobody dares use them as a reference.
Each of these mistakes deserves a detailed explanation—which is why I cover them in depth in a dedicated article about common errors in Spec‑Driven Development.
Final Checklist Before You Call a Spec “Done”
Copy and paste this list. If any answer isn’t a clear yes, the spec isn’t ready:
- Is the minimum input defined?
- Are the output and its invariants defined?
- Are identities / keys explicit?
- Are errors and atomicity defined?
- Are idempotence and determinism verifiable?
- Are edge cases listed?
- Is the Not included section explicit?
- Do the criteria have a measurable oracle?
- Is every rule that “depends on the format” backed by a referenced appendix or internal spec?
- Can you decide compliance without a conversation?
Tweaking a spec here is still infinitely cheaper than doing it after there’s already code, tests, or agents in production.
Why This Spec Works Better Than a Purely Technical Spec
Read it end‑to‑end and the key difference isn’t length or formatting—it’s which decisions it closes and which it doesn’t.
This spec works better than a purely technical spec because it:
- Starts with the problem and the goal, not the architecture.
- Explicitly defines what’s in and what’s out, reducing assumptions.
- Isolates important business rules, protecting them from accidental change.
- Describes expected behavior without imposing an implementation.
- Includes acceptance criteria that let you say—without arguing—whether something meets the spec.
- Can be executed by a person or an agent without inventing new rules.
A technical spec describes the system. This spec governs how decisions get made before writing code.
