From a SPEC to tickets, tests, and acceptance criteria: a real SDD workflow
How to turn a spec into executable work: tickets, acceptance criteria, tests, and prompts for AI agents.

Writing a good spec is half the work. The other half is turning it into something that someone (person or agent) can execute. And that’s exactly where most teams fail.
In the previous articles on SDD and how to write a spec I talked about why specs matter and how to write them well. But I was still missing the most practical step: how to go from an approved spec to concrete tickets, verifiable acceptance criteria, tests derived from those criteria, and prompts that an AI agent can use to implement each ticket.
This article is that bridge. It’s what I use when I have a signed-off spec and need the team (human and agents) to start working.
The problem: specs that don’t get executed
I’ve seen technically perfect specs that sit in a Google Doc and never become real work. The team reads the spec, nods during the planning meeting, and then everyone interprets on their own what needs to be done. Vague tickets like “Implement notification system” get created and the developer (or the agent) has to guess the scope.
The result: tickets that get reopened because “this was missing,” PRs that don’t pass review because “that’s not what was asked,” and a spec that nobody consults again after the first day.
A spec that isn’t broken down into executable work is a statement of intentions, not a working tool.
Concrete example: spec for a notification system
I’ll use a real (simplified) example to show the full workflow. Imagine the spec describes an email notification system for an order platform.
Summarized spec
# SPEC: Email notification system
## Objective
Allow users to receive email notifications when their order status changes
(confirmed, shipped, delivered, cancelled).
## Scope
- Email only (no push or SMS in this phase).
- Order status changes only.
- Users can disable notifications from their profile.
## Functional requirements
1. When an order changes status, an email is sent to the user.
2. Email content depends on the new status (different template per status).
3. Emails are sent asynchronously (they don't block the order flow).
4. If sending fails, retry up to 3 times with exponential backoff.
5. Users can enable/disable email notifications.
6. A log is kept for each notification sent (status, timestamp, attempts).
## Non-functional requirements
- Maximum latency between status change and email delivery: 5 minutes.
- The system must support 1000 notifications/hour without degradation.
- Emails must comply with anti-spam policies (SPF, DKIM).
## Out of scope
- Push or SMS notifications.
- Frequency customization (daily digest, etc.).
- Notifications for events other than order status changes.
This spec is clear, has a defined scope, and specifies what’s out of bounds. Now comes the question: how do I turn it into work?
Step 1: Identify the functional blocks
Before creating tickets, I identify the main functional blocks. These aren’t tickets yet, they’re logical groupings of work:
- Event infrastructure: publishing and consuming status change events.
- Notification service: deciding what to notify, to whom, and managing preferences.
- Email sending: integration with email provider, templates, retries.
- User preferences: API and persistence to enable/disable.
- Logging and observability: record of notifications sent.
Each block can be developed relatively independently, and that’s key for ticket decomposition.
Step 2: Break down into tickets
Each ticket must meet three criteria:
- Self-contained: it can be implemented and tested without depending on other tickets in progress (though it can depend on already completed tickets).
- Verifiable: it has concrete acceptance criteria that can be checked.
- Right-sized: a developer (or an agent) can complete it in 1-3 days.
For the notification spec, the breakdown looks like this:
NOTIF-1: Publish order status change events
Description: When an order changes status, the order service publishes an OrderStatusChanged event on the internal event bus.
Acceptance criteria:
- An OrderStatusChanged event is published every time Order.status changes.
- The event contains: orderId, userId, previousStatus, newStatus, timestamp.
- The event is published asynchronously (doesn’t block the order transaction).
- If publishing fails, the error is logged but the status change is not rolled back.
Dependencies: none.
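As a rough illustration, the criteria for NOTIF-1 could translate into something like the sketch below. The `EventBus` interface, the `Order` type, and the `logError` callback are hypothetical names introduced here, not part of the spec. The sketch only covers the event payload and the rule that a publishing failure is logged without undoing the status change; asynchronous dispatch is assumed to be handled by the bus implementation.

```kotlin
import java.time.Instant

enum class OrderStatus { CONFIRMED, SHIPPED, DELIVERED, CANCELLED }

// Event payload with the five fields listed in the acceptance criteria.
data class OrderStatusChanged(
    val orderId: Long,
    val userId: Long,
    val previousStatus: OrderStatus,
    val newStatus: OrderStatus,
    val timestamp: Instant
)

// Hypothetical abstraction over the internal event bus.
fun interface EventBus {
    fun publish(event: OrderStatusChanged)
}

// Hypothetical, simplified order model.
data class Order(val id: Long, val userId: Long, val status: OrderStatus)

class OrderService(
    private val eventBus: EventBus,
    private val logError: (String) -> Unit = { System.err.println(it) }
) {
    // Returns the updated order even when publishing the event fails.
    fun changeStatus(order: Order, newStatus: OrderStatus): Order {
        if (order.status == newStatus) return order // nothing changed, no event
        val updated = order.copy(status = newStatus)
        try {
            eventBus.publish(
                OrderStatusChanged(order.id, order.userId, order.status, newStatus, Instant.now())
            )
        } catch (e: Exception) {
            // Log and continue: the status change is not rolled back.
            logError("Failed to publish OrderStatusChanged for order ${order.id}: ${e.message}")
        }
        return updated
    }
}
```

Calling `changeStatus` with a bus that throws still returns the order in its new status, which is exactly what the last criterion asks for.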
NOTIF-2: Event consumer and notification decision service
Description: A consumer listens for OrderStatusChanged events, checks the user’s preferences, and decides whether a notification should be sent.
Acceptance criteria:
- The consumer processes OrderStatusChanged events.
- Before notifying, it verifies that the user has email notifications enabled.
- If the user has notifications disabled, the event is discarded and logged.
- If the user has notifications enabled, a Notification record is created with status PENDING.
Dependencies: NOTIF-1.
NOTIF-3: Email sending service with templates
Description: Service that takes a pending notification, selects the template corresponding to the event type, renders the email, and sends it through the email provider.
Acceptance criteria:
- A template exists for each status: CONFIRMED, SHIPPED, DELIVERED, CANCELLED.
- Each template includes: user name, order number, new status, link to details.
- The email is sent to the user’s registered address.
- If sending succeeds, the notification moves to SENT status.
- If sending fails, the notification moves to FAILED status and the error is recorded.
Dependencies: NOTIF-2.
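To make the SENT and FAILED transitions concrete, here is a minimal sketch. The `EmailProvider` method signature, the `TemplateRenderer` interface, the template names, and the `lastError` field are assumptions made for illustration; in a real project the renderer would be backed by a template engine rather than the lambda used in a test.

```kotlin
enum class NotificationStatus { PENDING, SENT, FAILED }

data class Notification(
    val orderId: Long,
    val userId: Long,
    val type: String, // e.g. "ORDER_SHIPPED"
    val status: NotificationStatus = NotificationStatus.PENDING,
    val attempts: Int = 0,
    val lastError: String? = null
)

// Abstraction over the email provider; the signature is an assumption.
fun interface EmailProvider {
    fun send(to: String, subject: String, body: String)
}

// Hypothetical seam for the template engine.
fun interface TemplateRenderer {
    fun render(templateName: String, model: Map<String, Any>): String
}

class EmailSendingService(
    private val provider: EmailProvider,
    private val renderer: TemplateRenderer
) {
    // One template per order status; the template names are illustrative.
    private val templates = mapOf(
        "ORDER_CONFIRMED" to "order-confirmed",
        "ORDER_SHIPPED" to "order-shipped",
        "ORDER_DELIVERED" to "order-delivered",
        "ORDER_CANCELLED" to "order-cancelled"
    )

    // Renders and sends the email, returning the notification in its new status.
    fun send(n: Notification, userName: String, email: String, detailsUrl: String): Notification {
        val attempt = n.attempts + 1
        return try {
            val template = templates[n.type] ?: error("No template for ${n.type}")
            val model: Map<String, Any> = mapOf(
                "userName" to userName,
                "orderId" to n.orderId,
                "status" to n.type,
                "detailsUrl" to detailsUrl
            )
            provider.send(email, "Update on order ${n.orderId}", renderer.render(template, model))
            n.copy(status = NotificationStatus.SENT, attempts = attempt, lastError = null)
        } catch (e: Exception) {
            // Any rendering or sending error moves the notification to FAILED.
            n.copy(status = NotificationStatus.FAILED, attempts = attempt, lastError = e.message)
        }
    }
}
```

Returning a new `Notification` instead of mutating the input keeps each transition easy to assert on in a unit test.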
NOTIF-4: Retry system for failed emails
Description: Notifications in FAILED status are automatically retried with exponential backoff, up to a maximum of 3 attempts.
Acceptance criteria:
- FAILED notifications are retried automatically.
- Backoff: 1 minute, 5 minutes, 15 minutes.
- Maximum 3 retries. After that, moves to PERMANENTLY_FAILED status.
- Each retry is recorded with timestamp and result.
Dependencies: NOTIF-3.
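The retry rules are small enough to sketch as a pure policy object. The `RetryPolicy`, `Attempt`, and `Notification` shapes below are hypothetical, and the sketch assumes each delay is measured from the previous failure, which the ticket does not state.

```kotlin
import java.time.Duration
import java.time.Instant

enum class NotificationStatus { PENDING, SENT, FAILED, PERMANENTLY_FAILED }

// One entry per retry: when it ran and whether it worked.
data class Attempt(val at: Instant, val succeeded: Boolean)

data class Notification(
    val id: Long,
    val status: NotificationStatus,
    val retries: Int = 0,
    val history: List<Attempt> = emptyList()
)

object RetryPolicy {
    // Delay before retry 1, 2 and 3, taken from the ticket.
    val delays: List<Duration> = listOf(
        Duration.ofMinutes(1),
        Duration.ofMinutes(5),
        Duration.ofMinutes(15)
    )

    // When the next retry is due, or null once all retries are used up.
    // Assumption: the delay counts from the last failure.
    fun nextRetryAt(n: Notification, lastFailureAt: Instant): Instant? =
        delays.getOrNull(n.retries)?.let { lastFailureAt.plus(it) }

    // Applies the outcome of one retry and records it with its timestamp.
    fun recordRetry(n: Notification, succeeded: Boolean, now: Instant): Notification {
        require(n.status == NotificationStatus.FAILED) { "Only FAILED notifications are retried" }
        val retries = n.retries + 1
        val status = when {
            succeeded -> NotificationStatus.SENT
            retries >= delays.size -> NotificationStatus.PERMANENTLY_FAILED
            else -> NotificationStatus.FAILED
        }
        return n.copy(status = status, retries = retries, history = n.history + Attempt(now, succeeded))
    }
}
```

Under this reading, the third retry runs 21 minutes after the first failure, so it is worth confirming whether the 5-minute latency target in the spec is meant to cover retried sends.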
NOTIF-5: User notification preferences API
Description: Endpoints for users to view and modify their email notification preferences.
Acceptance criteria:
- GET /api/v1/users/{userId}/notification-preferences: returns current preferences.
- PUT /api/v1/users/{userId}/notification-preferences: updates preferences.
- By default, email notifications are enabled for new users.
- Only the user themselves can modify their preferences (auth validation).
- Changes take effect immediately.
Dependencies: none (can be developed in parallel with NOTIF-1).
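Leaving the HTTP layer aside, the rules behind those two endpoints could be sketched like this. The service class, the in-memory map, and `ForbiddenException` are stand-ins invented for the example; a real implementation would sit behind a controller and a repository.

```kotlin
data class NotificationPreferences(
    val userId: Long,
    val emailNotificationsEnabled: Boolean = true // enabled by default
)

// Hypothetical exception that the web layer would map to a 403 response.
class ForbiddenException(message: String) : RuntimeException(message)

// In-memory stand-in for the persistence layer.
class NotificationPreferencesService {
    private val store = mutableMapOf<Long, NotificationPreferences>()

    // GET: users without a stored record get the default preferences.
    fun get(userId: Long): NotificationPreferences =
        store[userId] ?: NotificationPreferences(userId)

    // PUT: only the authenticated user may change their own preferences.
    fun update(authenticatedUserId: Long, userId: Long, emailEnabled: Boolean): NotificationPreferences {
        if (authenticatedUserId != userId) {
            throw ForbiddenException("User $authenticatedUserId cannot modify preferences of user $userId")
        }
        val updated = NotificationPreferences(userId, emailEnabled)
        store[userId] = updated // visible to the next read, so the change is immediate
        return updated
    }
}
```

Each acceptance criterion maps to one branch here: the default value, the ownership check, and the write that the next `get` observes.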
NOTIF-6: Notification logging and metrics
Description: Record of each notification sent and metrics for monitoring.
Acceptance criteria:
- Each notification has a record with: notificationId, userId, orderId, type, status, attempts, timestamps.
- Exposed metrics: notifications sent/minute, error rate, average latency between event and delivery.
- Logs include enough information for debugging without exposing the user’s personal data.
Dependencies: NOTIF-3.
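With the "Dependencies" lines written down, the order of work can be derived instead of discussed. The sketch below is one way to do it: `planWaves` is a hypothetical helper that groups tickets into waves, where every ticket in a wave depends only on tickets from earlier waves.

```kotlin
// Ticket ids mapped to the tickets they depend on, copied from the tickets above.
val dependencies: Map<String, Set<String>> = mapOf(
    "NOTIF-1" to emptySet(),
    "NOTIF-2" to setOf("NOTIF-1"),
    "NOTIF-3" to setOf("NOTIF-2"),
    "NOTIF-4" to setOf("NOTIF-3"),
    "NOTIF-5" to emptySet(),
    "NOTIF-6" to setOf("NOTIF-3")
)

// Groups tickets into waves that can be worked on in parallel.
fun planWaves(deps: Map<String, Set<String>>): List<List<String>> {
    val done = mutableSetOf<String>()
    val waves = mutableListOf<List<String>>()
    while (done.size < deps.size) {
        // A ticket is ready when everything it depends on is already done.
        val ready = deps
            .filter { (id, needs) -> id !in done && done.containsAll(needs) }
            .keys
            .sorted()
        require(ready.isNotEmpty()) { "Cyclic or unknown dependency among ${deps.keys - done}" }
        waves.add(ready)
        done.addAll(ready)
    }
    return waves
}
```

For the six tickets above, `planWaves(dependencies)` yields NOTIF-1 and NOTIF-5 first, then NOTIF-2, then NOTIF-3, and finally NOTIF-4 together with NOTIF-6.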
Step 3: Tests derived from acceptance criteria
Here’s the part where most teams cut corners. They say “yeah, we’ll write the tests later” and the tests end up being generic or insufficient. My approach: each acceptance criterion generates at least one test.
For NOTIF-2 (the decision service), the tests would be:
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
import org.junit.jupiter.api.Test
import org.slf4j.Logger
import java.time.Instant

class NotificationDecisionServiceTest {
    // Mocks live in properties so each test can stub and verify them.
    private val notificationRepository = mockk<NotificationRepository>()
    private val userPreferencesRepository = mockk<UserPreferencesRepository>()
    // Assumption: the service receives an SLF4J-style logger through its constructor.
    private val logger = mockk<Logger>(relaxed = true)
    private val eventConsumer = NotificationDecisionService(
        notificationRepository = notificationRepository,
        userPreferencesRepository = userPreferencesRepository,
        eventPublisher = mockk(relaxed = true),
        logger = logger
    )

    @Test
    fun `should create pending notification when user has email enabled`() {
        // Given
        val event = OrderStatusChangedEvent(
            orderId = 1L,
            userId = 42L,
            previousStatus = OrderStatus.CONFIRMED,
            newStatus = OrderStatus.SHIPPED,
            timestamp = Instant.now()
        )
        every { userPreferencesRepository.findByUserId(42L) } returns
            UserPreferences(userId = 42L, emailNotificationsEnabled = true)
        every { notificationRepository.save(any()) } answers { firstArg() }
        // When
        eventConsumer.handleOrderStatusChanged(event)
        // Then
        verify {
            notificationRepository.save(match { notification ->
                notification.userId == 42L &&
                    notification.orderId == 1L &&
                    notification.status == NotificationStatus.PENDING &&
                    notification.type == "ORDER_SHIPPED"
            })
        }
    }

    @Test
    fun `should skip notification when user has email disabled`() {
        // Given
        val event = OrderStatusChangedEvent(
            orderId = 1L,
            userId = 42L,
            previousStatus = OrderStatus.CONFIRMED,
            newStatus = OrderStatus.SHIPPED,
            timestamp = Instant.now()
        )
        every { userPreferencesRepository.findByUserId(42L) } returns
            UserPreferences(userId = 42L, emailNotificationsEnabled = false)
        // When
        eventConsumer.handleOrderStatusChanged(event)
        // Then
        verify(exactly = 0) { notificationRepository.save(any()) }
    }

    @Test
    fun `should log discarded notification when user preferences disabled`() {
        // Given
        val event = orderStatusChangedEvent(userId = 42L, newStatus = OrderStatus.SHIPPED)
        every { userPreferencesRepository.findByUserId(42L) } returns
            UserPreferences(userId = 42L, emailNotificationsEnabled = false)
        // When
        eventConsumer.handleOrderStatusChanged(event)
        // Then
        verify { logger.info(match<String> { it.contains("discarded") && it.contains("42") }) }
    }

    // Test data builder with defaults for the fields a test doesn't care about.
    private fun orderStatusChangedEvent(userId: Long, newStatus: OrderStatus) =
        OrderStatusChangedEvent(
            orderId = 1L,
            userId = userId,
            previousStatus = OrderStatus.CONFIRMED,
            newStatus = newStatus,
            timestamp = Instant.now()
        )
}
Notice the direct correspondence:
| Acceptance criterion | Test |
|---|---|
| “verifies that the user has notifications enabled” | should create pending notification when user has email enabled |
| “if the user has notifications disabled, it’s discarded” | should skip notification when user has email disabled |
| “the discard is logged” | should log discarded notification when user preferences disabled |
Each criterion has its test. There’s no ambiguity about what gets verified or how.
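For reference, an implementation that those three tests would drive could be as small as the sketch below. It is dependency-free on purpose, so it differs from the test setup: logging is a plain lambda instead of a logger object, the event publisher is omitted, and the repository interfaces are minimal guesses. Treating missing preferences as "enabled" is also an assumption.

```kotlin
import java.time.Instant

enum class OrderStatus { CONFIRMED, SHIPPED, DELIVERED, CANCELLED }
enum class NotificationStatus { PENDING }

data class OrderStatusChangedEvent(
    val orderId: Long,
    val userId: Long,
    val previousStatus: OrderStatus,
    val newStatus: OrderStatus,
    val timestamp: Instant
)

data class UserPreferences(val userId: Long, val emailNotificationsEnabled: Boolean)

data class Notification(
    val userId: Long,
    val orderId: Long,
    val type: String,
    val status: NotificationStatus
)

// Minimal repository contracts, assumed for this sketch.
fun interface UserPreferencesRepository {
    fun findByUserId(userId: Long): UserPreferences?
}

fun interface NotificationRepository {
    fun save(notification: Notification): Notification
}

class NotificationDecisionService(
    private val notificationRepository: NotificationRepository,
    private val userPreferencesRepository: UserPreferencesRepository,
    private val log: (String) -> Unit
) {
    fun handleOrderStatusChanged(event: OrderStatusChangedEvent) {
        val preferences = userPreferencesRepository.findByUserId(event.userId)
        // Assumption: no stored preferences means the default (enabled) applies.
        val enabled = preferences?.emailNotificationsEnabled ?: true
        if (!enabled) {
            log("Notification discarded: user ${event.userId} has email notifications disabled")
            return
        }
        notificationRepository.save(
            Notification(
                userId = event.userId,
                orderId = event.orderId,
                type = "ORDER_${event.newStatus}", // e.g. ORDER_SHIPPED
                status = NotificationStatus.PENDING
            )
        )
    }
}
```

Each branch in `handleOrderStatusChanged` exists because a criterion, and therefore a test, asks for it.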
Step 4: Prompts for AI agents
When I have tickets with acceptance criteria and defined tests, preparing a prompt for an agent is almost mechanical. The trick is giving it all the necessary information without noise.
Prompt structure I use:
## Context
Project: [name] - Spring Boot 3.x with Kotlin
Skills loaded: backend-conventions.md, testing-patterns.md
## Ticket: NOTIF-3 - Email sending service with templates
### Description
Implement the service that takes a notification in PENDING status, selects
the corresponding email template, renders the content, and sends it through
the configured email provider.
### Acceptance criteria
1. A template exists for each status: CONFIRMED, SHIPPED, DELIVERED, CANCELLED
2. Each template includes: user name, order number, status, link
3. The email is sent to the user's registered address
4. If sending succeeds: notification moves to SENT
5. If sending fails: notification moves to FAILED with error recorded
### Existing interfaces
- `NotificationRepository`: already implemented in NOTIF-2
- `EmailProvider`: interface to implement (abstraction over the provider)
- `Notification`: entity with fields orderId, userId, type, status, attempts
### Expected tests
- Unit test: correct rendering of each template
- Unit test: status update to SENT on success
- Unit test: status update to FAILED on error
- Integration test: full send flow against a mock provider (Testcontainers not applicable here)
### Constraints
- Don't use string concatenation for templates. Use Thymeleaf or similar.
- EmailProvider must be an interface (don't couple to a specific provider).
- User's personal data is not logged in plain text.
The key to the prompt isn’t being long. It’s being precise. The agent needs to know: what to build, what criteria to meet, what interfaces exist, what tests to write, and what constraints to respect. With that and the project’s skills, the result is usually quite good.
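Because the structure is fixed, the prompt can be generated from ticket data rather than written by hand. The sketch below shows one possible shape; the `Ticket` class and `renderPrompt` function are hypothetical, and the section titles simply mirror the prompt structure above.

```kotlin
data class Ticket(
    val id: String,
    val title: String,
    val description: String,
    val acceptanceCriteria: List<String>,
    val existingInterfaces: List<String> = emptyList(),
    val expectedTests: List<String> = emptyList(),
    val constraints: List<String> = emptyList()
)

// Appends a bulleted section, skipping it entirely when there is nothing to list.
fun StringBuilder.section(title: String, items: List<String>) {
    if (items.isEmpty()) return
    appendLine(title)
    items.forEach { appendLine("- $it") }
}

// Renders the prompt sections in a fixed order.
fun renderPrompt(project: String, skills: List<String>, ticket: Ticket): String {
    // A ticket without acceptance criteria is not ready to be delegated.
    require(ticket.acceptanceCriteria.isNotEmpty()) { "${ticket.id} has no acceptance criteria" }
    return buildString {
        appendLine("## Context")
        appendLine("Project: $project")
        appendLine("Skills loaded: ${skills.joinToString(", ")}")
        appendLine("## Ticket: ${ticket.id} - ${ticket.title}")
        appendLine("### Description")
        appendLine(ticket.description)
        appendLine("### Acceptance criteria")
        ticket.acceptanceCriteria.forEachIndexed { i, criterion -> appendLine("${i + 1}. $criterion") }
        section("### Existing interfaces", ticket.existingInterfaces)
        section("### Expected tests", ticket.expectedTests)
        section("### Constraints", ticket.constraints)
    }
}
```

The `require` call encodes the rule from Step 2: if a ticket has nothing verifiable, the generator refuses to produce a prompt for it.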
Common mistakes when decomposing specs
After doing this many times, these are the mistakes I’ve seen (and made) the most:
1. Tickets that are too large
“Implement the notification system” is not a ticket. It’s an epic. If a ticket can’t be completed in 1-3 days, it needs more decomposition. A large ticket generates enormous PRs, superficial code reviews, and a false sense of progress.
2. Tickets without verifiable acceptance criteria
“The system should work correctly” is not an acceptance criterion. “When an order changes to SHIPPED status, an email is sent with the order number and tracking link to the user’s registered email” is. The difference is that the second one can be turned into an automated test.
3. Ignoring dependencies between tickets
If NOTIF-3 needs NOTIF-2 to be complete to function, that needs to be explicit. Otherwise, someone starts NOTIF-3 without having the interface they need and wastes half a day figuring out what’s missing.
4. Not separating infrastructure from business logic
Mixing “configure the email service” with “implement the notification decision logic” in the same ticket is a recipe for disaster. They’re different responsibilities, requiring different expertise, and probably done by different people (or agents).
5. Forgetting the observability ticket
There should always be a ticket for logs, metrics, and alerts. Always. If it’s not explicit, it doesn’t get done. And when something fails in production, you regret not having included it.
6. Ambiguous acceptance criteria that invite interpretation
“The email is sent quickly” is ambiguous. Quickly for whom, under what conditions, measured how? “The latency between the status change and email delivery is under 5 minutes at the 95th percentile” is verifiable. If you can’t write a test for the criterion, the criterion isn’t well written.
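The latency criterion from the last point is a good example of something that translates directly into a check. This sketch uses the nearest-rank definition of a percentile, which is one choice among several; the function names are made up for the example.

```kotlin
import java.time.Duration

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
fun percentile(samples: List<Duration>, p: Int): Duration {
    require(samples.isNotEmpty()) { "No samples" }
    require(p in 1..100) { "p must be between 1 and 100" }
    val sorted = samples.sorted()
    // Integer ceiling of p * n / 100, which avoids floating point rounding.
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}

// True when the 95th percentile latency is under the limit (5 minutes by default).
fun meetsLatencyTarget(
    samples: List<Duration>,
    limit: Duration = Duration.ofMinutes(5)
): Boolean = percentile(samples, 95) < limit
```

Fed with the event-to-delivery latencies from the notification log, `meetsLatencyTarget` gives a yes or no answer, which is what makes the criterion verifiable.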
The full workflow in one picture
SPEC (approved)
│
├── Identify functional blocks
│
├── Break down into tickets
│ ├── Concrete description
│ ├── Verifiable acceptance criteria
│ └── Explicit dependencies
│
├── Derive tests from criteria
│ ├── One test per criterion (minimum)
│ ├── Unit tests for logic
│ └── Integration tests for flow
│
└── Prepare prompts for agents
├── Context + skills
├── Ticket + criteria
├── Existing interfaces
├── Expected tests
└── Constraints
Each step feeds the next. The spec defines the requirements, tickets make them executable, criteria make them verifiable, tests make them automatable, and prompts make them delegable to an agent.
It’s not a bureaucratic process. It’s a process that reduces ambiguity at each step. And in a world where part of your team consists of AI agents that interpret instructions literally, reducing ambiguity is the difference between a well-implemented feature and a technical disaster.
What changed when I started doing this right
Before applying this workflow, sprints ended with reopened tickets, PRs that needed three rounds of review, and a constant feeling that “the spec said one thing but something else got implemented.”
Now the workflow is more predictable. Not perfect, but predictable. Tickets get closed on the first try more often. Agents generate code that passes the acceptance criteria because the criteria are written in a way that a literal system can verify. And tests exist before the code exists, not after.
It’s not magic. It’s structure. And structure works regardless of whether the one executing the ticket is a person or an agent.

