Spec-Driven Development in Practice: My Experience with spec-kit

Introduction

Over the past two years, AI coding tools have become ubiquitous. Whether it’s GitHub Copilot, Claude Code, or various coding agents, “say what you want and it generates the code” is now a common workflow. While these tools are convenient, the process still leaves room for improvement. The most obvious issue is that AI often “forgets” prior context. You might agree on a design direction, and mid-conversation it veers off and produces very different code. Code quality can also be inconsistent—what looks fine at a glance may hide security issues or logic errors. Several studies have highlighted these problems as well [1], [2]。

Even as models improve, human verification remains essential. Developers often spend significant time reviewing and fixing AI output — sometimes it’s faster to just rewrite it yourself. Whether you use prompt engineering or context engineering to stabilize collaboration, the key is expressing requirements clearly and establishing a structured workflow so coding agents deliver more reliably.

Recently I started using spec-kit, a toolkit built around “spec-driven development.” With /specify, you turn fuzzy ideas into structured specifications. With /plan, you translate specs into actionable implementation plans. After using it for a while, I’ve found spec-kit does improve the quality of collaboration with AI. In this post, I’ll share my experience and how this tool helped me build a more deliberate and efficient development flow.

What Is spec-kit?

spec-kit is an open-source toolkit from GitHub built around Specification-Driven Development (SDD). Unlike traditional workflows, it treats specs as the single source of truth—code serves the spec, not the other way around.

spec-kit offers three core commands for a structured development flow:

/specify: Converts a rough feature description into a structured specification. It automatically creates a branch and documents so the requirements are fully recorded. Specs include project purpose, user experience narratives, and detailed functional descriptions.
/plan: Generates an implementation plan from the specification. It translates business needs into technical architecture, defines the tech stack and constraints, and checks alignment with project principles.
/tasks: Breaks the plan into concrete, executable tasks. Tasks can be processed in parallel and include structured task docs so teams can move forward step by step.

Traditional specs tend to be static documents that drift from reality over time. spec-kit’s specs are living documents that guide the work throughout. They’re not just records—they also serve as direct inputs to AI coding agents, ensuring generated code aligns with the original requirements. When requirements change, you update the spec, regenerate the plan and code, and avoid expensive refactors. Clear specs also reduce communication overhead and ambiguity—especially important when collaborating with AI tools.

In GitHub’s post, Spec-driven development with AI: Get started with a new open source toolkit they note spec-kit shines in:

New projects: Greenfield work benefits most from specs-first
Feature expansion: Keeps new functionality aligned with existing architecture
Modernization: Establish specs first, then rework legacy systems

The design philosophy is to let developers focus on “what to build” while AI handles much of the “how,” improving both development velocity and code quality.

Use Case: Telegram Bookmark Bot

To show how this works in practice, here’s a recent project. I wanted to consolidate bookmarks scattered across services — Twitter bookmarks, Safari favorites, links shared in chats—into a self-hosted Linkding instance with consistent tags. Instead of jumping straight into architecture decisions, spec-kit let me clarify the requirements first.

Clarify Requirements with `/specify`

I started by using /specify to capture the core needs:

/specify

Build a Telegram bot that allows me to save bookmarks into a self-hosted Linkding instance. When I send the /bookmark command in Telegram with a URL and tags, the bot should store the URL into Linkding including the specified tags. The bot must validate tags against a predefined tagging system to avoid uncontrolled tag growth. This tagging system is maintained in the codebase, and any updates require redeployment. The purpose of this tool is to unify my saved web content from sources like Twitter bookmarks and Safari into Linkding with consistent tagging.

The spec deliberately states “what” instead of “how.” I emphasize tag validation because uncontrolled tag growth makes a bookmark system hard to manage. I also clarify the end goal: unified bookmark management across sources.

From a few conversational lines, spec-kit generates a full product spec. Here are some results that show its value:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
**User Scenarios & Testing**:

"A user discovers an interesting article while browsing Twitter or Safari. Instead of saving it in multiple different bookmark systems, they open Telegram, send a `/bookmark` command with the URL and tags from their predefined tagging system, and the bot automatically saves the bookmark to their centralized Linkding instance with validated, consistent tags."

**Primary User Story**:

- **Given** user has access to the Telegram bot and valid predefined tags, **When** user sends `/bookmark https://example.com tech programming`, **Then** the system validates tags against the predefined list and saves the URL to Linkding with approved tags
- **Given** user sends a bookmark command with invalid tags, **When** the command includes tags not in the predefined system, **Then** the system rejects the bookmark and suggests valid alternative tags

**Functional Requirements**:

- **FR-002**: System MUST validate all provided tags against a predefined tagging system stored in the codebase
- **FR-008**: System MUST provide specific error messages for tag validation failures with suggested alternatives
- **FR-012**: System MUST prevent uncontrolled tag growth by only accepting predefined tags

Create a Technical Plan with `/plan`

Based on the spec, I then used /plan to generate a concrete implementation plan:

/plan

The application will be implemented as a Cloudflare Worker running a grammY Telegram bot. The bot listens for the /bookmark command, extracts the URL and tags, and validates the tags against a local JSON/YAML configuration file that defines the tagging system. If validation passes, the bot calls the Linkding REST API to create a bookmark with the given URL and tags. If validation fails, the bot replies with an error message listing the valid tags. Additionally, the bot provides a /tags command that returns the current predefined tag list directly from the configuration file, helping users know which tags are available. The project will be deployed to Cloudflare Workers using GitHub Actions for automated CI/CD. The tagging configuration file is stored in the codebase, meaning that any changes require committing and redeploying the worker.

Here are some plan highlights auto-generated from the spec:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# Implementation Plan: Telegram Bookmark Bot with Predefined Tagging

## Summary

Telegram bot for saving bookmarks to self-hosted Linkding instance with controlled tagging system.
Bot validates tags against predefined list, prevents tag sprawl, and integrates with Linkding API.
Implemented as Cloudflare Worker using grammY framework.

## Technical Context

Language/Version: TypeScript/Node.js 18+ (Cloudflare Worker runtime)
Primary Dependencies: grammY Telegram bot framework, Cloudflare Workers API, Linkding REST API client
Performance Goals: <200ms message processing, <500ms API responses
Scale/Scope: Personal use (single user), ~1000 bookmarks/month, 2 commands

## Constitution Check

✅ Code Quality & Standards: TypeScript, ESLint/Prettier, modular design
✅ Testing-First Development: Contract tests, integration tests, 80% coverage target
✅ User Experience Consistency: Standardized commands, consistent error messaging
✅ Performance Requirements: <200ms processing, <100MB memory usage

## Phase 1: Design & Contracts

- Extract entities from feature spec → data-model.md
- Generate API contracts from functional requirements
- Generate contract tests, Extract test scenarios from user stories

## Phase 2: Task Planning Approach

Task Categories: Setup, Configuration, Models, API Integration, Bot Commands, Testing, Deployment
Estimated Output: 20-25 numbered, ordered tasks in tasks.md

This example shows how spec-kit helps you move from a fuzzy idea to a clear implementation plan. In /specify, I focus on user scenarios and functional needs, without getting into tech details. In /plan, I start mapping those needs to a concrete tech stack. That separation keeps the process organized and avoids prematurely locking into tools when the requirements are still fluid.

Next come /tasks and /implement, which make the rest highly automatable. /tasks reads the plan and related docs, then breaks the design into an executable task list. Tasks that can run in parallel are flagged with [P], so teams can move multiple modules forward at once.

For this Telegram Bookmark Bot, /tasks generated around 32 structured tasks—covering environment setup, models, API integration, command implementations, tests, and deployment. Each task includes clear inputs, outputs, and acceptance criteria, keeping the work on-spec. spec-kit promotes a “testing-first” approach: even at the /tasks stage, it outlines contract and integration tests so every feature has the right validation in place.

With the tasks in hand, you can use /implement (or your agent of choice) to build things step by step. Because each task is backed by explicit specs and test expectations, AI-generated code quality goes up, and the amount of manual fixing drops.

Benefits of spec-kit

The biggest value I’ve found is enforced separation of concerns. In traditional workflows, we often mix “what problem are we solving?” with “how do we implement it?” That blend leads to over-focusing on technical details too early—or choosing only the tools we already know and missing better options. With spec-kit, /specify keeps me focused on the problem and the expected outcome. That constraint actually frees up creativity: I think more clearly about user needs and what the system is supposed to accomplish.

What impressed me most is how a few short lines can expand into a complete spec with user stories, acceptance criteria, and edge cases. It’s not just formatting — it’s a thinking aid. For example, I only mentioned “tag validation,” but spec-kit produced cases like “what if a tag has a typo?” or “what if the API fails?” These are details I often miss when writing specs by hand, yet they matter a lot for system robustness.

spec-kit also proactively pauses for clarification using [NEEDS CLARIFICATION] markers, rather than pushing ahead on assumptions. That’s particularly important because many AI coding tools will make up their own assumptions when things are ambiguous, leading to outcomes that diverge from what you want.

For instance, when I wrote “tag validation,” spec-kit flagged: [NEEDS CLARIFICATION: How should the system handle partial tag matches or typos?], forcing me to consider edge behavior. This interactive clarification ensures completeness and reduces the risk of discovering vague requirements late in implementation—saving both time and cost.

For AI collaboration specifically, spec-kit addresses the biggest pain: contextual consistency. As conversations grow, AI “forgets” design decisions. Specs act like a contract with the AI, keeping generated code aligned with the original intent. Even better, concrete functional requirements (e.g., “FR-002: System MUST validate all provided tags”) produce more precise code than vague natural language. When requirements change, I update the spec and rerun /plan to get an updated technical plan that stays consistent with the existing architecture.

After using spec-kit for a while, I’ve noticed my own analysis gets more systematic. Even without the tool, I now naturally think in terms of user stories, edge cases, and acceptance criteria. That habit applies beyond software—it helps with everyday problem analysis and solution design. In that sense, spec-kit isn’t just a tool; it’s a mental workout.

Challenges and Limitations

Despite the benefits, there are trade-offs. The most obvious friction is handling changes. By design, requirement changes go back through the full /specify → /plan → /tasks → /implement loop. For small adjustments (e.g., adding a single API endpoint or tweaking behavior), that can feel heavy. My current approach is to rerun the full spec-kit flow to add features. While iterating within the existing architecture usually needs fewer manual checks, “starting over” isn’t always the most agile way to handle small changes.

And although code quality improves overall, there are still parts that need manual verification. For example, in my first pass of the Telegram Bookmark Bot, I explicitly selected grammY in the spec, but the initial implementation didn’t use it consistently, and some functionality didn’t work as expected or couldn’t be auto-corrected. That said, compared to my earlier agent-only attempts, the first round after spec-kit usually needs just a couple of fixes to run correctly—overall efficiency is still much better.

This article, GitHub Spec Kit Experiment: ‘A Lot of Questions’, also points out spec-kit is still early. The active issues and discussions in the GitHub community show lots of feature requests and room to grow. That explains occasional surprises in practice, but also makes me optimistic about where it’s headed.

Conclusion

From hands-on use, I think spec-kit represents an important direction for AI-assisted development. It’s still early, but the spec-driven development approach clearly improves collaboration with AI tools. I’m bullish on this category and expect more tools like it to help developers work more effectively with AI.

In practice, spec-kit helps most with new projects and feature expansion—especially when you’re establishing a complete spec from scratch. For tiny tweaks or bug fixes, traditional approaches may be more direct. Overall, the biggest value is the enforced structure: even without the tool, the analysis framework is worth adopting.

I’ve primarily used spec-kit with Claude Code so far; other combinations remain to be tried. According to the docs, spec-kit integrates with many mainstream development tools, so applicability is quite high. One community workflow in GitHub discussions is particularly helpful for understanding how spec-kit fits together. If you’re curious, I recommend reading the spec-kit overview, but more importantly, build a small project end-to-end — you’ll grasp the tool and its value much faster.

GitHub SpecKit Workflow
Source: GitHub Discussion

Looking ahead, I’d love to see lighter-weight, incremental updates without running the full flow each time. Most importantly, spec-kit has already helped me develop a better analysis framework—thinking through edge cases and structuring real-world requirements. That shift in thinking may be an even more valuable outcome than the tool itself.

Photo by Brett Jordan on Unsplash