The bash test had ZERO output assertions: it just ran claude -p
and printed token usage. Drill's scenarios are strictly more
rigorous:
- go-fractals: skill-called SDD + tool-called Agent + go test ./...
  passes + cmd/fractals/main.go exists + >=4 commits + LLM criteria
  verifying a real SDD workflow.
- svelte-todo: skill-called SDD + tool-called Agent + npm test passes
  + playwright e2e passes + package.json + svelte.config.js or
  vite.config.ts + >=4 commits + LLM criteria.
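A minimal sketch of how the file-existence criteria above could be expressed as shell assertions. check_scenario_files is a hypothetical helper, not Drill's actual implementation; it covers only the file checks, not the test runs, commit counts, or LLM criteria.

```shell
# Hypothetical helper (not Drill's code): check the file-existence
# criteria for one scenario's output directory. Returns 0 on success,
# 1 if a required file is missing, 2 for an unknown scenario.
check_scenario_files() {
  local scenario="$1" dir="$2"
  case "$scenario" in
    go-fractals)
      # The CLI entry point must exist.
      [ -f "$dir/cmd/fractals/main.go" ]
      ;;
    svelte-todo)
      # package.json is required, plus at least one of the two configs.
      [ -f "$dir/package.json" ] &&
        { [ -f "$dir/svelte.config.js" ] || [ -f "$dir/vite.config.ts" ]; }
      ;;
    *)
      echo "unknown scenario: $scenario" >&2
      return 2
      ;;
  esac
}
```

The "or" in the svelte-todo criteria becomes a grouped || so that either config file satisfies the check.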
design.md and plan.md are byte-identical between bash fixtures and
drill fixtures (evals/fixtures/sdd-{go-fractals,svelte-todo}/).
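The byte-identity claim can be re-checked with cmp. This is a sketch: fixtures_identical is a hypothetical name, and the bash-side directory (assumed here to be tests/subagent-driven-dev/<name>/) is not spelled out above.

```shell
# Hypothetical check: succeed only if design.md and plan.md are
# byte-identical between two fixture directories.
fixtures_identical() {
  local old_dir="$1" new_dir="$2" f
  for f in design.md plan.md; do
    # cmp -s prints nothing and exits non-zero on any byte difference
    # (or when either file is missing).
    if ! cmp -s "$old_dir/$f" "$new_dir/$f"; then
      echo "differs: $f" >&2
      return 1
    fi
  done
  return 0
}
```

Possible usage, with the assumed bash path: fixtures_identical tests/subagent-driven-dev/go-fractals evals/fixtures/sdd-go-fractals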
Drill's setup helper (scaffold_sdd_*) forces git init -b main, which is
stricter than the bash scaffold's reliance on the user's init.defaultBranch
setting. The .claude/settings.local.json from the bash scaffold.sh is
unnecessary for drill because permissions are managed in the backend YAML.
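Why -b main is stricter can be shown directly: an explicit -b overrides whatever init.defaultBranch says. In this sketch, initial_branch is a hypothetical helper, and "trunk" stands in for an arbitrary user preference; -b (--initial-branch) needs git 2.28 or later.

```shell
# Hypothetical demo: report the initial branch git creates when the
# user's config sets init.defaultBranch=trunk.
# $1: repo dir to create; remaining args are passed to `git init`.
initial_branch() {
  local dir="$1"; shift
  git -c init.defaultBranch=trunk init -q "$@" "$dir"
  # HEAD is a symbolic ref even before the first commit.
  git -C "$dir" symbolic-ref --short HEAD
}
```

Without -b the repo starts on trunk; with -b main it starts on main regardless of config.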
Subagent verification: SAFE TO DELETE for both.
Replace #!/bin/bash with #!/usr/bin/env bash in 13 scripts. The
hardcoded path fails on NixOS and FreeBSD, where bash is not installed
at /bin/bash, and on macOS it runs the old system bash 3.2 instead of
Homebrew's bash. #!/usr/bin/env bash is the more portable alternative:
env runs the first bash found on PATH.
Tested on Linux and Windows (Git Bash + CMD). macOS is the primary
beneficiary since Homebrew installs bash to /opt/homebrew/bin/bash
(on Apple Silicon).
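A sketch of the mechanical rewrite for a single file. fix_shebang is a hypothetical helper; it avoids sed -i because GNU and BSD/macOS sed disagree on its syntax, and it only touches files whose first line is exactly #!/bin/bash (variants such as "#!/bin/bash -e" are left alone).

```shell
# Hypothetical helper: replace a hardcoded `#!/bin/bash` first line
# with `#!/usr/bin/env bash`, leaving every other line unchanged.
fix_shebang() {
  local file="$1" tmp rc
  # Skip files unless line 1 is exactly the hardcoded shebang.
  [ "$(head -n 1 "$file")" = '#!/bin/bash' ] || return 0
  tmp="$(mktemp)" || return 1
  # Rewrite only line 1, then copy back with `cat >` so the original
  # file (and its executable bit) is kept rather than replaced.
  sed '1s|^#!/bin/bash$|#!/usr/bin/env bash|' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

It could be applied with a loop such as for f in path/to/*.sh; do fix_shebang "$f"; done, where the glob is a placeholder since the 13 scripts are not listed here.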
Based on #700, closes #700.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude CLI now requires --verbose when using --output-format stream-json
with -p (print mode). Without it, the test fails with:
"Error: When using --print, --output-format=stream-json requires --verbose"
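A minimal sketch of a print-mode call that satisfies the new requirement. The three flags come from the error message above; run_claude_stream and the CLAUDE_BIN override are hypothetical names, there only so the call can be exercised with a stub instead of the real CLI.

```shell
# Hypothetical wrapper: run a prompt in print mode with streamed JSON
# output. CLAUDE_BIN defaults to the real `claude` binary.
run_claude_stream() {
  local prompt="$1"
  # --verbose is required whenever -p is combined with stream-json.
  "${CLAUDE_BIN:-claude}" -p "$prompt" --output-format stream-json --verbose
}
```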
Testing revealed that skill descriptions summarizing the workflow cause
Claude to follow the description instead of reading the skill body.
- A description saying "code review between tasks" caused ONE review
- The flowchart clearly showed TWO reviews (spec compliance + quality)
- Minimal description ("Use when...") correctly deferred to flowchart
Updated writing-skills with:
- "Description = When to Use, NOT What the Skill Does" section
- Cautionary tale about this actual failure
- Examples of good (triggers only) vs bad (workflow summary) descriptions
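The "When to Use, NOT What the Skill Does" rule can be approximated by a crude lint. This is a hypothetical sketch, not part of writing-skills: it assumes each SKILL.md carries a single-line description: key in its frontmatter and treats "starts with Use when" as a stand-in for "triggers only".

```shell
# Hypothetical lint: flag a SKILL.md whose description does not start
# with "Use when" (i.e. likely summarizes the workflow instead).
check_description() {
  local file="$1" desc
  # First `description:` line, with the key and one leading quote removed.
  desc=$(sed -n 's/^description:[[:space:]]*//p' "$file" | head -n 1)
  desc=${desc#\"}
  desc=${desc#\'}
  case "$desc" in
    "Use when"*) return 0 ;;
    *)
      echo "$file: description should state triggers (\"Use when...\"), not the workflow" >&2
      return 1
      ;;
  esac
}
```

Multi-line YAML descriptions would need a real YAML parser; this heuristic only catches the simple case.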
Updated subagent-driven-development:
- Removed workflow summary from description
- Now just: "Use when executing implementation plans..."
Updated test runner:
- Added --dangerously-skip-permissions for automated testing
Two test projects for validating the skill with full end-to-end runs:
- go-fractals: 10 tasks, CLI tool with Sierpinski and Mandelbrot
- svelte-todo: 12 tasks, CRUD app with localStorage and Playwright
Each test has design.md, plan.md, and scaffold.sh. Run with:
./tests/subagent-driven-dev/run-test.sh go-fractals