Workflow Canary
End-to-end canary lane for the multi-tool / multi-channel user
workflows defined in issue #1044. Where auth-live-canary
covers credential / OAuth flows, this lane covers what happens
after the user is authenticated: cron-driven routines, chat-driven
tool dispatch, Telegram round-trips, Sheets writes, etc.
Lane structure
scripts/workflow_canary/
├── run_workflow_canary.py # entrypoint
├── telegram_mock.py # fake Telegram Bot API
├── sheets_mock.py # mock Google Sheets v4
├── calendar_mock.py # mock Google Calendar v3
├── hn_mock.py # mock Hacker News /newest
├── gmail_mock.py # mock Gmail v1
├── web_search_mock.py # mock Brave Search v3
├── telegram_setup.py # install + capability patch + pairing helpers
├── routines.py # libSQL helpers (insert + backdate + poll)
└── scenarios/
├── _common.py # shared run_routine_probe()
├── bug_logger.py # Script 1 — Sheet write
├── calendar_prep.py # Script 2 — Calendar → Telegram
├── hn_monitor.py # Script 3 — HN → Telegram
├── periodic_reminder.py # Script 4 — periodic reminder
├── crm_tracker.py # Script 5 — Gmail → Sheets CRM
├── manual_trigger.py # POST /api/routines/<id>/trigger
├── lifecycle.py # disable/enable/delete via API
├── dedup_cooldown.py # cooldown_secs back-to-back
├── nl_routine_create.py # NL → routine_create tool
├── nl_schedule_update.py # NL → routine_update tool
├── telegram_channel_install.py # install + setup → Active
├── telegram_round_trip.py # webhook → agent → reply
├── routine_visibility_from_telegram.py # paired user → list routines
├── manual_trigger_from_telegram.py # paired user → trigger now
├── first_immediate_run.py # fire_immediately ≤ 10s
├── idempotent_disable_enable.py # double-toggle is no-op
├── cron_timing_accuracy.py # next_fire_at ±10s
└── log_assertions.py # gateway log scan (runs LAST)
scripts/live-canary/run.sh dispatches LANE=workflow-canary here, and
.github/workflows/live-canary.yml has a matching workflow-canary
job in the live-canary matrix.
Mock surfaces
Every external Google / external service is replaced by a single-port
aiohttp mock in this directory. Mocks announce their port via
MOCK_<NAME>_PORT=<n> on stdout (caught by wait_for_port_line),
expose Bot-API-equivalent handlers under their canonical paths, and
provide /__mock/... test hooks for seeding / draining / resetting.
run_workflow_canary.py builds a comma-joined
IRONCLAW_TEST_HTTP_REMAP for the gateway env so outbound HTTP for
api.telegram.org, sheets.googleapis.com, www.googleapis.com,
news.ycombinator.com, gmail.googleapis.com, and
api.search.brave.com lands at the corresponding mock loopback
address.
What's covered today
20 probes across 7 phases:
Phase 1 (Sheets): bug_logger asserts the routine fires AND a row gets appended to mock_sheets with the correct timestamp / message / source shape. Catches the canonical "expected a sequence" regression.
Phase 2 (Calendar): calendar_prep seeds a deterministic event, asserts mock_calendar saw events.list AND mock_telegram received a prep briefing referencing the seeded event title.
Phase 3 (HN): hn_monitor seeds two posts, asserts mock_hn received the /newest fetch AND mock_telegram captured a summary mentioning both posts.
Phase 4 (CRM): crm_tracker seeds 1 lead + 1 newsletter + 1 receipt, asserts exactly ONE row gets appended to mock_sheets (the lead only) with all 6 expected columns.
Phase 5 (Telegram): telegram_channel_install runs the full install + capability patch + setup flow and asserts the channel reaches the installed/active state. telegram_round_trip posts an inbound webhook and asserts mock_telegram receives an outbound sendMessage with the actual chat_id (not 'default'). routine_visibility_from_telegram + manual_trigger_from_telegram exercise the post-pairing chat path.
Phase 6 (Stability): first_immediate_run asserts a backdated
routine fires within 10s. log_assertions runs LAST and scans
gateway.log for known fail-criterion regex patterns
(chat_id 'default', parsed naive timestamp without timezone,
retry after None, expected a sequence).
Phase 7 (Timing): cron_timing_accuracy sets next_fire_at to
"now + 5s" and asserts the actual fire happens within ±10s.
idempotent_disable_enable double-toggles enable/disable and asserts
both halves are no-ops.
How to run locally
tests/e2e/.venv/bin/python scripts/workflow_canary/run_workflow_canary.py \
--skip-build --skip-python-bootstrap
Run a single scenario:
tests/e2e/.venv/bin/python scripts/workflow_canary/run_workflow_canary.py \
--skip-build --skip-python-bootstrap \
--scenario telegram_round_trip
CLI matches run_live_canary.py so the same scripts/live-canary/run.sh
dispatcher drives both.
Adding a new scenario
- Drop a
scenarios/<name>.pyexporting anasync def run(*, stack, mock_telegram_url, mock_sheets_url=None, mock_calendar_url=None, mock_hn_url=None, mock_gmail_url=None, mock_web_search_url=None, output_dir, log_dir) -> list[ProbeResult]. - For routine-only coverage, delegate to
scenarios._common.run_routine_probe()— pass the script-specificprovider/mode/routine_name/promptand you're done. - For side-effect verification, drive the relevant API directly and
read back from the appropriate mock's
/__mock/endpoint. TheProbeResult.detailsdict is the right place to capture observed side effects for the artifact. - Register in
SCENARIOSinrun_workflow_canary.py. Order matters —log_assertionsshould always run last so it sees the full log surface. - For new mocks, add a
<name>_mock.pyin this directory, a_spawn_mock_<name>helper in the runner, and an entry in the comma-joinedIRONCLAW_TEST_HTTP_REMAP.
Deferred coverage
- Real-provider variant (
workflow-canary-live) — same probes with real Gmail / Calendar / Sheets credentials. Separate lane. - Auth recovery (token revocation → auth_required SSE event with
valid auth_url) — requires real OAuth setup; covered by
auth-live-canary. - UI flows (Approval modal, Reconfigure flow, "Run now" button,
Routines tab interactions) — Playwright concerns; belong in
auth-browser-consentor a newroutine-ui-canarylane. - App-level dedup across cron fires — would need a feature on the agent side to track already-reported items; not implemented.