fix(canary,mcp,docs): address review findings + harden MCP registry isolation

Five reviewer-flagged issues, one review-discipline follow-up, plus three smaller doc/fixture hygiene fixes:

Scrubber (Critical): scripts/live-canary/scrub-artifacts.sh only matched `access_token:` / `refresh_token=` text, not the JSON shapes the seeded + browser lanes actually emit. Added patterns + sed redactions for `"access_token": "…"`, `"refresh_token": "…"`, `"client_secret": "…"`, etc., so STRICT_ARTIFACT_SCRUB is a real last line of defense.

Artifacts (Critical): removed artifacts/ from tracking (was carrying real live-provider output including a real user email + calendar data). Added artifacts/ to .gitignore so future local runs cannot re-introduce them. Gitignored tests/fixtures/llm_traces/live/*.log since those are local debug artifacts, not committed fixtures.

MCP isolation (Concerning): the (user_id, server_name)-keyed client store fixed runtime dispatch but the ToolRegistry is still keyed by tool name only, so a second user activating the same server_name with a different tool surface would silently shadow the first user's wrappers. Added `surface_signature()` in client_store + a `check_surface_conflict()` method that ExtensionManager calls before registering; divergent surfaces now return ActivationFailed with a clear message. Caller-level integration test `activate_rejects_divergent_tool_surface_on_shared_server_name` drives two mock MCP servers through the full ExtensionManager path.

Scheduled seeded lane (Concerning): `configured_seeded_cases(None)` returned every seeded case, including the mutating lifecycle probes (gmail_roundtrip, google_calendar_lifecycle, notion_search_lifecycle) that write+delete real provider data. Split into read-only default (gmail, google_calendar, github, notion) vs opt-in lifecycle set; operators must now name lifecycle cases explicitly via --case / CASES= before mutation runs.
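
The seeded-lane split described above could look roughly like the following. This is a minimal Python sketch, not the repository's code: the case names and the `configured_seeded_cases(None)` call come from this message, while the two constant names, the list return type, and the error raised on unknown names are assumptions made for illustration.

```python
from typing import Iterable, List, Optional

# Case names are taken from the commit message; the constant names are hypothetical.
READ_ONLY_CASES = ("gmail", "google_calendar", "github", "notion")
LIFECYCLE_CASES = (
    "gmail_roundtrip",
    "google_calendar_lifecycle",
    "notion_search_lifecycle",
)


def configured_seeded_cases(requested: Optional[Iterable[str]]) -> List[str]:
    """Return the seeded cases to run.

    With no explicit selection, only the read-only cases run. Lifecycle cases
    (which write and delete real provider data) run only when named explicitly.
    """
    if requested is None:
        return list(READ_ONLY_CASES)
    known = set(READ_ONLY_CASES) | set(LIFECYCLE_CASES)
    selected = [case.strip() for case in requested if case.strip()]
    unknown = [case for case in selected if case not in known]
    if unknown:
        # Assumption: fail loudly rather than silently skip a mistyped case name.
        raise ValueError(f"unknown seeded case(s): {', '.join(unknown)}")
    return selected
```

Under these assumptions, a scheduled run that passes `None` can never reach a mutating probe, and `configured_seeded_cases(["gmail_roundtrip"])` is the only way to opt one in.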
Workflow environments (Concerning): ACCOUNTS.md documented that auth-live-seeded uses the `auth-live-canary` GitHub Environment and auth-browser-consent uses `auth-browser-canary`, but neither job declared `environment:`. Added the declarations so operators putting secrets at environment scope get them at runtime and inherit environment protection rules.

Workflow schedule: moved the four formerly-PR-gating lanes (auth-smoke, auth-full, auth-channels, deterministic-replay) off `pull_request` triggers and onto hourly schedules staggered by minute offset, alongside the already-hourly auth-live-seeded plus the real-provider lanes.

Docs fixes:
- docs/extensions/github.md: step title was "Install the Web Search Extension" under the GitHub page; corrected, plus brand spelling `Github` → `GitHub` throughout this file and the zh translation.
- tools-src/github/github-tool.capabilities.json: PAT instructions mentioned only `repo` scope; updated to match the OAuth scopes array (`repo, workflow, read:org`) + the README.

Fixture hint relaxation:
- tests/fixtures/llm_traces/live/zizmor_scan*.json: old recorded `last_user_message_contains` hint was the old URL-form prompt and did not match the new verb-form ZIZMOR_SCAN_PROMPT, producing noisy `[TraceLlm WARN] Request hint mismatch` lines on replay. Relaxed the hint substring to "zizmor" so both prompt phrasings (and any future rewording that keeps the tool name) match without re-recording the full live traces.
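
To illustrate why relaxing the hint silences the warning, the sketch below assumes `last_user_message_contains` is evaluated as a plain, case-sensitive substring check against the last user message. The wording of the warning suggests this, but it is an assumption. `hint_matches` is a hypothetical helper, the old prompt is represented only by its recorded hint text, and the new prompt is the first sentence of the verb-form prompt seen in the replay log.

```python
def hint_matches(hint: str, last_user_message: str) -> bool:
    """Hypothetical stand-in for the replay hint check: plain substring match."""
    return hint in last_user_message


OLD_HINT = "can we run https://github.com/zizmorcore/zizmor"
NEW_HINT = "zizmor"

# Stand-ins for the two prompt phrasings (abbreviated, see the note above).
old_prompt = "can we run https://github.com/zizmorcore/zizmor"
new_prompt = "Run zizmor against this checkout's GitHub Actions workflows now."

for hint in (OLD_HINT, NEW_HINT):
    # Print whether each hint matches the old and the new prompt phrasing.
    print(repr(hint), hint_matches(hint, old_prompt), hint_matches(hint, new_prompt))
```

The old hint matches only the URL-form prompt, while the relaxed hint matches both, so existing traces can be replayed without re-recording.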
2026-06-18 00:14:30 +08:00 · 2026-04-21 18:39:35 -07:00
parent fb2f4fa9d6
commit 0df70e4002
38 changed files with 429 additions and 727 deletions
--- a/.github/workflows/live-canary.yml
+++ b/.github/workflows/live-canary.yml
@@ -4,18 +4,16 @@ on:
  # Each cron below is matched by `if: github.event.schedule == '<cron>'` on a
  # specific job. Keep this list in sync with the `if:` guards — an orphan cron
  # will fire with no work, and a new job needs its cron added here.
-  pull_request:
-    branches: [main, staging]
-    types: [opened, synchronize, reopened]
  schedule:
-    # Hourly smoke — must-green signal for auth + live-provider token paths.
-    - cron: "0 * * * *"    # → auth-smoke (mock-backed pytest matrix)
+    # Temporary: every lane runs hourly while we dial in coverage. Staggered
+    # across minute offsets so they don't all spike at :00. Revisit once
+    # signal is stable — provider-matrix + browser-consent lanes are
+    # expensive and were previously daily/weekly.
+    - cron: "0 * * * *"    # → auth-smoke + auth-full + auth-channels + deterministic-replay
    - cron: "15 * * * *"   # → auth-live-seeded (real Google/GitHub/Notion tokens)
-    # Nightly — broader coverage on real providers and public flows.
-    - cron: "0 3 * * *"    # → public-smoke + persona-rotating + private-oauth
-    - cron: "30 3 * * *"   # → auth-browser-consent (Playwright OAuth consent)
-    # Weekly — slowest, most expensive lane (full provider matrix).
-    - cron: "0 5 * * 0"    # → provider-matrix (Sundays, 05:00 UTC)
+    - cron: "30 * * * *"   # → public-smoke + persona-rotating + private-oauth
+    - cron: "45 * * * *"   # → auth-browser-consent (Playwright OAuth consent)
+    - cron: "50 * * * *"   # → provider-matrix (full provider lane)
  workflow_dispatch:
    inputs:
      lane:
@@ -73,8 +71,7 @@ jobs:
    name: Auth Smoke
    if: >
      (github.event_name == 'schedule' && github.event.schedule == '0 * * * *') ||
-      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-smoke')) ||
-      (github.event_name == 'pull_request')
+      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-smoke'))
    runs-on: ubuntu-latest
    timeout-minutes: 60
    env:
@@ -105,8 +102,8 @@ jobs:
  auth-full:
    name: Auth Full
    if: >
-      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-full')) ||
-      (github.event_name == 'pull_request')
+      (github.event_name == 'schedule' && github.event.schedule == '0 * * * *') ||
+      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-full'))
    runs-on: ubuntu-latest
    timeout-minutes: 75
    env:
@@ -137,8 +134,8 @@ jobs:
  auth-channels:
    name: Auth Channels
    if: >
-      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-channels')) ||
-      (github.event_name == 'pull_request')
+      (github.event_name == 'schedule' && github.event.schedule == '0 * * * *') ||
+      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-channels'))
    runs-on: ubuntu-latest
    timeout-minutes: 60
    env:
@@ -172,6 +169,14 @@ jobs:
      (github.event_name == 'schedule' && github.event.schedule == '15 * * * *') ||
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-live-seeded'))
    runs-on: ubuntu-latest
+    # Scoped to the `auth-live-canary` GitHub Environment — matches the
+    # layout documented in `scripts/live-canary/ACCOUNTS.md`. Secrets
+    # for seeded live-provider tokens (AUTH_LIVE_* access / refresh
+    # tokens, GOOGLE_OAUTH_CLIENT_SECRET, GITHUB_OAUTH_CLIENT_SECRET,
+    # notion client secret) are stored at the environment scope so they
+    # carry environment protection rules (required reviewers, branch
+    # filters) and don't commingle with repo-wide secrets.
+    environment: auth-live-canary
    timeout-minutes: 75
    env:
      LANE: auth-live-seeded
@@ -259,9 +264,14 @@ jobs:
  auth-browser-consent:
    name: Auth Browser Consent
    if: >
-      (github.event_name == 'schedule' && github.event.schedule == '30 3 * * *') ||
+      (github.event_name == 'schedule' && github.event.schedule == '45 * * * *') ||
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'auth-browser-consent'))
    runs-on: ubuntu-latest
+    # Scoped to the `auth-browser-canary` GitHub Environment — matches
+    # the layout documented in `scripts/live-canary/ACCOUNTS.md`.
+    # Playwright storage state secrets and provider browser credentials
+    # live at environment scope.
+    environment: auth-browser-canary
    timeout-minutes: 90
    env:
      LANE: auth-browser-consent
@@ -362,9 +372,9 @@ jobs:
  deterministic-replay:
    name: Deterministic Replay
    if: >
+      (github.event_name == 'schedule' && github.event.schedule == '0 * * * *') ||
      (github.event_name == 'workflow_dispatch' &&
-      (inputs.lane == 'all' || inputs.lane == 'deterministic-replay')) ||
-      (github.event_name == 'pull_request')
+      (inputs.lane == 'all' || inputs.lane == 'deterministic-replay'))
    runs-on: ubuntu-latest
    timeout-minutes: 90
    steps:
@@ -403,7 +413,7 @@ jobs:
  public-smoke:
    name: Public Live Smoke
    if: >
-      (github.event_name == 'schedule' && github.event.schedule == '0 3 * * *') ||
+      (github.event_name == 'schedule' && github.event.schedule == '30 * * * *') ||
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'public-smoke'))
    runs-on: ubuntu-latest
    timeout-minutes: 120
@@ -464,7 +474,7 @@ jobs:
  persona-rotating:
    name: Rotating Persona Live
    if: >
-      (github.event_name == 'schedule' && github.event.schedule == '0 3 * * *') ||
+      (github.event_name == 'schedule' && github.event.schedule == '30 * * * *') ||
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'persona-rotating'))
    runs-on: ubuntu-latest
    timeout-minutes: 180
@@ -524,7 +534,7 @@ jobs:
    name: Private OAuth Live
    if: >
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'private-oauth')) ||
-      (github.event_name == 'schedule' && github.event.schedule == '0 3 * * *' && vars.LIVE_CANARY_PRIVATE_OAUTH_ENABLED == 'true')
+      (github.event_name == 'schedule' && github.event.schedule == '30 * * * *' && vars.LIVE_CANARY_PRIVATE_OAUTH_ENABLED == 'true')
    runs-on: [self-hosted, ironclaw-live]
    timeout-minutes: 120
    env:
@@ -577,7 +587,7 @@ jobs:
  provider-matrix:
    name: Provider Matrix (${{ matrix.provider }})
    if: >
-      (github.event_name == 'schedule' && github.event.schedule == '0 5 * * 0') ||
+      (github.event_name == 'schedule' && github.event.schedule == '50 * * * *') ||
      (github.event_name == 'workflow_dispatch' && (inputs.lane == 'all' || inputs.lane == 'provider-matrix'))
    runs-on: ubuntu-latest
    timeout-minutes: 120
--- a/.gitignore
+++ b/.gitignore
@@ -25,6 +25,10 @@ bench-results/
 # Coverage reports (local runs, not committed)
 /coverage/

+# Canary / E2E run outputs (per-run logs, screenshots, trace artifacts —
+# CI uploads these via actions/upload-artifact; never commit local copies)
+artifacts/
+
 # WASM build artifacts (loaded from disk, not bundled)
 *.wasm

@@ -44,3 +48,7 @@ __pycache__/
 engine_trace_*.json
 tests/fixtures/llm_traces/live/github_dev_workflow_full_loop.json
 tests/fixtures/llm_traces/live/github_dev_workflow_full_loop.log
+# Per-test live-replay logs — generated when running `--ignored` live
+# tests locally. Only the .json fixtures for each scenario are checked
+# in; the .log files are local debugging artifacts.
+tests/fixtures/llm_traces/live/*.log
--- a/artifacts/auth-live-canary/browser/google-browser-failure.png
+++ b/artifacts/auth-live-canary/browser/google-browser-failure.png
--- a/artifacts/auth-live-canary/browser/google-oauth-timeout.png
+++ b/artifacts/auth-live-canary/browser/google-oauth-timeout.png
--- a/artifacts/auth-live-canary/browser/results.json
+++ b/artifacts/auth-live-canary/browser/results.json
@@ -1,16 +0,0 @@
-{
-  "base_url": "http://127.0.0.1:55821",
-  "generated_at": "2026-04-20T12:26:55Z",
-  "results": [
-    {
-      "details": {
-        "error": "Timed out waiting for google OAuth callback page",
-        "screenshot": "/Users/nikolajpismenkov/Documents/near/ironclaw/artifacts/auth-live-canary/browser/google-browser-failure.png"
-      },
-      "latency_ms": 185343,
-      "mode": "browser_oauth",
-      "provider": "google",
-      "success": false
-    }
-  ]
-}
--- a/artifacts/auth-live-canary/seeded/results.json
+++ b/artifacts/auth-live-canary/seeded/results.json
--- a/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/env-summary.txt
+++ b/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/env-summary.txt
@@ -1,20 +0,0 @@
-lane=auth-live-seeded
-scenario=<default>
-provider=default
-started_at=2026-04-20T08:20:47Z
-sha=44301815860f5d1e30e732fc69c6e446cb4b9c3e
-branch=codex/auth-oauth-canary-unification
-rustc=rustc 1.92.0 (ded5c06cf 2025-12-08)
-cargo=cargo 1.92.0 (344c4567c 2025-10-21)
-IRONCLAW_LIVE_TEST=<unset>
-LLM_BACKEND=<unset>
-LLM_MODEL=<unset>
-ANTHROPIC_MODEL=<unset>
-OPENAI_MODEL=<unset>
-GEMINI_MODEL=<unset>
-DATABASE_BACKEND=<unset>
-LIBSQL_PATH=<unset>
-playwright_install=auto
-cases=gmail
-skip_build=0
-skip_python_bootstrap=0
--- a/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/summary.md
+++ b/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/summary.md
@@ -1,16 +0,0 @@
-## Live Canary Summary
-
-| Field | Value |
-| --- | --- |
-| Lane | `auth-live-seeded` |
-| Scenario | `<default>` |
-| Provider | `default` |
-| Status | `0` |
-| Started | `2026-04-20T08:20:47Z` |
-| Finished | `2026-04-20T08:20:47Z` |
-| Commit | `44301815860f5d1e30e732fc69c6e446cb4b9c3e` |
-
-Artifacts:
- `artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/test-output.log`
- `artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/env-summary.txt`
- `artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/trace-fixture-status.txt`
--- a/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/test-output.log
+++ b/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/test-output.log
@@ -1,4 +0,0 @@
-[live-canary] lane=auth-live-seeded scenario=<default> provider=default
-[live-canary] artifacts=artifacts/live-canary/auth-live-seeded/default/20260420T082047Z
-[live-canary] summary=artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/summary.md
-[live-canary] log=artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/test-output.log
--- a/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/trace-fixture-status.txt
+++ b/artifacts/live-canary/auth-live-seeded/default/20260420T082047Z/trace-fixture-status.txt
@@ -1 +0,0 @@
-No live trace fixture changes detected.
--- a/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/env-summary.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/env-summary.txt
@@ -1,20 +0,0 @@
-lane=deterministic-replay
-scenario=<default>
-provider=default
-started_at=2026-04-19T07:23:40Z
-sha=162160eb586a5e543982b09dc461f930dd9b1a34
-branch=codex/auth-oauth-canary-unification
-rustc=rustc 1.92.0 (ded5c06cf 2025-12-08)
-cargo=cargo 1.92.0 (344c4567c 2025-10-21)
-IRONCLAW_LIVE_TEST=<unset>
-LLM_BACKEND=<unset>
-LLM_MODEL=<unset>
-ANTHROPIC_MODEL=<unset>
-OPENAI_MODEL=<unset>
-GEMINI_MODEL=<unset>
-DATABASE_BACKEND=<unset>
-LIBSQL_PATH=<unset>
-playwright_install=auto
-cases=<default>
-skip_build=0
-skip_python_bootstrap=0
--- a/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/summary.md
+++ b/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/summary.md
@@ -1,16 +0,0 @@
-## Live Canary Summary
-
-| Field | Value |
-| --- | --- |
-| Lane | `deterministic-replay` |
-| Scenario | `<default>` |
-| Provider | `default` |
-| Status | `0` |
-| Started | `2026-04-19T07:23:40Z` |
-| Finished | `2026-04-19T07:24:11Z` |
-| Commit | `162160eb586a5e543982b09dc461f930dd9b1a34` |
-
-Artifacts:
- `artifacts/live-canary/deterministic-replay/default/20260419T072340Z/test-output.log`
- `artifacts/live-canary/deterministic-replay/default/20260419T072340Z/env-summary.txt`
- `artifacts/live-canary/deterministic-replay/default/20260419T072340Z/trace-fixture-status.txt`
--- a/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/test-output.log
+++ b/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/test-output.log
@@ -1,59 +0,0 @@
-[live-canary] lane=deterministic-replay scenario=<default> provider=default
-[live-canary] artifacts=artifacts/live-canary/deterministic-replay/default/20260419T072340Z
-[live-canary] running: cargo test --features libsql --test e2e_live -- --ignored --nocapture --test-threads=1
-   Compiling ironclaw v0.25.0 (/Users/nikolajpismenkov/Documents/near/ironclaw)
-    Finished `test` profile [unoptimized + debuginfo] target(s) in 21.94s
-     Running tests/e2e_live.rs (target/debug/deps/e2e_live-70f8324bb20e9782)
-
-running 4 tests
-test live_tests::drive_auth_gate_roundtrip ... [LiveTest] 'drive_auth_gate_roundtrip' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveAuthGate] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression covered by `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::drive_transparent_oauth_refresh ... [LiveTest] 'drive_transparent_oauth_refresh' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveRefresh] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression for the OAuth refresh layer lives in `auth::tests::*` and `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::zizmor_scan ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScan] Tools used: ["tool_search", "shell", "list_dir", "list_dir", "list_dir", "shell", "shell"]
-[ZizmorScan] Response preview: Done! I ran zizmor on your GitHub Actions workflows and found **security issues across all 14 workflow files**. Here's the summary:
-
-## 🔴 Critical Issues Found
-
-### 1. **Unpinned Action References** (Most Common)
-Every workflow uses actions like `actions/checkout@v6` instead of pinned SHA hashes. This is a supply chain risk - the action could change without notice.
-
-**Example fixes needed:**
-```yaml
-# Before
-uses: actions/checkout@v6
-
-# After  
-uses: actions/checkout@11bd71901bbe5b1630ceea73d275
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-ok
-test live_tests::zizmor_scan_v2 ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScanV2] Tools used: ["tool_search(zizmor)", "http(https://github.com/zizmorcore/zizmor)", "tool_search(zizmor github actions security)", "tool_list", "tool_search(zizmor)", "shell(pip install zizmor 2>&1 || echo \"pip install failed\")"]
-[ZizmorScanV2] Response preview: Error: Orchestrator error: effect execution error: Orchestrator error after resume: Traceback (most recent call last):
-  File "orchestrator.py", line 1183, in <module>
-  File "orchestrator.py", line 794, in run_loop
-RuntimeError: LLM call failed: LLM error: Provider live-zizmor_scan_v2 request failed: TraceLlm exhausted: served 4 call(s), no steps left
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-ok
-
-test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.01s
-
-[live-canary] summary=artifacts/live-canary/deterministic-replay/default/20260419T072340Z/summary.md
-[live-canary] log=artifacts/live-canary/deterministic-replay/default/20260419T072340Z/test-output.log
--- a/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/trace-fixture-status.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260419T072340Z/trace-fixture-status.txt
@@ -1 +0,0 @@
-No live trace fixture changes detected.
--- a/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/env-summary.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/env-summary.txt
@@ -1,20 +0,0 @@
-lane=deterministic-replay
-scenario=<default>
-provider=default
-started_at=2026-04-20T04:18:00Z
-sha=02690086232f4fbbb588620a0a49378a0977d89e
-branch=codex/auth-oauth-canary-unification
-rustc=rustc 1.92.0 (ded5c06cf 2025-12-08)
-cargo=cargo 1.92.0 (344c4567c 2025-10-21)
-IRONCLAW_LIVE_TEST=<unset>
-LLM_BACKEND=<unset>
-LLM_MODEL=<unset>
-ANTHROPIC_MODEL=<unset>
-OPENAI_MODEL=<unset>
-GEMINI_MODEL=<unset>
-DATABASE_BACKEND=<unset>
-LIBSQL_PATH=<unset>
-playwright_install=auto
-cases=<default>
-skip_build=0
-skip_python_bootstrap=0
--- a/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/summary.md
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/summary.md
@@ -1,16 +0,0 @@
-## Live Canary Summary
-
-| Field | Value |
-| --- | --- |
-| Lane | `deterministic-replay` |
-| Scenario | `<default>` |
-| Provider | `default` |
-| Status | `0` |
-| Started | `2026-04-20T04:18:00Z` |
-| Finished | `2026-04-20T04:20:18Z` |
-| Commit | `02690086232f4fbbb588620a0a49378a0977d89e` |
-
-Artifacts:
- `artifacts/live-canary/deterministic-replay/default/20260420T041800Z/test-output.log`
- `artifacts/live-canary/deterministic-replay/default/20260420T041800Z/env-summary.txt`
- `artifacts/live-canary/deterministic-replay/default/20260420T041800Z/trace-fixture-status.txt`
--- a/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/test-output.log
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/test-output.log
@@ -1,65 +0,0 @@
-[live-canary] lane=deterministic-replay scenario=<default> provider=default
-[live-canary] artifacts=artifacts/live-canary/deterministic-replay/default/20260420T041800Z
-[live-canary] running: cargo test --features libsql --test e2e_live -- --ignored --nocapture --test-threads=1
-   Compiling ironclaw_common v0.2.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_common)
-   Compiling ironclaw_skills v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_skills)
-   Compiling ironclaw v0.25.0 (/Users/nikolajpismenkov/Documents/near/ironclaw)
-   Compiling ironclaw_gateway v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_gateway)
-   Compiling insta v1.47.1
-   Compiling ironclaw_tui v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_tui)
-   Compiling ironclaw_engine v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_engine)
-    Finished `test` profile [unoptimized + debuginfo] target(s) in 2m 09s
-     Running tests/e2e_live.rs (target/debug/deps/e2e_live-6a5c5821f85166e6)
-
-running 4 tests
-test live_tests::drive_auth_gate_roundtrip ... [LiveTest] 'drive_auth_gate_roundtrip' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveAuthGate] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression covered by `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::drive_transparent_oauth_refresh ... [LiveTest] 'drive_transparent_oauth_refresh' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveRefresh] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression for the OAuth refresh layer lives in `auth::tests::*` and `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::zizmor_scan ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScan] Tools used: ["tool_search", "shell", "list_dir", "list_dir", "list_dir", "shell", "shell"]
-[ZizmorScan] Response preview: Done! I ran zizmor on your GitHub Actions workflows and found **security issues across all 14 workflow files**. Here's the summary:
-
-## 🔴 Critical Issues Found
-
-### 1. **Unpinned Action References** (Most Common)
-Every workflow uses actions like `actions/checkout@v6` instead of pinned SHA hashes. This is a supply chain risk - the action could change without notice.
-
-**Example fixes needed:**
-```yaml
-# Before
-uses: actions/checkout@v6
-
-# After  
-uses: actions/checkout@11bd71901bbe5b1630ceea73d275
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-ok
-test live_tests::zizmor_scan_v2 ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScanV2] Tools used: ["tool_search(zizmor)", "http(https://github.com/zizmorcore/zizmor)", "tool_search(zizmor github actions security)", "tool_list", "tool_search(zizmor)", "shell(pip install zizmor 2>&1 || echo \"pip install failed\")"]
-[ZizmorScanV2] Response preview: Error: Orchestrator error: effect execution error: Orchestrator error after resume: Traceback (most recent call last):
-  File "orchestrator.py", line 1183, in <module>
-  File "orchestrator.py", line 794, in run_loop
-RuntimeError: LLM call failed: LLM error: Provider live-zizmor_scan_v2 request failed: TraceLlm exhausted: served 4 call(s), no steps left
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-ok
-
-test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.56s
-
-[live-canary] summary=artifacts/live-canary/deterministic-replay/default/20260420T041800Z/summary.md
-[live-canary] log=artifacts/live-canary/deterministic-replay/default/20260420T041800Z/test-output.log
--- a/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/trace-fixture-status.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T041800Z/trace-fixture-status.txt
@@ -1 +0,0 @@
-No live trace fixture changes detected.
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/env-summary.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/env-summary.txt
@@ -1,20 +0,0 @@
-lane=deterministic-replay
-scenario=<default>
-provider=default
-started_at=2026-04-20T04:30:44Z
-sha=68c6bba80859024786faa7a29d4c394b418e3e55
-branch=codex/auth-oauth-canary-unification
-rustc=rustc 1.92.0 (ded5c06cf 2025-12-08)
-cargo=cargo 1.92.0 (344c4567c 2025-10-21)
-IRONCLAW_LIVE_TEST=<unset>
-LLM_BACKEND=<unset>
-LLM_MODEL=<unset>
-ANTHROPIC_MODEL=<unset>
-OPENAI_MODEL=<unset>
-GEMINI_MODEL=<unset>
-DATABASE_BACKEND=<unset>
-LIBSQL_PATH=<unset>
-playwright_install=auto
-cases=<default>
-skip_build=0
-skip_python_bootstrap=0
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/summary.md
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/summary.md
@@ -1,16 +0,0 @@
-## Live Canary Summary
-
-| Field | Value |
-| --- | --- |
-| Lane | `deterministic-replay` |
-| Scenario | `<default>` |
-| Provider | `default` |
-| Status | `0` |
-| Started | `2026-04-20T04:30:44Z` |
-| Finished | `2026-04-20T04:31:15Z` |
-| Commit | `68c6bba80859024786faa7a29d4c394b418e3e55` |
-
-Artifacts:
- `artifacts/live-canary/deterministic-replay/default/20260420T043044Z/test-output.log`
- `artifacts/live-canary/deterministic-replay/default/20260420T043044Z/env-summary.txt`
- `artifacts/live-canary/deterministic-replay/default/20260420T043044Z/trace-fixture-status.txt`
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/test-output.log
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/test-output.log
@@ -1,64 +0,0 @@
-[live-canary] lane=deterministic-replay scenario=<default> provider=default
-[live-canary] artifacts=artifacts/live-canary/deterministic-replay/default/20260420T043044Z
-[live-canary] running: cargo test --features libsql --test e2e_live -- --ignored --nocapture --test-threads=1
-   Compiling ironclaw_common v0.2.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_common)
-   Compiling ironclaw_skills v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_skills)
-   Compiling ironclaw v0.25.0 (/Users/nikolajpismenkov/Documents/near/ironclaw)
-   Compiling ironclaw_gateway v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_gateway)
-   Compiling ironclaw_tui v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_tui)
-   Compiling ironclaw_engine v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_engine)
-    Finished `test` profile [unoptimized + debuginfo] target(s) in 22.65s
-     Running tests/e2e_live.rs (target/debug/deps/e2e_live-6a5c5821f85166e6)
-
-running 4 tests
-test live_tests::drive_auth_gate_roundtrip ... [LiveTest] 'drive_auth_gate_roundtrip' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveAuthGate] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression covered by `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::drive_transparent_oauth_refresh ... [LiveTest] 'drive_transparent_oauth_refresh' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveRefresh] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression for the OAuth refresh layer lives in `auth::tests::*` and `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::zizmor_scan ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScan] Tools used: ["tool_search", "shell", "list_dir", "list_dir", "list_dir", "shell", "shell"]
-[ZizmorScan] Response preview: Done! I ran zizmor on your GitHub Actions workflows and found **security issues across all 14 workflow files**. Here's the summary:
-
-## 🔴 Critical Issues Found
-
-### 1. **Unpinned Action References** (Most Common)
-Every workflow uses actions like `actions/checkout@v6` instead of pinned SHA hashes. This is a supply chain risk - the action could change without notice.
-
-**Example fixes needed:**
-```yaml
-# Before
-uses: actions/checkout@v6
-
-# After  
-uses: actions/checkout@11bd71901bbe5b1630ceea73d275
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-ok
-test live_tests::zizmor_scan_v2 ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScanV2] Tools used: ["tool_search(zizmor)", "http(https://github.com/zizmorcore/zizmor)", "tool_search(zizmor github actions security)", "tool_list", "tool_search(zizmor)", "shell(pip install zizmor 2>&1 || echo \"pip install failed\")"]
-[ZizmorScanV2] Response preview: Error: Orchestrator error: effect execution error: Orchestrator error after resume: Traceback (most recent call last):
-  File "orchestrator.py", line 1183, in <module>
-  File "orchestrator.py", line 794, in run_loop
-RuntimeError: LLM call failed: LLM error: Provider live-zizmor_scan_v2 request failed: TraceLlm exhausted: served 4 call(s), no steps left
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-ok
-
-test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.45s
-
-[live-canary] summary=artifacts/live-canary/deterministic-replay/default/20260420T043044Z/summary.md
-[live-canary] log=artifacts/live-canary/deterministic-replay/default/20260420T043044Z/test-output.log
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/trace-fixture-status.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043044Z/trace-fixture-status.txt
@@ -1 +0,0 @@
-No live trace fixture changes detected.
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/env-summary.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/env-summary.txt
@@ -1,20 +0,0 @@
-lane=deterministic-replay
-scenario=<default>
-provider=default
-started_at=2026-04-20T04:37:14Z
-sha=2410ad2dd4d39a64371ae097f0c7813a0b1dfb33
-branch=codex/auth-oauth-canary-unification
-rustc=rustc 1.92.0 (ded5c06cf 2025-12-08)
-cargo=cargo 1.92.0 (344c4567c 2025-10-21)
-IRONCLAW_LIVE_TEST=<unset>
-LLM_BACKEND=<unset>
-LLM_MODEL=<unset>
-ANTHROPIC_MODEL=<unset>
-OPENAI_MODEL=<unset>
-GEMINI_MODEL=<unset>
-DATABASE_BACKEND=<unset>
-LIBSQL_PATH=<unset>
-playwright_install=auto
-cases=<default>
-skip_build=0
-skip_python_bootstrap=0
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/summary.md
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/summary.md
@@ -1,16 +0,0 @@
-## Live Canary Summary
-
-| Field | Value |
-| --- | --- |
-| Lane | `deterministic-replay` |
-| Scenario | `<default>` |
-| Provider | `default` |
-| Status | `0` |
-| Started | `2026-04-20T04:37:14Z` |
-| Finished | `2026-04-20T04:39:26Z` |
-| Commit | `2410ad2dd4d39a64371ae097f0c7813a0b1dfb33` |
-
-Artifacts:
- `artifacts/live-canary/deterministic-replay/default/20260420T043714Z/test-output.log`
- `artifacts/live-canary/deterministic-replay/default/20260420T043714Z/env-summary.txt`
- `artifacts/live-canary/deterministic-replay/default/20260420T043714Z/trace-fixture-status.txt`
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/test-output.log
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/test-output.log
@@ -1,64 +0,0 @@
-[live-canary] lane=deterministic-replay scenario=<default> provider=default
-[live-canary] artifacts=artifacts/live-canary/deterministic-replay/default/20260420T043714Z
-[live-canary] running: cargo test --features libsql --test e2e_live -- --ignored --nocapture --test-threads=1
-   Compiling ironclaw_common v0.2.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_common)
-   Compiling ironclaw_skills v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_skills)
-   Compiling ironclaw v0.25.0 (/Users/nikolajpismenkov/Documents/near/ironclaw)
-   Compiling ironclaw_gateway v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_gateway)
-   Compiling ironclaw_tui v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_tui)
-   Compiling ironclaw_engine v0.1.0 (/Users/nikolajpismenkov/Documents/near/ironclaw/crates/ironclaw_engine)
-    Finished `test` profile [unoptimized + debuginfo] target(s) in 2m 03s
-     Running tests/e2e_live.rs (target/debug/deps/e2e_live-6a5c5821f85166e6)
-
-running 4 tests
-test live_tests::drive_auth_gate_roundtrip ... [LiveTest] 'drive_auth_gate_roundtrip' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveAuthGate] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression covered by `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::drive_transparent_oauth_refresh ... [LiveTest] 'drive_transparent_oauth_refresh' has trace recording disabled and no replay fixture — skipping. Run with IRONCLAW_LIVE_TEST=1 to execute live.
-[DriveRefresh] Live-only test — skipping outside `IRONCLAW_LIVE_TEST=1`. Hermetic regression for the OAuth refresh layer lives in `auth::tests::*` and `test_auth_wasm_tool_finds_legacy_hyphen_alias`.
-ok
-test live_tests::zizmor_scan ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScan] Tools used: ["tool_search", "shell", "list_dir", "list_dir", "list_dir", "shell", "shell"]
-[ZizmorScan] Response preview: Done! I ran zizmor on your GitHub Actions workflows and found **security issues across all 14 workflow files**. Here's the summary:
-
-## 🔴 Critical Issues Found
-
-### 1. **Unpinned Action References** (Most Common)
-Every workflow uses actions like `actions/checkout@v6` instead of pinned SHA hashes. This is a supply chain risk - the action could change without notice.
-
-**Example fixes needed:**
-```yaml
-# Before
-uses: actions/checkout@v6
-
-# After  
-uses: actions/checkout@11bd71901bbe5b1630ceea73d275
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan.replay.log
-ok
-test live_tests::zizmor_scan_v2 ... [LiveTest] Mode: REPLAY — loading from /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[TraceLlm WARN] Request hint mismatch: expected last user message to contain "can we run https://github.com/zizmorcore/zizmor", got Some("Run zizmor against this checkout's GitHub Actions workflows now. Use the shell tool to install or invoke zizmor if needed, then execute it against `.github/workflows` and report the actual scan result. Do not stop after checking whether Rust, Cargo, Git, or zizmor are available. If the scan cannot run, include the exact command attempted and the exact failure output.")
-[ZizmorScanV2] Tools used: ["tool_search(zizmor)", "http(https://github.com/zizmorcore/zizmor)", "tool_search(zizmor github actions security)", "tool_list", "tool_search(zizmor)", "shell(pip install zizmor 2>&1 || echo \"pip install failed\")"]
-[ZizmorScanV2] Response preview: Error: Orchestrator error: effect execution error: Orchestrator error after resume: Traceback (most recent call last):
-  File "orchestrator.py", line 1204, in <module>
-  File "orchestrator.py", line 815, in run_loop
-RuntimeError: LLM call failed: LLM error: Provider live-zizmor_scan_v2 request failed: TraceLlm exhausted: served 4 call(s), no steps left
-[LiveTest] Session log: /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-[LiveTest] Diff: diff /Users/nikolajpismenkov/Documents/near/ironclaw/tests/fixtures/llm_traces/live/zizmor_scan_v2.log /var/folders/8b/3b8t3v4d7rj3v6pznzdkvjpr0000gn/T/ironclaw-live-tests/zizmor_scan_v2.replay.log
-ok
-
-test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.59s
-
-[live-canary] summary=artifacts/live-canary/deterministic-replay/default/20260420T043714Z/summary.md
-[live-canary] log=artifacts/live-canary/deterministic-replay/default/20260420T043714Z/test-output.log
--- a/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/trace-fixture-status.txt
+++ b/artifacts/live-canary/deterministic-replay/default/20260420T043714Z/trace-fixture-status.txt
@@ -1 +0,0 @@
-No live trace fixture changes detected.
--- a/docs/extensions/github.md
+++ b/docs/extensions/github.md
@@ -1,9 +1,9 @@
 ---
-title: "Github"
-description: "Let your agent access Github"
+title: "GitHub"
+description: "Let your agent access GitHub"
 ---

-The Github extension allows your agent to interact with Github repositories, issues, pull requests, and more, making it ideal for automating code-related tasks, managing projects, or gathering information from Github.
+The GitHub extension allows your agent to interact with GitHub repositories, issues, pull requests, and more, making it ideal for automating code-related tasks, managing projects, or gathering information from GitHub.

 ---

@@ -12,7 +12,7 @@ The Github extension allows your agent to interact with Github repositories, iss

 <Steps>

-<Step title="Install the Web Search Extension">
+<Step title="Install the GitHub Extension">

 To install the GitHub extension, run:

@@ -64,7 +64,7 @@ Be sure to create a fine-grained personal access token with only the necessary p

 ## Available Actions:

-Here are some of the actions your agent can perform with the Github extension:
+Here are some of the actions your agent can perform with the GitHub extension:

 - `get_repo`: Retrieve repository information  
 - `list_issues`: List all issues in a repository  
@@ -96,7 +96,7 @@ Lets configure our agent to have its own github account, which it can use to cre

 <Steps>

-<Step title="Create a new Github account">
+<Step title="Create a new GitHub account">

 Go to https://github.com and create a new account for your agent. If you are already logged in with your personal account you will need to briefly log out to create the new account, but you can log back in right after

@@ -104,12 +104,12 @@ Go to https://github.com and create a new account for your agent. If you are alr

 <Step title="Generate a Personal Access Token">

-On the agent's Github account, go to [Settings -> Developer settings -> Personal access tokens -> Tokens (classic)](https://github.com/settings/tokens) and generate a new token (classic) with the following permissions: `repo` -> `public_repo`
+On the agent's GitHub account, go to [Settings -> Developer settings -> Personal access tokens -> Tokens (classic)](https://github.com/settings/tokens) and generate a new token (classic) with the following permissions: `repo` -> `public_repo`

 </Step>

-<Step title="Authenticate the Github Extension">
-Now that you have either OAuth app credentials or a PAT, authenticate the Github extension:
+<Step title="Authenticate the GitHub Extension">
+Now that you have either OAuth app credentials or a PAT, authenticate the GitHub extension:

 ```bash
 ironclaw tool auth github
@@ -125,7 +125,7 @@ will use browser OAuth. Otherwise it falls back to prompting for a PAT.
 Ask your agent to create a test issue in one of your public repositories, and check if the issue was created successfully.

 <Tip>
-Ask your agent to read the [Github Markdown Guidelines](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) and remember then when creating issues and comments, it can make the formatting much nicer!
+Ask your agent to read the [GitHub Markdown Guidelines](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) and remember them when creating issues and comments; it can make the formatting much nicer!
 </Tip>

 </Step>
--- a/docs/zh/extensions/github.md
+++ b/docs/zh/extensions/github.md
@@ -1,10 +1,10 @@
 ---
-title: "Github"
-description: "让智能体访问 Github"
+title: "GitHub"
+description: "让智能体访问 GitHub"
 icon: github
 ---

-Github 扩展允许智能体与 Github 仓库、议题、拉取请求等交互，非常适合自动化代码相关任务、管理项目或从 Github 收集信息。
+GitHub 扩展允许智能体与 GitHub 仓库、议题、拉取请求等交互，非常适合自动化代码相关任务、管理项目或从 GitHub 收集信息。

 ---

@@ -14,14 +14,14 @@ Github 扩展允许智能体与 Github 仓库、议题、拉取请求等交互
 <Steps>

 <Step title="获取 API 密钥">
-要使用 Github 扩展，您需要从 Github 获取个人访问令牌。
+要使用 GitHub 扩展，您需要从 GitHub 获取个人访问令牌。


 </Step>

-<Step title="安装 Github 扩展">
+<Step title="安装 GitHub 扩展">

-在终端中运行以下命令安装 Github 扩展：
+在终端中运行以下命令安装 GitHub 扩展：

 ```bash
 ironclaw registry install github
@@ -31,7 +31,7 @@ ironclaw registry install github

 <Step title="配置 API 密钥">

-安装扩展后，需要在 IronClaw 中配置您的 Github API 密钥。运行：
+安装扩展后，需要在 IronClaw 中配置您的 GitHub API 密钥。运行：

 ```bash
 ironclaw tool auth github
@@ -51,7 +51,7 @@ ironclaw tool auth github

 ## 可用操作：

-以下是智能体使用 Github 扩展可以执行的一些操作：
+以下是智能体使用 GitHub 扩展可以执行的一些操作：

 - `get_repo`：获取仓库信息
 - `list_issues`：列出仓库中的所有议题
@@ -70,12 +70,12 @@ ironclaw tool auth github

 ## 在公共仓库上工作

-让我们为智能体配置自己的 Github 账户，以便它可以在**公共仓库**中创建议题和评论拉取请求。
+让我们为智能体配置自己的 GitHub 账户，以便它可以在**公共仓库**中创建议题和评论拉取请求。


 <Steps>

-<Step title="创建新的 Github 账户">
+<Step title="创建新的 GitHub 账户">

 前往 https://github.com 为智能体创建新账户。如果您已使用个人账户登录，需要暂时登出以创建新账户，之后可以立即重新登录。

@@ -83,12 +83,12 @@ ironclaw tool auth github

 <Step title="生成个人访问令牌">

-在智能体的 Github 账户上，前往 [Settings -> Developer settings -> Personal access tokens -> Tokens (classic)](https://github.com/settings/tokens) 并生成具有以下权限的新令牌（classic）：`repo` -> `public_repo`
+在智能体的 GitHub 账户上，前往 [Settings -> Developer settings -> Personal access tokens -> Tokens (classic)](https://github.com/settings/tokens) 并生成具有以下权限的新令牌（classic）：`repo` -> `public_repo`

 </Step>

-<Step title="认证 Github 扩展">
-获取令牌后，运行以下命令认证 Github 扩展：
+<Step title="认证 GitHub 扩展">
+获取令牌后，运行以下命令认证 GitHub 扩展：

 ```bash
 ironclaw tool auth github
@@ -103,7 +103,7 @@ ironclaw tool auth github
 让智能体在您的某个公共仓库中创建一个测试议题，检查议题是否创建成功。

 <Tip>
-让智能体阅读 [Github Markdown 指南](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) 并在创建议题和评论时记住这些格式规范，可以让格式更加美观！
+让智能体阅读 [GitHub Markdown 指南](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) 并在创建议题和评论时记住这些格式规范，可以让格式更加美观！
 </Tip>

 </Step>
--- a/scripts/auth_live_canary/run_live_canary.py
+++ b/scripts/auth_live_canary/run_live_canary.py
@@ -922,8 +922,10 @@ def parse_args() -> argparse.Namespace:
        action="append",
        help=(
            "Limit the run to selected providers. Repeat for multiple values. "
-            "For seeded mode: gmail, google_calendar, github, notion, "
-            "gmail_roundtrip, google_calendar_lifecycle, "
+            "For seeded mode, read-only cases (run by default when --case is "
+            "omitted): gmail, google_calendar, github, notion. "
+            "Mutating lifecycle cases — must be opted in explicitly, never "
+            "run by default: gmail_roundtrip, google_calendar_lifecycle, "
            "notion_search_lifecycle. "
            "For browser mode: google, github, notion."
        ),
--- a/scripts/live-canary/scrub-artifacts.sh
+++ b/scripts/live-canary/scrub-artifacts.sh
@@ -19,6 +19,12 @@ patterns=(
  'access[_-]?token[[:space:]]*[:=][[:space:]]*[^[:space:]]+'
  'refresh[_-]?token[[:space:]]*[:=][[:space:]]*[^[:space:]]+'
  'secret[[:space:]]*[:=][[:space:]]*[^[:space:]]+'
+  # JSON-quoted token shapes — the seeded/browser auth lanes emit results.json
+  # files containing full OAuth responses, which use `"access_token": "…"` /
+  # `"refresh_token": "…"` form. The `token:` / `token=` patterns above do
+  # not match those, so redaction would silently miss them.
+  '"(access|refresh|id|bearer)_token"[[:space:]]*:[[:space:]]*"[^"]+"'
+  '"(api[_-]?key|client[_-]?secret|password)"[[:space:]]*:[[:space:]]*"[^"]+"'
  'gh[pousr]_[A-Za-z0-9_]{20,}'
  'github_pat_[A-Za-z0-9_]{20,}'
  'ya29\.[A-Za-z0-9._-]{20,}'
@@ -42,7 +48,9 @@ redact_matches() {
    -e 's/(api[_-]?key[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
    -e 's/(access[_-]?token[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
    -e 's/(refresh[_-]?token[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
-    -e 's/(secret[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig'
+    -e 's/(secret[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
+    -e 's/("(access|refresh|id|bearer)_token"[[:space:]]*:[[:space:]]*)"[^"]+"/\1"<REDACTED>"/Ig' \
+    -e 's/("(api[_-]?key|client[_-]?secret|password)"[[:space:]]*:[[:space:]]*)"[^"]+"/\1"<REDACTED>"/Ig'
 }

 : > "${tmp_matches}"
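Reviewer note: the two JSON-quoted redactions added above can be sanity-checked outside the script with a rough Python equivalent. This is only a sketch. `redact_json_secrets` is a hypothetical helper that does not exist in the repo, `\s` stands in for `[[:space:]]`, and the script itself keeps using `sed`.

```python
import re

# Mirrors the two JSON-quoted patterns added to scrub-artifacts.sh.
# `redact_json_secrets` is a hypothetical name used only for this sketch.
_JSON_SECRET = re.compile(
    r'("(?:(?:access|refresh|id|bearer)_token'
    r'|api[_-]?key|client[_-]?secret|password)"\s*:\s*)"[^"]+"',
    re.IGNORECASE,
)


def redact_json_secrets(text: str) -> str:
    """Replace the quoted value of a known secret key with "<REDACTED>"."""
    return _JSON_SECRET.sub(r'\1"<REDACTED>"', text)


sample = '{"access_token": "ya29.abc", "expires_in": 3599, "client_secret": "s3cr3t"}'
print(redact_json_secrets(sample))
```

Like the `sed` expressions, the `[^"]+` value match stops at the first double quote, so a secret containing an escaped `\"` would only be redacted up to that point.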
--- a/scripts/live_canary/auth_registry.py
+++ b/scripts/live_canary/auth_registry.py
@@ -63,6 +63,22 @@ class BrowserProviderCase:
    auth_extension_name: str | None = None


+# Lifecycle ("write + cleanup") cases exercise real provider mutations —
+# they send emails, create calendar events, etc., and then clean up.
+# Even though each flow is self-cleaning, repeated hourly runs against
+# real accounts are not "low-risk / read-only" and must not be the
+# default selection for the scheduled lane. Callers must opt in by
+# naming these cases explicitly (e.g. `CASES=gmail_roundtrip` or
+# `--case gmail_roundtrip`) — see `configured_seeded_cases` below.
+LIFECYCLE_CASE_NAMES: frozenset[str] = frozenset(
+    {
+        "gmail_roundtrip",
+        "google_calendar_lifecycle",
+        "notion_search_lifecycle",
+    }
+)
+
+
 SEEDED_CASES: dict[str, SeededProviderCase] = {
    "gmail": SeededProviderCase(
        key="gmail",
@@ -184,7 +200,14 @@ def _canary_timestamp() -> str:

 def configured_seeded_cases(selected: list[str] | None) -> list[SeededProviderCase]:
    cases: list[SeededProviderCase] = []
-    names = selected or list(SEEDED_CASES)
+    # When no selection is provided (the scheduled-lane default path),
+    # exclude lifecycle/mutating cases. The scheduled lane must be
+    # low-risk/read-only unless an operator explicitly opts in by
+    # naming lifecycle cases via `--case` / `CASES=`.
+    if selected:
+        names = selected
+    else:
+        names = [n for n in SEEDED_CASES if n not in LIFECYCLE_CASE_NAMES]
    google_access = env_str("AUTH_LIVE_GOOGLE_ACCESS_TOKEN")
    google_refresh = env_str("AUTH_LIVE_GOOGLE_REFRESH_TOKEN")
    if google_refresh and not google_access:
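Reviewer note: the selection rule in this hunk can be sketched in isolation as below. The registry here is a stand-in that maps names to themselves (the real `SEEDED_CASES` holds `SeededProviderCase` objects), its ordering is assumed from the commit message, and `select_case_names` is a hypothetical name for the first few lines of `configured_seeded_cases`.

```python
# Stand-in registry: plain strings instead of SeededProviderCase objects,
# so the sketch is self-contained. The ordering is an assumption.
SEEDED_CASES = {
    name: name
    for name in (
        "gmail",
        "google_calendar",
        "github",
        "notion",
        "gmail_roundtrip",
        "google_calendar_lifecycle",
        "notion_search_lifecycle",
    )
}

LIFECYCLE_CASE_NAMES = frozenset(
    {"gmail_roundtrip", "google_calendar_lifecycle", "notion_search_lifecycle"}
)


def select_case_names(selected):
    """Hypothetical helper: explicit selection wins, otherwise read-only only."""
    if selected:
        return list(selected)
    return [n for n in SEEDED_CASES if n not in LIFECYCLE_CASE_NAMES]


print(select_case_names(None))
print(select_case_names(["gmail_roundtrip"]))
```

Because the check is a truthiness test, an empty list behaves the same as `None`: both fall back to the read-only default rather than selecting nothing.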
--- a/src/extensions/manager.rs
+++ b/src/extensions/manager.rs
@@ -1213,8 +1213,40 @@ impl ExtensionManager {
        let lifecycle_lock = self.mcp_lifecycle_lock(&name).await;
        let _lifecycle_guard = lifecycle_lock.lock().await;

+        // Fingerprint the client's tool surface before registering so we
+        // can detect the case where an earlier injected client for the
+        // same `name` (but a different `user_id`) reported a different
+        // set of tools — the `ToolRegistry` is keyed by tool name only,
+        // so the later registration would silently shadow the earlier
+        // one and leak schemas across tenants. The second `list_tools`
+        // call inside `create_tools_with_store` hits the per-client
+        // cache, so fetching the list here doesn't cost a second round
+        // trip.
+        let surface_signature = match client.list_tools().await {
+            Ok(tools) => crate::tools::mcp::surface_signature(&tools),
+            Err(e) => {
+                tracing::warn!(
+                    error = %e,
+                    server = %name,
+                    "inject_mcp_client: list_tools failed; skipping registration"
+                );
+                return Vec::new();
+            }
+        };
+        if let Some(other) = self
+            .mcp_clients
+            .check_surface_conflict(user_id, &name, &surface_signature)
+            .await
+        {
+            tracing::warn!(
+                server = %name,
+                conflicting_user = %other,
+                "inject_mcp_client: tool surface differs from an already-active user on the same server name; refusing to inject to avoid cross-tenant schema shadowing"
+            );
+            return Vec::new();
+        }
        self.mcp_clients
-            .insert(user_id, &name, client.clone())
+            .insert(user_id, &name, client.clone(), surface_signature)
            .await;
        match client
            .create_tools_with_store(self.mcp_client_store())
@@ -5548,6 +5580,31 @@ impl ExtensionManager {
            .await
            .map_err(|e| ExtensionError::ActivationFailed(e.to_string()))?;

+        // Before registering any tool wrappers for this user, fingerprint
+        // the tool surface the server reported and reject activation if
+        // another user already has the same `name` active with a
+        // DIFFERENT surface. The `ToolRegistry` keys wrappers by tool
+        // name only, so without this check user B's incoming schemas
+        // would silently shadow user A's — one user's `list_tools()`
+        // result becomes the shared wrapper shape for every tenant.
+        // Reviewer call-out: the earlier (user_id, server_name)
+        // partitioning of the client store addressed the runtime
+        // dispatch leak, but the registry surface was still global and
+        // susceptible to the same cross-tenant leak.
+        let surface_signature = crate::tools::mcp::surface_signature(&mcp_tools);
+        if let Some(other_user) = self
+            .mcp_clients
+            .check_surface_conflict(user_id, name, &surface_signature)
+            .await
+        {
+            return Err(ExtensionError::ActivationFailed(format!(
+                "MCP server '{name}' is already active for another user with a different tool surface (conflicting user: {other_user}). \
+                 The global tool registry is keyed by tool name only, so activating a second client with a different schema would \
+                 shadow the existing user's wrappers. Either use a distinct server name (the user-facing identifier) per backend/account, \
+                 or coordinate so both users connect to a backend that returns an identical tool surface."
+            )));
+        }
+
        // Store the client for this user first, then register the
        // (user-agnostic) tool wrappers. The wrappers resolve the caller's
        // client at dispatch time from the shared `McpClientStore`, so the
@@ -5561,7 +5618,9 @@ impl ExtensionManager {
        // lock held at the top of this function keeps the cleanup safe
        // against concurrent `remove` / re-`activate` on the same server.
        let client = Arc::new(client);
-        self.mcp_clients.insert(user_id, name, client.clone()).await;
+        self.mcp_clients
+            .insert(user_id, name, client.clone(), surface_signature)
+            .await;

        let tool_impls = match client
            .create_tools_with_store(self.mcp_client_store())
--- a/src/tools/mcp/client_store.rs
+++ b/src/tools/mcp/client_store.rs
@@ -16,9 +16,47 @@
 use std::collections::HashMap;
 use std::sync::Arc;

+use sha2::{Digest, Sha256};
 use tokio::sync::RwLock;

 use super::client::McpClient;
+use super::protocol::McpTool;
+
+/// Compute a deterministic fingerprint of an MCP server's reported tool
+/// surface. Used by `McpClientStore::check_surface_conflict` to detect
+/// when two users activate the same `server_name` but the backend
+/// returns a different set of tools or different parameter schemas —
+/// the global `ToolRegistry` is keyed by tool name only, so the second
+/// activation's schemas would silently shadow the first and leak
+/// schema shape across tenants.
+///
+/// The fingerprint covers every field that surfaces to the LLM or the
+/// runtime: tool name, description, and the full input-schema JSON.
+/// Sorted by name so server-side ordering doesn't influence the hash.
+pub fn surface_signature(tools: &[McpTool]) -> String {
+    let mut entries: Vec<(String, String, String)> = tools
+        .iter()
+        .map(|t| {
+            (
+                t.name.clone(),
+                t.description.clone(),
+                serde_json::to_string(&t.input_schema).unwrap_or_default(),
+            )
+        })
+        .collect();
+    entries.sort_by(|a, b| a.0.cmp(&b.0));
+
+    let mut hasher = Sha256::new();
+    for (name, description, schema) in &entries {
+        hasher.update(name.as_bytes());
+        hasher.update(b"\x00");
+        hasher.update(description.as_bytes());
+        hasher.update(b"\x00");
+        hasher.update(schema.as_bytes());
+        hasher.update(b"\x01");
+    }
+    format!("{:x}", hasher.finalize())
+}

 /// Composite key identifying an MCP client instance: the authenticating
 /// user plus the server name. Both fields participate in `Hash` / `Eq` so
@@ -39,12 +77,22 @@ impl McpClientKey {
    }
 }

+/// Per-user MCP client entry: the active client plus the fingerprint
+/// of the tool surface it exposes. The signature is captured at
+/// activation time and is what `check_surface_conflict` compares
+/// across users.
+#[derive(Clone)]
+struct McpClientEntry {
+    client: Arc<McpClient>,
+    surface: String,
+}
+
 /// Per-user MCP client registry. Typically held as `Arc<McpClientStore>`
 /// by both `ExtensionManager` (for lifecycle) and every `McpToolWrapper`
 /// (for dispatch-time lookup).
 #[derive(Default)]
 pub struct McpClientStore {
-    clients: RwLock<HashMap<McpClientKey, Arc<McpClient>>>,
+    clients: RwLock<HashMap<McpClientKey, McpClientEntry>>,
 }

 impl McpClientStore {
@@ -52,13 +100,21 @@ impl McpClientStore {
        Self::default()
    }

-    /// Insert or replace the client for `(user_id, server_name)`. Replacing
+    /// Insert or replace the client for `(user_id, server_name)`. The
+    /// signature is the fingerprint of the tool surface this client
+    /// reported at activation time (see `surface_signature`). Replacing
    /// is only intended for the same user re-activating the same server.
-    pub async fn insert(&self, user_id: &str, server_name: &str, client: Arc<McpClient>) {
-        self.clients
-            .write()
-            .await
-            .insert(McpClientKey::new(user_id, server_name), client);
+    pub async fn insert(
+        &self,
+        user_id: &str,
+        server_name: &str,
+        client: Arc<McpClient>,
+        surface: String,
+    ) {
+        self.clients.write().await.insert(
+            McpClientKey::new(user_id, server_name),
+            McpClientEntry { client, surface },
+        );
    }

    /// Remove and return the client for `(user_id, server_name)`, if any.
@@ -67,6 +123,7 @@ impl McpClientStore {
            .write()
            .await
            .remove(&McpClientKey::new(user_id, server_name))
+            .map(|entry| entry.client)
    }

    /// Atomically remove `(user_id, server_name)` and report whether the
@@ -94,7 +151,7 @@ impl McpClientStore {
            .read()
            .await
            .get(&McpClientKey::new(user_id, server_name))
-            .cloned()
+            .map(|entry| entry.client.clone())
    }

    /// Whether `(user_id, server_name)` has an active client.
@@ -116,6 +173,37 @@ impl McpClientStore {
            .keys()
            .any(|key| key.server_name == server_name)
    }
+
+    /// Check whether the tool surface `incoming` — fingerprint of the
+    /// tools reported by the activating client — is compatible with any
+    /// OTHER user who already has `server_name` active.
+    ///
+    /// Returns `Some(other_user_id)` if a conflicting entry exists: a
+    /// different user has the same `server_name` active with a DIFFERENT
+    /// surface fingerprint. Same-user re-activations are ignored
+    /// because they're expected to replace the old entry.
+    ///
+    /// The `ToolRegistry` is keyed by tool name only, so two users on
+    /// the "same" server name with different URLs or different
+    /// credentials can produce different schemas. Without this check
+    /// the second user's registration would silently shadow the first's
+    /// — see the reviewer's concern that one user's `list_tools()`
+    /// result becomes the shared wrapper surface for everyone.
+    pub async fn check_surface_conflict(
+        &self,
+        user_id: &str,
+        server_name: &str,
+        incoming: &str,
+    ) -> Option<String> {
+        let clients = self.clients.read().await;
+        for (key, entry) in clients.iter() {
+            if key.server_name == server_name && key.user_id != user_id && entry.surface != incoming
+            {
+                return Some(key.user_id.clone());
+            }
+        }
+        None
+    }
 }

 #[cfg(test)]
@@ -129,8 +217,12 @@ mod tests {
        let client_a = Arc::new(McpClient::new_with_name("notion", "http://a.invalid"));
        let client_b = Arc::new(McpClient::new_with_name("notion", "http://b.invalid"));

-        store.insert("user-a", "notion", client_a.clone()).await;
-        store.insert("user-b", "notion", client_b.clone()).await;
+        store
+            .insert("user-a", "notion", client_a.clone(), "sig-a".into())
+            .await;
+        store
+            .insert("user-b", "notion", client_b.clone(), "sig-b".into())
+            .await;

        assert!(Arc::ptr_eq(
            &store.get("user-a", "notion").await.expect("a"),
@@ -148,8 +240,12 @@ mod tests {
        let client_a = Arc::new(McpClient::new_with_name("notion", "http://a.invalid"));
        let client_b = Arc::new(McpClient::new_with_name("notion", "http://b.invalid"));

-        store.insert("user-a", "notion", client_a).await;
-        store.insert("user-b", "notion", client_b).await;
+        store
+            .insert("user-a", "notion", client_a, "sig".into())
+            .await;
+        store
+            .insert("user-b", "notion", client_b, "sig".into())
+            .await;

        assert!(
            !store.remove_and_check_empty("user-a", "notion").await,
@@ -169,7 +265,7 @@ mod tests {
    async fn remove_and_check_empty_is_idempotent_on_missing_user() {
        let store = McpClientStore::new();
        let client = Arc::new(McpClient::new_with_name("notion", "http://a.invalid"));
-        store.insert("user-a", "notion", client).await;
+        store.insert("user-a", "notion", client, "sig".into()).await;

        assert!(
            !store
@@ -186,9 +282,11 @@ mod tests {
        let client = Arc::new(McpClient::new_with_name("notion", "http://a.invalid"));

        assert!(!store.any_active_for_server("notion").await);
-        store.insert("user-a", "notion", client.clone()).await;
+        store
+            .insert("user-a", "notion", client.clone(), "sig".into())
+            .await;
        assert!(store.any_active_for_server("notion").await);
-        store.insert("user-b", "notion", client).await;
+        store.insert("user-b", "notion", client, "sig".into()).await;

        assert!(store.remove("user-a", "notion").await.is_some());
        assert!(
@@ -198,4 +296,35 @@ mod tests {
        assert!(store.remove("user-b", "notion").await.is_some());
        assert!(!store.any_active_for_server("notion").await);
    }
+
+    #[tokio::test]
+    async fn check_surface_conflict_flags_divergent_surface_for_same_server() {
+        let store = McpClientStore::new();
+        let client = Arc::new(McpClient::new_with_name("notion", "http://a.invalid"));
+        store
+            .insert("user-a", "notion", client, "surface-v1".into())
+            .await;
+
+        assert_eq!(
+            store
+                .check_surface_conflict("user-b", "notion", "surface-v2")
+                .await,
+            Some("user-a".to_string()),
+            "user-b activating notion with a different surface than user-a must flag user-a as the conflict source",
+        );
+        assert!(
+            store
+                .check_surface_conflict("user-b", "notion", "surface-v1")
+                .await
+                .is_none(),
+            "identical surface fingerprint means no conflict — both users get the same wrapper shape",
+        );
+        assert!(
+            store
+                .check_surface_conflict("user-a", "notion", "surface-v2")
+                .await
+                .is_none(),
+            "same-user re-activation with a new surface is allowed (caller replaces their own entry)",
+        );
+    }
 }
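Reviewer note: the properties `surface_signature` is meant to have (independent of server-side ordering, sensitive to any change in name, description, or schema) can be illustrated with a small Python port. This is a sketch under assumptions. Tools are modelled as plain dicts, and `json.dumps` does not produce the same bytes as `serde_json::to_string`, so the digests will not match the Rust ones.

```python
import hashlib
import json


def surface_signature(tools):
    """Order-independent SHA-256 over (name, description, input schema)."""
    entries = [
        (
            t["name"],
            t.get("description", ""),
            # sort_keys gives a stable serialization for this sketch; the
            # Rust code relies on serde_json's own output instead.
            json.dumps(t.get("input_schema", {}), sort_keys=True),
        )
        for t in tools
    ]
    # Sort by tool name so the order the server reports does not matter.
    entries.sort(key=lambda e: e[0])

    hasher = hashlib.sha256()
    for name, description, schema in entries:
        hasher.update(name.encode())
        hasher.update(b"\x00")  # field separator
        hasher.update(description.encode())
        hasher.update(b"\x00")
        hasher.update(schema.encode())
        hasher.update(b"\x01")  # entry separator
    return hasher.hexdigest()


search = {"name": "mock_search", "description": "Search", "input_schema": {"type": "object"}}
fetch = {"name": "mock_fetch", "description": "Fetch", "input_schema": {"type": "object"}}
other = {"name": "different_tool", "description": "Search", "input_schema": {"type": "object"}}

same = surface_signature([search, fetch]) == surface_signature([fetch, search])
differs = surface_signature([search]) != surface_signature([other])
print(same, differs)
```

The separator bytes keep adjacent fields from running together, so for example a name of `ab` with description `c` does not hash the same as name `a` with description `bc`.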
--- a/src/tools/mcp/mod.rs
+++ b/src/tools/mcp/mod.rs
@@ -45,7 +45,7 @@ pub(crate) mod unix_transport;
 pub use auth::{is_authenticated, refresh_access_token};
 pub use client::McpClient;
 pub(crate) use client::mcp_tool_id;
-pub(crate) use client_store::McpClientStore;
+pub(crate) use client_store::{McpClientStore, surface_signature};
 pub use config::{McpServerConfig, McpServersFile, OAuthConfig};
 pub use factory::{McpFactoryError, create_client_from_config};
 pub use process::McpProcessManager;
--- a/tests/fixtures/llm_traces/live/zizmor_scan.json
+++ b/tests/fixtures/llm_traces/live/zizmor_scan.json
--- a/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
+++ b/tests/fixtures/llm_traces/live/zizmor_scan_v2.json
--- a/tests/mcp_multi_tenant_integration.rs
+++ b/tests/mcp_multi_tenant_integration.rs
@@ -472,4 +472,106 @@ mod tests {

        mock_server.shutdown().await;
    }
+
+    /// Regression for the reviewer's concern that MCP tool registration
+    /// was still coarse-grained after the per-user client store landed.
+    /// The `ToolRegistry` is keyed by tool name only, so if user A
+    /// activates `SERVER_NAME` against backend X with one tool surface
+    /// and user B activates the same `SERVER_NAME` against backend Y
+    /// with a DIFFERENT surface, user B's `list_tools()` result would
+    /// silently shadow user A's in the global registry.
+    ///
+    /// The fix is to reject user B's activation when the surface
+    /// fingerprint disagrees with any other user's active entry for
+    /// the same `server_name`. This test drives `ExtensionManager`
+    /// activation end-to-end for both users and asserts:
+    /// - User A's activation succeeds.
+    /// - User B's activation fails with a clear ActivationFailed
+    ///   explaining the surface conflict.
+    /// - After the rejection, the registry still contains user A's
+    ///   wrappers (unshadowed), and user A can still dispatch.
+    #[tokio::test]
+    async fn activate_rejects_divergent_tool_surface_on_shared_server_name() {
+        let mock_server_a = start_mock_mcp_server(vec![MockToolResponse {
+            name: "mock_search".to_string(),
+            content: serde_json::json!({"ok": true}),
+        }])
+        .await;
+        let mock_server_b = start_mock_mcp_server(vec![MockToolResponse {
+            name: "different_tool".to_string(),
+            content: serde_json::json!({"ok": true}),
+        }])
+        .await;
+        let (db, _db_dir) = test_db().await;
+        let ext_dirs = tempfile::tempdir().expect("extension tempdir");
+        let secrets = test_secrets_store();
+        let tool_registry = Arc::new(ToolRegistry::new());
+        let manager = ExtensionManager::new(
+            Arc::new(McpSessionManager::new()),
+            Arc::new(McpProcessManager::new()),
+            Arc::clone(&secrets),
+            Arc::clone(&tool_registry),
+            None,
+            None,
+            ext_dirs.path().join("tools"),
+            ext_dirs.path().join("channels"),
+            None,
+            "owner".to_string(),
+            Some(db),
+            Vec::new(),
+        );
+        let server_a = McpServerConfig::new(SERVER_NAME, mock_server_a.mcp_url());
+        let server_b = McpServerConfig::new(SERVER_NAME, mock_server_b.mcp_url());
+
+        let tool_name_a =
+            activate_for_user(&manager, &secrets, &server_a, USER_A, "token-user-a").await;
+        assert!(
+            tool_registry.get(&tool_name_a).await.is_some(),
+            "user-a's wrapper must be registered after successful activation",
+        );
+
+        // User B attempts to install + activate the SAME server name
+        // pointing at a backend with a different tool surface.
+        manager
+            .install(
+                SERVER_NAME,
+                Some(&server_b.url),
+                Some(ExtensionKind::McpServer),
+                USER_B,
+            )
+            .await
+            .expect("install (distinct url) for user-b should succeed — install is per-user");
+
+        secrets
+            .create(
+                USER_B,
+                CreateSecretParams::new(server_b.token_secret_name(), "token-user-b")
+                    .with_provider(SERVER_NAME.to_string()),
+            )
+            .await
+            .expect("store user-b token");
+
+        let activation = manager.activate(SERVER_NAME, USER_B).await;
+        let err = activation
+            .expect_err("user-b activation with a divergent tool surface must be rejected");
+        let message = format!("{err:?}");
+        assert!(
+            message.contains("tool surface"),
+            "rejection message should explain the surface conflict, got: {message}"
+        );
+
+        // User A's wrappers must still be live and dispatchable — the
+        // rejection must not have unregistered or shadowed them.
+        assert!(
+            tool_registry.get(&tool_name_a).await.is_some(),
+            "rejecting user-b must leave user-a's wrapper intact in the registry",
+        );
+        assert!(
+            tool_registry.get("different_tool").await.is_none(),
+            "user-b's divergent tool name must NOT have leaked into the registry",
+        );
+
+        mock_server_a.shutdown().await;
+        mock_server_b.shutdown().await;
+    }
 }
--- a/tools-src/github/github-tool.capabilities.json
+++ b/tools-src/github/github-tool.capabilities.json
@@ -98,7 +98,7 @@
      ],
      "use_pkce": false
    },
-    "instructions": "Create a Personal Access Token at github.com/settings/tokens with repo scope, then paste it here.",
+    "instructions": "Create a Personal Access Token at github.com/settings/tokens with scopes: repo, workflow, read:org. Then paste it here.",
    "setup_url": "https://github.com/settings/tokens",
    "token_hint": "Starts with 'ghp_' or 'github_pat_'",
    "env_var": "GITHUB_TOKEN",