sandbox-runtime

mirror of https://github.com/anthropic-experimental/sandbox-runtime.git synced 2026-05-07 06:01:25 +08:00
Dylan Conway e94c5fd01d Run full test suite in CI and migrate platform skips to describe.if (#197)
* Run full test suite in CI and migrate platform skips to describe.if

CI was running test:unit + test:integration, a curated subset of 5 files.
Most test files were never run in CI. Switch to `npm test` which runs
everything. Drop the test:unit/test:integration scripts.

Migrate the inline `if (skipIfNotLinux()) return` pattern to bun's native
`describe.if()`/`it.if()`. The old pattern made wrong-platform tests show
as pass (zero assertions, green checkmark) instead of skip — CI's test
count looked the same regardless of what actually ran. New
test/helpers/platform.ts exports isLinux/isMacOS/isSupportedPlatform.
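A minimal sketch of what such a helper could look like, assuming it keys off `process.platform`. The real contents of test/helpers/platform.ts are not shown in this message, and the import path and test names in the usage comment are hypothetical.

```typescript
// Hypothetical sketch of test/helpers/platform.ts; the real file may differ.
// Assumes a Node-compatible runtime (bun included) that provides process.platform.
export const isLinux: boolean = process.platform === "linux";
export const isMacOS: boolean = process.platform === "darwin";
export const isSupportedPlatform: boolean = isLinux || isMacOS;

// Usage in a bun test file (kept as a comment so this block runs outside
// the test runner; the import path and test names are illustrative):
//
//   import { describe, it, expect } from "bun:test";
//   import { isLinux } from "./helpers/platform";
//
//   describe.if(isLinux)("seccomp filter", () => {
//     it("runs only on Linux", () => {
//       expect(isLinux).toBe(true);
//     });
//   });
```

Because the condition is handed to the runner instead of being checked inside the test body, a wrong-platform run is reported as skipped rather than as a pass with zero assertions.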

Delete ~310 lines of unreachable tests from seccomp-filter.test.ts:
- skipIfNotAnt() gate checked USER_TYPE env var that nothing sets
- Two tests called wrapCommandWithSandboxLinux() with no restrictions,
  which returns the command unwrapped at the early-return check —
  expect("echo test").not.toContain("apply-seccomp") was vacuously true
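The vacuous assertion can be illustrated with a stand-in function. This is a sketch only: `wrapCommand`, the `Restrictions` shape, and the wrapped command format are hypothetical and do not reflect the real wrapCommandWithSandboxLinux signature.

```typescript
// Hypothetical stand-in for the early-return behaviour described above.
interface Restrictions {
  denyNetwork?: boolean;
  denyWritePaths?: string[];
}

function wrapCommand(command: string, restrictions: Restrictions = {}): string {
  const hasRestrictions =
    restrictions.denyNetwork === true ||
    (restrictions.denyWritePaths?.length ?? 0) > 0;
  // Early return: nothing to enforce, so the command comes back untouched.
  if (!hasRestrictions) return command;
  // Illustrative wrapping only; the real command line is not shown in the source.
  return `apply-seccomp -- ${command}`;
}

// With no restrictions the output is just the input, so asserting that it
// lacks "apply-seccomp" cannot fail, whatever the seccomp code does.
const unwrapped = wrapCommand("echo test");
console.log(unwrapped.includes("apply-seccomp")); // false

// A meaningful check has to exercise the wrapping path.
const wrapped = wrapCommand("echo test", { denyNetwork: true });
console.log(wrapped.includes("apply-seccomp")); // true
```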

Pin allow-read root-deny tests to /bin/bash — EXEC_DEPS doesn't list
/opt/homebrew, so execvp failed on Macs with homebrew bash as SHELL.

Add docker-tests CI job: unprivileged container on both arches,
exercises enableWeakerNestedSandbox end-to-end.

Narrow the push trigger from '**' to 'main': PRs were running the full
matrix twice (once for the branch push, once for the PR event).

* Replace mock.module with spyOn in linux-dependency-error tests

mock.module patches bun's module cache globally and never unmocks.
With npm test running all files in one process (instead of the old
test:unit + test:integration split), the mock leaked: every file that
imported getApplySeccompBinaryPath after this one got () => null, so
pid-namespace-isolation.test.ts and integration.test.ts failed in
beforeAll.

spyOn swaps one export binding; mockRestore in afterEach puts it back.
The callee's own import binding routes through the same slot in bun, so
checkLinuxDependencies sees the spy without any module-level surgery.

Also spies on whichSync directly rather than overwriting Bun.which on
globalThis — same fix, closer to what's actually being tested.
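The swap-and-restore idea can be shown with a hand-rolled helper. This is not bun's spyOn implementation, only an illustration of why restoring a single slot avoids the leak; every name here (`deps`, `getBinaryPath`, `checkDependencies`, `spy`) and the binary path are hypothetical.

```typescript
// Hand-rolled illustration of swap-and-restore; this is NOT bun's spyOn.
const deps = {
  getBinaryPath: (): string | null => "/usr/lib/apply-seccomp",
};

// The callee looks the function up through the shared object at call time,
// so it observes whatever currently occupies the slot.
function checkDependencies(): string[] {
  return deps.getBinaryPath() === null ? ["seccomp binary missing"] : [];
}

// Replace one property and hand back a way to undo exactly that change.
function spy<T extends object, K extends keyof T>(target: T, key: K, fake: T[K]) {
  const original = target[key];
  target[key] = fake;
  return {
    restore: (): void => {
      target[key] = original;
    },
  };
}

const handle = spy(deps, "getBinaryPath", () => null);
const duringSpy = checkDependencies(); // sees the fake
handle.restore(); // the role mockRestore plays in afterEach
const afterRestore = checkDependencies(); // sees the original again

console.log(duringSpy.length, afterRestore.length); // 1 0
```

A global module mock that is never undone corresponds to skipping `restore()`: every later caller would keep seeing `null`.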

Drop stale README reference to the deleted test:integration script.

* Replace docker test-suite job with srt end-to-end test

The full suite assumes bwrap --proc /proc works; an unprivileged
container doesn't have CAP_SYS_ADMIN for that. Only tests that set
enableWeakerNestedSandbox can pass there.

Instead of filtering which unit tests to run, test the thing the job
is for: build srt, run it with enableWeakerNestedSandbox, check that
allowed writes land, denied writes don't, and the seccomp filter blocks
AF_UNIX. Gated on SRT_E2E_DOCKER so host jobs skip it.

* Rename docker job to match other Tests jobs

* Add required network key to docker test config

SandboxRuntimeConfigSchema requires network (no .optional()). Without it
loadConfig returns null, srt falls through to getDefaultConfig, and the
sandbox enforces a different allowWrite than the test expects.
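The fallback chain can be sketched as follows. The function names mirror the ones above, but the `Config` shape, the hand-written validator, the default values, and the `/tmp/allowed` path are assumptions made for illustration, not the real SandboxRuntimeConfigSchema.

```typescript
// Hypothetical sketch of the fallback chain; shapes and values are assumptions.
interface Config {
  network: { allowedDomains: string[] };
  filesystem: { allowWrite: string[] };
}

function loadConfig(raw: unknown): Config | null {
  if (typeof raw !== "object" || raw === null) return null;
  const candidate = raw as Partial<Config>;
  // network is required: a config without it is rejected as a whole.
  if (candidate.network === undefined || candidate.filesystem === undefined) {
    return null;
  }
  return candidate as Config;
}

function getDefaultConfig(): Config {
  return { network: { allowedDomains: [] }, filesystem: { allowWrite: [] } };
}

// Missing network key: validation fails and the defaults silently win.
const withoutNetwork = { filesystem: { allowWrite: ["/tmp/allowed"] } };
const effectiveBefore = loadConfig(withoutNetwork) ?? getDefaultConfig();
console.log(effectiveBefore.filesystem.allowWrite.length); // 0

// With the network key present the test's own allowWrite is enforced.
const withNetwork = {
  network: { allowedDomains: [] },
  filesystem: { allowWrite: ["/tmp/allowed"] },
};
const effectiveAfter = loadConfig(withNetwork) ?? getDefaultConfig();
console.log(effectiveAfter.filesystem.allowWrite[0]); // /tmp/allowed
```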

* Add explicit timeouts to update-config sandboxed-curl tests

The three it.if(isLinux) tests each run two spawnSync calls with curl
--max-time 3 then --max-time 5. When example.com responds slowly both
curls run to their limits and the body takes ~8s, but bun's default
test timeout is 5000ms. bun aborts mid-body; afterEach runs reset()
against an in-flight spawn and the next test sees stale state.
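The timeout arithmetic, worked through as a short sketch. The 2000 ms headroom and the test name in the usage comment are hypothetical; the value actually chosen for these tests is not stated in this message.

```typescript
// Worst case from the description above: two sequential curls, each capped
// by its own --max-time value (in seconds).
const curlMaxTimesSec: number[] = [3, 5];
const worstCaseMs: number =
  curlMaxTimesSec.reduce((sum, seconds) => sum + seconds, 0) * 1000;

// Default test timeout as stated in the commit message.
const defaultTimeoutMs = 5000;

console.log(worstCaseMs > defaultTimeoutMs); // true: 8000 ms exceeds 5000 ms

// Hypothetical budget: worst case plus headroom for process spawn overhead.
const explicitTimeoutMs: number = worstCaseMs + 2000;

// In a bun test file the per-test timeout goes in the third argument, e.g.
//   it.if(isLinux)("sandboxed curl", () => { /* two spawnSync calls */ }, explicitTimeoutMs);
```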

These were never in test:integration so they never ran on CI before
this branch. On fast responses they complete in under 200ms.
2026-03-31 15:36:39 -07:00
platform.ts
Run full test suite in CI and migrate platform skips to describe.if (#197)
2026-03-31 15:36:39 -07:00