rustfs/docs/architecture/background-controller-contract.md

Background Controller Contract

This document defines BGC-002 for rustfs/backlog#660. It turns the background service inventory into a shared vocabulary for future read-only status work. It does not add a Rust trait, a scheduler, a service registry, or any worker start/stop behavior.

Scope

  • PR type: docs-only.
  • Baseline: upstream/main at f9a5e6d7e67322ac6f626b6f437a5e722fbe22e2.
  • Applies to future controller work for scanner, heal, lifecycle, replication, dynamic config reload, capacity, metrics, memory observability, allocator reclaim, and auto-tuning.
  • Out of scope: worker creation, worker shutdown, queue resizing, storage writes, readiness changes, peer signaling changes, scheduler replacement, and crate splitting.

Contract Vocabulary

| Term | Meaning | BGC-002 boundary |
| --- | --- | --- |
| Desired | Static intent from env, persisted config, module switches, feature flags, bucket config, or admin configuration. | Read only. Do not normalize or mutate config while collecting desired state. |
| Current | Observed local runtime state such as configured, disabled, running, degraded, stopping, or unknown. | Read only. Do not infer state by starting probes that create storage or network side effects. |
| Status | Human-readable and machine-checkable snapshot of runtime counters, worker counts, queue pressure, last successful cycle, last error, cancellation source, and shutdown handle shape. | Side-effect-free. Missing status surfaces must be reported as unknown, not guessed. |
| Reconcile | Future comparison between desired, current, and status that can produce a recommendation. | No action in BGC-002; future reconcile must not start or stop workers until a tested pilot PR allows it. |
| Side effects | Writes, deletes, queue admission, target activation, external I/O, metrics emission, readiness publication, peer signal, or config reload fanout. | Must be declared before any controller migration touches that service. |
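
BGC-002 adds no Rust code, so the following is only an illustrative sketch of two rows in the table above. All names (`RawConfig`, `Desired`, `collect_desired`, `SideEffect`, `ServiceDeclaration`, `migration_allowed`) are hypothetical, and the accepted value spellings (`on`/`true`, `off`/`false`) are assumptions. Desired state is collected through a shared borrow, so the collector cannot normalize or rewrite the config it reads; a migration may only touch side effects the service declared up front.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw desired source (env, persisted config, feature flags, ...).
pub struct RawConfig {
    pub values: BTreeMap<String, String>,
}

/// Desired intent; `Absent` covers both missing and invalid values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Desired {
    Absent,
    Enabled,
    Disabled,
}

/// Reads desired state through `&RawConfig`: with only a shared borrow,
/// the config cannot be normalized or mutated while it is collected.
pub fn collect_desired(cfg: &RawConfig, key: &str) -> Desired {
    match cfg.values.get(key).map(|v| v.trim()) {
        Some("on") | Some("true") => Desired::Enabled,
        Some("off") | Some("false") => Desired::Disabled,
        _ => Desired::Absent,
    }
}

/// Side-effect categories from the "Side effects" row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SideEffect {
    Write,
    Delete,
    QueueAdmission,
    TargetActivation,
    ExternalIo,
    MetricsEmission,
    ReadinessPublication,
    PeerSignal,
    ConfigReloadFanout,
}

/// Side effects a service declares before any controller migration touches it.
pub struct ServiceDeclaration {
    pub name: &'static str,
    pub side_effects: &'static [SideEffect],
}

/// A migration may only touch side effects that were declared up front.
pub fn migration_allowed(decl: &ServiceDeclaration, touched: &[SideEffect]) -> bool {
    touched.iter().all(|effect| decl.side_effects.contains(effect))
}
```

A migration that would add, for example, a storage write to a service that only declared metrics emission is rejected by `migration_allowed` until the declaration is updated.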

State Model

Future status snapshots should use the narrowest state that the current code can prove:

| State | Meaning | Notes |
| --- | --- | --- |
| NotConfigured | No valid desired source exists for this service. | Use when config/module switches/features make the service absent. |
| Disabled | Desired source exists and explicitly disables the service. | Do not use for missing config. |
| Starting | Startup was requested and has not reached steady state. | Only expose when current code has a start boundary. |
| Running | The service is active according to existing runtime state. | Do not use merely because config is enabled. |
| Degraded | The service is active but current status exposes known error, partial, or stalled state. | Do not introduce new failure classification in docs-only work. |
| Stopping | Shutdown was requested and the service has not fully exited. | Only expose where shutdown can be observed. |
| Stopped | The service was started before and is now fully stopped. | Do not confuse with Disabled or NotConfigured. |
| Unknown | Current code lacks a safe status surface. | Preferred over speculative status. |
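
One way to read "narrowest state the current code can prove" is the sketch below. `Evidence` and `narrow_state` are hypothetical; each `Option<bool>` stands for a fact that existing code may or may not expose, and `None` means there is no safe surface for it. Anything the evidence does not prove falls through to `Unknown`, and an enabled config on its own never yields `Running`.

```rust
/// The eight states from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    NotConfigured,
    Disabled,
    Starting,
    Running,
    Degraded,
    Stopping,
    Stopped,
    Unknown,
}

/// Hypothetical evidence read from existing runtime state.
/// `None` means the current code has no safe surface for that fact.
#[derive(Debug, Default, Clone, Copy)]
pub struct Evidence {
    pub desired_source_valid: bool,
    pub explicitly_disabled: bool,
    pub start_requested: Option<bool>,
    pub active: Option<bool>,
    pub known_error: Option<bool>,
    pub shutdown_requested: Option<bool>,
    pub exited: Option<bool>,
}

/// Picks the narrowest state the evidence proves; anything unproven is `Unknown`.
pub fn narrow_state(e: &Evidence) -> ServiceState {
    if !e.desired_source_valid {
        return ServiceState::NotConfigured;
    }
    if e.explicitly_disabled {
        return ServiceState::Disabled;
    }
    // Observed shutdown takes priority over activity observations.
    match (e.shutdown_requested, e.exited) {
        (Some(true), Some(true)) => return ServiceState::Stopped,
        (Some(true), Some(false)) => return ServiceState::Stopping,
        (Some(true), None) => return ServiceState::Unknown,
        _ => {}
    }
    match (e.start_requested, e.active, e.known_error) {
        (Some(true), Some(false), _) => ServiceState::Starting,
        (_, Some(true), Some(true)) => ServiceState::Degraded,
        (_, Some(true), _) => ServiceState::Running,
        // A valid, enabled config alone never proves Running.
        _ => ServiceState::Unknown,
    }
}
```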

Lifecycle Boundary

```mermaid
flowchart LR
    D["Desired source"]
    C["Current runtime state"]
    S["Read-only status snapshot"]
    R["Future reconcile recommendation"]
    W["Workers and side effects"]

    D --> S
    C --> S
    S --> R
    R -. "future tested pilot only" .-> W
```

BGC-002 stops at the read-only contract. The arrow from reconcile to workers is intentionally dotted because this PR does not allow any implementation to start, stop, resize, or reconfigure workers.
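
The same one-way flow can be expressed in types. In this hypothetical sketch (`DesiredSource`, `CurrentState`, `Snapshot`, `Recommendation`, `snapshot`, and `recommend` are illustrative names, and desired intent is reduced to a single flag), the recommendation is a plain value and `recommend` receives no worker handle, so the dotted R to W edge cannot be taken from it.

```rust
/// Desired intent, reduced to a single flag for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DesiredSource {
    pub enabled: bool,
}

/// Observed runtime state; `None` means no safe status surface exists.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CurrentState {
    pub running: Option<bool>,
}

/// Read-only snapshot combining both inputs (D --> S, C --> S).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub desired: DesiredSource,
    pub current: CurrentState,
}

/// Future reconcile output (S --> R): it describes a gap, it does not act on it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recommendation {
    InSync,
    WouldStart,
    WouldStop,
    CannotTell,
}

pub fn snapshot(desired: &DesiredSource, current: &CurrentState) -> Snapshot {
    Snapshot { desired: *desired, current: *current }
}

/// Pure function over the snapshot; no worker handle is passed in.
pub fn recommend(s: &Snapshot) -> Recommendation {
    match (s.desired.enabled, s.current.running) {
        (_, None) => Recommendation::CannotTell,
        (true, Some(false)) => Recommendation::WouldStart,
        (false, Some(true)) => Recommendation::WouldStop,
        _ => Recommendation::InSync,
    }
}
```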

Service Boundaries

| Service area | Desired source | Current/status inputs | Side effects to preserve |
| --- | --- | --- | --- |
| Data scanner | Scanner env and runtime scanner config. | Admin scanner status, scanner metrics, scanner cancellation token, checkpoint/yield/alert counters. | Data usage cache updates, lifecycle evaluation, replication heal admission, scanner heal admission, alerts, and scanner metrics. |
| Heal/AHM | Heal enablement and scanner-driven heal admission. | Heal manager global channel, active task atomics, queue length atomics, AHM cancellation token. | Heal queue consumption, heal storage writes, and channel close semantics. |
| Lifecycle expiry/transition | Bucket lifecycle config and scanner event source. | Lifecycle worker counts, active tasks, queue send timeouts, transition stats, expiry/transition queues. | Object deletes, transition queueing, stale multipart cleanup, and lifecycle metrics. |
| Replication pool | Bucket/site replication config and resync admin requests. | Global replication stats, worker pool sizes, queue counters, persisted resync state, per-bucket cancel tokens. | Object replication, delete replication, queue resizing by channel close, persisted resync metadata, and admin-triggered cancel paths. |
| Dynamic config reload | Persisted server config, admin config calls, and peer snapshot signals. | Last local reload result, per-subsystem reload errors, peer reload signal result. | Scanner/heal runtime config updates, audit reload, notification reload, peer signaling, and config snapshot fanout. |
| Capacity manager | Local disk inventory and capacity feature state. | Capacity manager cache age, scheduled refresh state, last refresh result, runtime summary loop. | Global capacity cache refresh and runtime summary metrics/logging. |
| Metrics runtime | Observability metrics feature state and collector configuration. | Collector intervals, last collection result, cancellation token state, collector grouping. | Metrics collection and emission only. |
| Memory observability | Observability feature state and memory sampling config. | Sampler loop state, last sample time, last sample error, runtime cancellation token. | Memory metric emission. This is the preferred first BGC-003 status candidate. |
| Allocator reclaim | Allocator reclaim env/config and backend support. | Enabled flag, idle streak, active request gauge, scanner/heal activity gauges, last reclaim result. | Backend-specific allocator reclaim and metrics. |
| Auto-tuner | RUSTFS_AUTOTUNER_ENABLED and tuning inputs. | Last tuning attempt, last tuning error, 60-second loop state. | Runtime concurrency tuning. Treat as behavior-sensitive. |

The following areas stay outside the first controller migrations:

  • deferred IAM recovery, because it can publish readiness;
  • optional protocol servers, because they already have protocol shutdown handles;
  • ECStore endpoint monitor and disk health monitor, because they are storage-adjacent and can affect disk state;
  • notification and audit runtime coupling, because live streams, replay, target activation, and reload behavior need dedicated preservation tests.

Read-Only Snapshot Requirements

Any future BGC-003 status implementation must satisfy all of these:

  • status collection must not start, stop, resize, or wake a worker;
  • status collection must not write storage data, object metadata, target state, queue entries, persisted config, or resync metadata;
  • status collection must not publish readiness or peer reload signals;
  • missing fields must be represented as unknown or omitted with a documented reason;
  • cancellation source and shutdown handle shape must be reported separately from desired enabled/disabled state;
  • scanner, heal, lifecycle, and replication status must not hide their queue and admission coupling.
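
Several of these requirements can be encoded in the shape of the snapshot itself. In this hedged sketch (`Field`, `CancellationSource`, `ShutdownHandleShape`, and `StatusSnapshot` are hypothetical names), a missing reading becomes `Unknown` with a documented reason rather than a guess, cancellation source and shutdown handle shape are separate fields instead of being folded into `desired_enabled`, and queue/admission coupling stays visible as data.

```rust
/// A status field that is either known or explicitly unknown with a reason.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Field<T> {
    Known(T),
    Unknown { reason: &'static str },
}

impl<T> Field<T> {
    /// Wraps an optional reading; a missing reading must carry a reason.
    pub fn from_reading(value: Option<T>, reason: &'static str) -> Self {
        match value {
            Some(v) => Field::Known(v),
            None => Field::Unknown { reason },
        }
    }
}

/// Where cancellation would come from, reported apart from desired state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CancellationSource {
    RuntimeToken,
    ServiceToken,
    NotCancellable,
}

/// Shape of the handle used to shut the service down, if any.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShutdownHandleShape {
    Dedicated,
    RuntimeTokenOnly,
    Absent,
}

/// Hypothetical generic snapshot; queue and admission coupling stay explicit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatusSnapshot {
    pub service: &'static str,
    pub desired_enabled: Field<bool>,
    pub cancellation_source: CancellationSource,
    pub shutdown_handle: ShutdownHandleShape,
    pub queue_len: Field<u64>,
    pub admission_coupled_to: Vec<&'static str>,
    pub last_error: Field<Option<String>>,
}
```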

BGC-003 Snapshot Pilot

The first read-only snapshot is memory observability status. It reports the service name, whether observability metrics currently enable the sampler, the configured sampler interval, runtime-token cancellation state, and the absence of a dedicated shutdown handle.

This snapshot intentionally does not define an admin route, scheduler, service registry, worker start/stop path, readiness signal, peer signal, storage write, or metrics emission change.
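
A minimal sketch of what such a snapshot could contain, assuming hypothetical names (`MemoryObservabilityStatus`, `collect_memory_status`, the `"memory_observability"` service string). An `AtomicBool` stands in for the runtime cancellation token so the example has no dependencies; the collector only loads it and never sets it, and it has no access to the sampler loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Hypothetical read-only snapshot for the memory observability sampler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryObservabilityStatus {
    pub service: &'static str,
    pub sampler_enabled: bool,
    pub sampler_interval: Duration,
    pub runtime_token_cancelled: bool,
    /// Always false here: the sampler has no dedicated shutdown handle.
    pub has_dedicated_shutdown_handle: bool,
}

/// Builds the snapshot from borrowed inputs only: the token flag is loaded,
/// never stored, and nothing here can start, stop, or wake the sampler.
pub fn collect_memory_status(
    metrics_enabled: bool,
    sampler_interval: Duration,
    runtime_token: &AtomicBool,
) -> MemoryObservabilityStatus {
    MemoryObservabilityStatus {
        service: "memory_observability",
        sampler_enabled: metrics_enabled,
        sampler_interval,
        runtime_token_cancelled: runtime_token.load(Ordering::Acquire),
        has_dedicated_shutdown_handle: false,
    }
}
```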

BGC-004 Controller Pilot

The first controller pilot is also memory observability. It converts the existing desired inputs and status snapshot into a typed reconcile plan. The pilot reports desired state, current state, and worker mutation intent.

The only allowed worker mutation for this pilot is none. Repeated reconcile calls must return the same plan for the same snapshot and must not request a worker start, stop, resize, wakeup, storage write, readiness signal, peer signal, or metrics emission.
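
A hedged sketch of such a typed plan, with hypothetical names (`MemoryStatus`, `DesiredState`, `CurrentState`, `WorkerMutation`, `ReconcilePlan`, `reconcile`). `MemoryStatus` is a minimal stand-in for the BGC-003 snapshot that adds an observed `loop_running` field, so current state comes from the loop rather than from config. Because `WorkerMutation` has only the `None` variant, a plan requesting a start, stop, resize, or wakeup cannot even be expressed.

```rust
/// Minimal stand-in for the BGC-003 snapshot (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryStatus {
    pub sampler_enabled: bool,
    /// Observed sampler loop state; `None` if there is no safe surface.
    pub loop_running: Option<bool>,
    pub runtime_token_cancelled: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DesiredState {
    Enabled,
    Disabled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CurrentState {
    Running,
    Stopping,
    NotRunning,
    Unknown,
}

/// The only worker mutation this pilot may produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerMutation {
    None,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReconcilePlan {
    pub desired: DesiredState,
    pub current: CurrentState,
    pub worker_mutation: WorkerMutation,
}

/// Pure function of the snapshot: same input, same plan, never a mutation.
pub fn reconcile(status: &MemoryStatus) -> ReconcilePlan {
    let desired = if status.sampler_enabled { DesiredState::Enabled } else { DesiredState::Disabled };
    // Current state comes from the observed loop, never from config alone.
    let current = match (status.loop_running, status.runtime_token_cancelled) {
        (Some(true), false) => CurrentState::Running,
        (Some(true), true) => CurrentState::Stopping,
        (Some(false), _) => CurrentState::NotRunning,
        (None, _) => CurrentState::Unknown,
    };
    // Even when desired and current disagree, the pilot requests no mutation.
    ReconcilePlan { desired, current, worker_mutation: WorkerMutation::None }
}
```

Repeated calls on the same snapshot return equal plans, which is the idempotence property this pilot requires.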

BGC-005 Allocator Reclaim Status And Controller Surface

The second low-risk controller/status surface is allocator reclaim. It reports the service name, desired enablement, configured force flag, backend-specific effective force, idle interval settings, runtime-token cancellation state, and the absence of a dedicated shutdown handle.

The only allowed worker mutation for this surface is none. Reconcile output is read-only and must not start, stop, resize, wake, or otherwise drive the allocator reclaim loop. Existing backend-specific force handling, idle-streak logic, metrics emission, and runtime-token shutdown behavior remain owned by the current loop.
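
The split between the configured force flag and the backend-specific effective force can be sketched as follows. This assumes, hypothetically, that a backend either can or cannot honor a forced reclaim (`Backend::ForceCapable` / `Backend::ForceUnsupported`); `AllocatorReclaimStatus` and `collect_reclaim_status` are illustrative names, and the real backend rules stay in the existing loop. The snapshot reports both flags side by side so a reader can see when a configured force is not effective.

```rust
use std::time::Duration;

/// Hypothetical backend capability; real backends and rules live in the loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    ForceCapable,
    ForceUnsupported,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AllocatorReclaimStatus {
    pub service: &'static str,
    pub desired_enabled: bool,
    pub configured_force: bool,
    pub effective_force: bool,
    pub idle_interval: Duration,
    pub runtime_token_cancelled: bool,
    pub has_dedicated_shutdown_handle: bool,
}

/// Describes what the backend would honor; it does not perform a reclaim
/// or change the loop's own force handling.
pub fn collect_reclaim_status(
    desired_enabled: bool,
    configured_force: bool,
    backend: Backend,
    idle_interval: Duration,
    runtime_token_cancelled: bool,
) -> AllocatorReclaimStatus {
    AllocatorReclaimStatus {
        service: "allocator_reclaim",
        desired_enabled,
        configured_force,
        effective_force: configured_force && backend == Backend::ForceCapable,
        idle_interval,
        runtime_token_cancelled,
        has_dedicated_shutdown_handle: false,
    }
}
```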

BGC-006 Metrics Runtime Status And Controller Surface

The third low-risk controller/status surface is metrics runtime. It reports the service name, observability metrics enablement, collector task count, configured collector intervals, replication bandwidth zero-tombstone cycle count, runtime-token cancellation state, and the absence of a dedicated shutdown handle.

The only allowed worker mutation for this surface is none. Reconcile output is read-only and must not start, stop, resize, wake, or otherwise drive metrics collector tasks. Existing collector grouping, interval parsing, metrics emission, replication bandwidth tombstone handling, and runtime-token shutdown behavior remain owned by the current loops.
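
A hedged sketch of the metrics runtime snapshot. `CollectorInterval`, `MetricsRuntimeStatus`, `collect_metrics_status`, and the group names in the usage are hypothetical, and the sketch assumes one collector task per configured group. It copies intervals that were already parsed rather than re-parsing them, and only loads the zero-tombstone counter (an `AtomicU64` stand-in) without resetting it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// One collector group and its configured interval (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CollectorInterval {
    pub group: &'static str,
    pub interval: Duration,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MetricsRuntimeStatus {
    pub service: &'static str,
    pub metrics_enabled: bool,
    pub collector_task_count: usize,
    pub collector_intervals: Vec<CollectorInterval>,
    pub zero_tombstone_cycles: u64,
    pub runtime_token_cancelled: bool,
    pub has_dedicated_shutdown_handle: bool,
}

/// Copies already-parsed intervals and loads the tombstone counter; it does
/// not re-parse intervals, reset the counter, or touch collector tasks.
pub fn collect_metrics_status(
    metrics_enabled: bool,
    intervals: &[CollectorInterval],
    zero_tombstone_cycles: &AtomicU64,
    runtime_token_cancelled: bool,
) -> MetricsRuntimeStatus {
    MetricsRuntimeStatus {
        service: "metrics_runtime",
        metrics_enabled,
        // Assumption: one collector task per configured group when enabled.
        collector_task_count: if metrics_enabled { intervals.len() } else { 0 },
        collector_intervals: intervals.to_vec(),
        zero_tombstone_cycles: zero_tombstone_cycles.load(Ordering::Relaxed),
        runtime_token_cancelled,
        has_dedicated_shutdown_handle: false,
    }
}
```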

Future Reconcile Rules

Future reconcile work is allowed only after a read-only status snapshot exists. The first reconcile pilot must:

  • choose one low-risk service;
  • compare desired/current/status without side effects;
  • prove idempotence under repeated calls;
  • prove no duplicate workers are created;
  • preserve existing shutdown order and cancellation source;
  • include rollback guidance that removes the pilot without changing existing worker behavior.
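
The idempotence and no-duplicate-worker rules can be exercised against a test double before any real worker is involved. This is a hypothetical harness (`Plan`, `FakeWorkers`, `plan_for`, and `apply` are illustrative names). It deliberately allows an `EnsureRunning` plan, which none of the BGC-003 through BGC-006 surfaces may produce; it only shows the shape of the test a future pilot would need.

```rust
use std::collections::HashSet;

/// Hypothetical plan a future pilot might apply.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    NoChange,
    EnsureRunning,
}

/// Test double standing in for real workers: records which services run.
#[derive(Debug, Default)]
pub struct FakeWorkers {
    running: HashSet<&'static str>,
    pub starts: usize,
}

impl FakeWorkers {
    /// Starting is a no-op when the worker already runs, so no duplicate appears.
    pub fn ensure_running(&mut self, service: &'static str) {
        if self.running.insert(service) {
            self.starts += 1;
        }
    }

    pub fn is_running(&self, service: &str) -> bool {
        self.running.contains(service)
    }
}

/// Compares desired with observed state and recommends at most one start.
pub fn plan_for(desired_enabled: bool, running: bool) -> Plan {
    if desired_enabled && !running { Plan::EnsureRunning } else { Plan::NoChange }
}

pub fn apply(plan: Plan, service: &'static str, workers: &mut FakeWorkers) {
    if plan == Plan::EnsureRunning {
        workers.ensure_running(service);
    }
}
```

Running the reconcile loop several times against the same double should leave exactly one start recorded.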

Memory observability is the recommended first candidate because it already has a simple runtime cancellation loop and no storage writes. Scanner, heal, replication, lifecycle, disk health, deferred IAM recovery, and auto-tuning must wait for focused preservation tests.

Verification Expectations

For this docs-only contract:

  • architecture migration guard scripts must pass;
  • layer dependency and metrics reference guards must pass;
  • no Rust source, Cargo metadata, CI workflow, Makefile, or runtime config file may change.

For the next implementation PRs:

  • add focused tests before changing behavior;
  • do not modify production logic only to make tests pass;
  • keep compatibility comments searchable with RUSTFS_COMPAT_TODO(<task-id>) whenever temporary old paths are retained for later deletion.