mirror of
https://github.com/rustfs/rustfs.git
synced 2026-07-02 06:14:43 +08:00
188 lines
11 KiB
Markdown
188 lines
11 KiB
Markdown
# Background Controller Contract
|
|
|
|
This document defines `BGC-002` for
|
|
[`rustfs/backlog#660`](https://github.com/rustfs/backlog/issues/660). It turns
|
|
the background service inventory into a shared vocabulary for future read-only
|
|
status work. It does not add a Rust trait, a scheduler, a service registry, or
|
|
any worker start/stop behavior.
|
|
|
|
## Scope
|
|
|
|
- PR type: `docs-only`.
|
|
- Baseline: `upstream/main` at
|
|
`f9a5e6d7e67322ac6f626b6f437a5e722fbe22e2`.
|
|
- Applies to future controller work for scanner, heal, lifecycle, replication,
|
|
dynamic config reload, capacity, metrics, memory observability, allocator
|
|
reclaim, and auto-tuning.
|
|
- Out of scope: worker creation, worker shutdown, queue resizing, storage
|
|
writes, readiness changes, peer signaling changes, scheduler replacement, and
|
|
crate splitting.
|
|
|
|
## Contract Vocabulary
|
|
|
|
| Term | Meaning | BGC-002 boundary |
|
|
|---|---|---|
|
|
| Desired | Static intent from env, persisted config, module switches, feature flags, bucket config, or admin configuration. | Read only. Do not normalize or mutate config while collecting desired state. |
|
|
| Current | Observed local runtime state such as configured, disabled, running, degraded, stopping, or unknown. | Read only. Do not infer state by starting probes that create storage or network side effects. |
|
|
| Status | Human-readable and machine-checkable snapshot of runtime counters, worker counts, queue pressure, last successful cycle, last error, cancellation source, and shutdown handle shape. | Side-effect-free. Missing status surfaces must be reported as `unknown`, not guessed. |
|
|
| Reconcile | Future comparison between desired, current, and status that can produce a recommendation. | No action in `BGC-002`; future reconcile must not start or stop workers until a tested pilot PR allows it. |
|
|
| Side effects | Writes, deletes, queue admission, target activation, external I/O, metrics emission, readiness publication, peer signal, or config reload fanout. | Must be declared before any controller migration touches that service. |
|
|
|
|
## State Model
|
|
|
|
Future status snapshots should use the narrowest state that the current code can
|
|
prove:
|
|
|
|
| State | Meaning | Notes |
|
|
|---|---|---|
|
|
| NotConfigured | No valid desired source exists for this service. | Use when config/module switches/features make the service absent. |
|
|
| Disabled | Desired source exists and explicitly disables the service. | Do not use for missing config. |
|
|
| Starting | Startup was requested and has not reached steady state. | Only expose when current code has a start boundary. |
|
|
| Running | The service is active according to existing runtime state. | Do not use merely because config is enabled. |
|
|
| Degraded | The service is active but current status exposes known error, partial, or stalled state. | Do not introduce new failure classification in docs-only work. |
|
|
| Stopping | Shutdown was requested and the service has not fully exited. | Only expose where shutdown can be observed. |
|
|
| Stopped | The service was started before and is now fully stopped. | Do not confuse with `Disabled` or `NotConfigured`. |
|
|
| Unknown | Current code lacks a safe status surface. | Preferred over speculative status. |
|
|
|
|
## Lifecycle Boundary
|
|
|
|
```mermaid
|
|
flowchart LR
|
|
D["Desired source"]
|
|
C["Current runtime state"]
|
|
S["Read-only status snapshot"]
|
|
R["Future reconcile recommendation"]
|
|
W["Workers and side effects"]
|
|
|
|
D --> S
|
|
C --> S
|
|
S --> R
|
|
R -. "future tested pilot only" .-> W
|
|
```
|
|
|
|
`BGC-002` stops at the read-only contract. The arrow from reconcile to workers is
|
|
intentionally dotted because this PR does not allow any implementation to start,
|
|
stop, resize, or reconfigure workers.
|
|
|
|
## Service Boundaries
|
|
|
|
| Service area | Desired source | Current/status inputs | Side effects to preserve |
|
|
|---|---|---|---|
|
|
| Data scanner | Scanner env and runtime scanner config. | Admin scanner status, scanner metrics, scanner cancellation token, checkpoint/yield/alert counters. | Data usage cache updates, lifecycle evaluation, replication heal admission, scanner heal admission, alerts, and scanner metrics. |
|
|
| Heal/AHM | Heal enablement and scanner-driven heal admission. | Heal manager global channel, active task atomics, queue length atomics, AHM cancellation token. | Heal queue consumption, heal storage writes, and channel close semantics. |
|
|
| Lifecycle expiry/transition | Bucket lifecycle config and scanner event source. | Lifecycle worker counts, active tasks, queue send timeouts, transition stats, expiry/transition queues. | Object deletes, transition queueing, stale multipart cleanup, and lifecycle metrics. |
|
|
| Replication pool | Bucket/site replication config and resync admin requests. | Global replication stats, worker pool sizes, queue counters, persisted resync state, per-bucket cancel tokens. | Object replication, delete replication, queue resizing by channel close, persisted resync metadata, and admin-triggered cancel paths. |
|
|
| Dynamic config reload | Persisted server config, admin config calls, and peer snapshot signals. | Last local reload result, per-subsystem reload errors, peer reload signal result. | Scanner/heal runtime config updates, audit reload, notification reload, peer signaling, and config snapshot fanout. |
|
|
| Capacity manager | Local disk inventory and capacity feature state. | Capacity manager cache age, scheduled refresh state, last refresh result, runtime summary loop. | Global capacity cache refresh and runtime summary metrics/logging. |
|
|
| Metrics runtime | Observability metrics feature state and collector configuration. | Collector intervals, last collection result, cancellation token state, collector grouping. | Metrics collection and emission only. |
|
|
| Memory observability | Observability feature state and memory sampling config. | Sampler loop state, last sample time, last sample error, runtime cancellation token. | Memory metric emission. This is the preferred first BGC-003 status candidate. |
|
|
| Allocator reclaim | Allocator reclaim env/config and backend support. | Enabled flag, idle streak, active request gauge, scanner/heal activity gauges, last reclaim result. | Backend-specific allocator reclaim and metrics. |
|
|
| Auto-tuner | `RUSTFS_AUTOTUNER_ENABLED` and tuning inputs. | Last tuning attempt, last tuning error, 60-second loop state. | Runtime concurrency tuning. Treat as behavior-sensitive. |
|
|
|
|
The following areas stay outside the first controller migrations:
|
|
|
|
- deferred IAM recovery, because it can publish readiness;
|
|
- optional protocol servers, because they already have protocol shutdown handles;
|
|
- ECStore endpoint monitor and disk health monitor, because they are storage-
|
|
adjacent and can affect disk state;
|
|
- notification and audit runtime coupling, because live streams, replay, target
|
|
activation, and reload behavior need dedicated preservation tests.
|
|
|
|
## Read-Only Snapshot Requirements
|
|
|
|
Any future `BGC-003` status implementation must satisfy all of these:
|
|
|
|
- status collection must not start, stop, resize, or wake a worker;
|
|
- status collection must not write storage data, object metadata, target state,
|
|
queue entries, persisted config, or resync metadata;
|
|
- status collection must not publish readiness or peer reload signals;
|
|
- missing fields must be represented as `unknown` or omitted with a documented
|
|
reason;
|
|
- cancellation source and shutdown handle shape must be reported separately from
|
|
desired enabled/disabled state;
|
|
- scanner, heal, lifecycle, and replication status must not hide their queue and
|
|
admission coupling.
|
|
|
|
## BGC-003 Snapshot Pilot
|
|
|
|
The first read-only snapshot is memory observability status. It reports the
|
|
service name, whether observability metrics currently enable the sampler, the
|
|
configured sampler interval, runtime-token cancellation state, and the absence
|
|
of a dedicated shutdown handle.
|
|
|
|
This snapshot intentionally does not define an admin route, scheduler, service
|
|
registry, worker start/stop path, readiness signal, peer signal, storage write,
|
|
or metrics emission change.
|
|
|
|
## BGC-004 Controller Pilot
|
|
|
|
The first controller pilot is also memory observability. It converts the
|
|
existing desired inputs and status snapshot into a typed reconcile plan. The
|
|
pilot reports desired state, current state, and worker mutation intent.
|
|
|
|
The only allowed worker mutation for this pilot is `none`. Repeated reconcile
|
|
calls must return the same plan for the same snapshot and must not request a
|
|
worker start, stop, resize, wakeup, storage write, readiness signal, peer
|
|
signal, or metrics emission.
|
|
|
|
## BGC-005 Allocator Reclaim Status And Controller Surface
|
|
|
|
The second low-risk controller/status surface is allocator reclaim. It reports
|
|
the service name, desired enablement, configured force flag, backend-specific
|
|
effective force, idle interval settings, runtime-token cancellation state, and
|
|
the absence of a dedicated shutdown handle.
|
|
|
|
The only allowed worker mutation for this surface is `none`. Reconcile output is
|
|
read-only and must not start, stop, resize, wake, or otherwise drive the
|
|
allocator reclaim loop. Existing backend-specific force handling, idle-streak
|
|
logic, metrics emission, and runtime-token shutdown behavior remain owned by the
|
|
current loop.
|
|
|
|
## BGC-006 Metrics Runtime Status And Controller Surface
|
|
|
|
The third low-risk controller/status surface is metrics runtime. It reports the
|
|
service name, observability metrics enablement, collector task count, configured
|
|
collector intervals, replication bandwidth zero-tombstone cycle count,
|
|
runtime-token cancellation state, and the absence of a dedicated shutdown
|
|
handle.
|
|
|
|
The only allowed worker mutation for this surface is `none`. Reconcile output is
|
|
read-only and must not start, stop, resize, wake, or otherwise drive metrics
|
|
collector tasks. Existing collector grouping, interval parsing, metrics
|
|
emission, replication bandwidth tombstone handling, and runtime-token shutdown
|
|
behavior remain owned by the current loops.
|
|
|
|
## Future Reconcile Rules
|
|
|
|
Future reconcile work is allowed only after a read-only status snapshot exists.
|
|
The first reconcile pilot must:
|
|
|
|
- choose one low-risk service;
|
|
- compare desired/current/status without side effects;
|
|
- prove idempotence under repeated calls;
|
|
- prove no duplicate workers are created;
|
|
- preserve existing shutdown order and cancellation source;
|
|
- include rollback guidance that removes the pilot without changing existing
|
|
worker behavior.
|
|
|
|
Memory observability is the recommended first candidate because it already has a
|
|
simple runtime cancellation loop and no storage writes. Scanner, heal,
|
|
replication, lifecycle, disk health, deferred IAM recovery, and auto-tuning must
|
|
wait for focused preservation tests.
|
|
|
|
## Verification Expectations
|
|
|
|
For this docs-only contract:
|
|
|
|
- architecture migration guard scripts must pass;
|
|
- layer dependency and metrics reference guards must pass;
|
|
- no Rust source, Cargo metadata, CI workflow, Makefile, or runtime config file
|
|
may change.
|
|
|
|
For the next implementation PRs:
|
|
|
|
- add focused tests before changing behavior;
|
|
- do not modify production logic only to make tests pass;
|
|
- keep compatibility comments searchable with `RUSTFS_COMPAT_TODO(<task-id>)`
|
|
whenever temporary old paths are retained for later deletion.
|