mirror of
https://github.com/rustfs/rustfs.git
synced 2026-06-20 19:56:01 +08:00
* perf(put): add eager path metrics and isolation tooling

* fix(decommission): persist progress adaptively (#3497)

Persist decommission progress after either the existing time interval or a migrated-item threshold, and flush progress baselines after bucket and terminal-state saves. Also stabilize the OIDC discovery mock used by the pre-commit gate.

* refactor: move bucket operations contract (#3507)

* fix(s3): handle multipart flexible checksums (#3508)

* fix(io-core): avoid blocking on pooled buffer return

* perf(put): add slow inflight diagnostics

* perf(put): fix 16KiB regression with threshold and pool bypass

- Lower SMALL_EAGER_PUT_MAX_SIZE from 256KB to 8KB so objects >8KiB use the streaming BufReader path (matches baseline behavior)
- Add POOL_BYPASS_MAX_SIZE (16KiB) to bypass BytesPool for very small objects, avoiding Small-tier Mutex contention under high concurrency
- Add read_small_put_body_exact_direct() for direct Vec<u8> allocation
- Fix stale test assertions to match new 8KB threshold

Root cause analysis: the 16KiB regression was primarily caused by instrumentation overhead in set_disk.rs (4x Instant::now() + metrics per PUT), not BytesPool contention. Lowering the threshold eliminates the eager-path overhead for 16KiB+ objects.

* perf(put): gate stage metrics behind observability flag

Add put_stage_metrics_enabled() AtomicBool switch in io-metrics crate. When disabled (default), record_put_object_path() and record_put_object_stage_duration() are no-ops, avoiding unnecessary histogram/counter macro overhead in the PUT hot path.

The flag is set to true during startup when OTEL metric export is enabled (rustfs_obs::observability_metric_enabled() == true).

This eliminates the per-request metrics overhead that contributed to the 16KiB PUT regression when metrics collection is not active.
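The flag-gating described in the "gate stage metrics behind observability flag" entry can be sketched roughly as follows. This is a minimal illustration, not the io-metrics crate's actual code: the function names put_stage_metrics_enabled() and record_put_object_stage_duration() come from the commit message, but their signatures, the setter set_put_stage_metrics_enabled(), and the RECORDED_SAMPLES counter (standing in for the real histogram/counter macros) are assumptions.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Assumed shape of the switch: a process-wide flag that is off by default.
static PUT_STAGE_METRICS_ENABLED: AtomicBool = AtomicBool::new(false);

// Hypothetical counter standing in for the real histogram/counter macros.
static RECORDED_SAMPLES: AtomicU64 = AtomicU64::new(0);

/// Hypothetical setter; per the commit message the flag is turned on at
/// startup when OTEL metric export is enabled.
pub fn set_put_stage_metrics_enabled(enabled: bool) {
    PUT_STAGE_METRICS_ENABLED.store(enabled, Ordering::Relaxed);
}

/// Cheap check used on the PUT hot path (a single relaxed atomic load).
pub fn put_stage_metrics_enabled() -> bool {
    PUT_STAGE_METRICS_ENABLED.load(Ordering::Relaxed)
}

/// Records one stage duration sample, or does nothing when the flag is off.
/// The parameter list is an assumption made for this sketch.
pub fn record_put_object_stage_duration(_stage: &'static str, _millis: f64) {
    if !put_stage_metrics_enabled() {
        return; // no-op: skip all metrics work when collection is inactive
    }
    RECORDED_SAMPLES.fetch_add(1, Ordering::Relaxed);
}

/// Number of samples recorded so far (exists only for this illustration).
pub fn recorded_samples() -> u64 {
    RECORDED_SAMPLES.load(Ordering::Relaxed)
}
```

With the flag off, a call to record_put_object_stage_duration() costs one atomic load and an early return, which is the kind of hot-path saving the commit message describes.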
* perf(put): comprehensive optimization - restore eager path, cache env, remove UUID

Change 1: Restore SMALL_EAGER_PUT_MAX_SIZE from 8KB to 1MB
- The try_lock() fix (d13a189e3) eliminates the blocking that caused service health timeouts under 512KiB c64 load
- Eager path with BytesPool is now safe for objects up to 1MB
- Recovers the eager path benefit for 32KiB-256KiB objects

Change 2: Adjust POOL_BYPASS_MAX_SIZE from 16KB to 4KB
- With eager path restored to 1MB, objects 4KB-1MB benefit from pool reuse
- Only ≤4KB objects bypass the pool (allocation cost negligible)

Change 3: Cache RUSTFS_ERASURE_ENCODE_MAX_INFLIGHT_BYTES via OnceLock
- Eliminates the per-encode std::env::var() lookup
- Env var still works (read once at first use)

Change 4: Replace Uuid::new_v4() with Uuid::nil() in Erasure construction
- _id field is unused in hot paths (documented in code)
- Eliminates CSPRNG syscall per PUT request

Change 5: Add concurrency-aware buffer sizing to PUT path
- Reuses get_concurrency_aware_buffer_size() from GET path
- Reduces buffer size under high concurrency (0.4x at >8 concurrent)
- Lowers memory pressure for >1MB streaming PUTs

* chore: add pyroscope feature flag and clean up imports

- Add pyroscope feature flag forwarding to rustfs-obs
- Remove unused allow(non_upper_case_globals) in globals.rs
- Sort imports and fix Cargo.toml formatting consistency

* style: fix import ordering and code formatting

- Sort imports alphabetically in globals.rs, encode.rs
- Fix indentation in erasure_coding encode/erasure
- Clean up HashReader formatting in object_usecase.rs

* fix(test): use tokio::test for request_logging_layer tests

The tests call tokio::spawn via RequestContextLayer, which requires a Tokio runtime. Changed from #[test] + futures::executor::block_on to #[tokio::test] + .await, and replaced tracing::subscriber::with_default with tracing::subscriber::set_default to support async.
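The read-once caching from Change 3 of the "comprehensive optimization" entry might look something like the sketch below. Only the environment variable name and the use of OnceLock come from the commit message; the function names, the parsing rules, and the DEFAULT_MAX_INFLIGHT_BYTES fallback value are assumptions made for illustration.

```rust
use std::sync::OnceLock;

// Assumed fallback; the real default is not stated in the commit message.
const DEFAULT_MAX_INFLIGHT_BYTES: usize = 64 * 1024 * 1024;

/// Hypothetical pure helper: parse the raw env value, falling back to the
/// default when it is missing, malformed, or zero.
fn parse_max_inflight_bytes(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAX_INFLIGHT_BYTES)
}

/// Returns the limit, reading the environment only on the first call.
/// Later calls return the cached value without touching the environment.
pub fn erasure_encode_max_inflight_bytes() -> usize {
    static CACHED: OnceLock<usize> = OnceLock::new();
    *CACHED.get_or_init(|| {
        let raw = std::env::var("RUSTFS_ERASURE_ENCODE_MAX_INFLIGHT_BYTES").ok();
        parse_max_inflight_bytes(raw.as_deref())
    })
}
```

One consequence of this design is that changing the variable after the first encode has no effect until the process restarts, which matches the "read once at first use" note in the commit message.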
* fix(bench): normalize no-space throughput/latency parsing in to_bps/to_ms

When a benchmark tool prints throughput without a separator (e.g. 123MiB/s), awk '{print $2}' returns empty because the whole string is one field, causing to_bps to return N/A and losing valid measurements in CSV output. Insert a space between number and unit via sed before awk field splitting. Same fix applied to to_ms for latency values like '50ms'.

Also add TODO comment on PUT path noting that get_concurrency_aware_buffer_size reads ACTIVE_GET_REQUESTS instead of PUT concurrency (PR #3514 review).

Refs: PR #3514 review comments by chatgpt-codex-connector

* fix(metrics): correct POOL_BYPASS comments and separate PUT vs generic stage metrics

- Fix 3 comment-code mismatches: POOL_BYPASS_MAX_SIZE is 4KiB, not 16KiB
- Add generic record_stage_duration() with separate histogram (rustfs_internal_stage_duration_ms) for non-PUT paths
- Replace record_put_object_stage_duration with record_stage_duration in metacache_set, store_list_objects, and bucket_lifecycle_ops to avoid polluting PUT-specific dashboards with listing/lifecycle timings
- Fix flaky test: serialize tests mutating PUT_STAGE_METRICS_ENABLED with METRICS_FLAG_LOCK mutex and explicitly set desired state at test start

Refs: PR #3514 review comments by chatgpt-codex-connector

* style: apply cargo fmt to metacache_set.rs

---------

Co-authored-by: cxymds <cxymds@gmail.com>
Co-authored-by: 安正超 <anzhengchao@gmail.com>
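The fix(bench) entry above describes a shell-level fix (sed before awk). The same normalization idea, splitting a value such as 123MiB/s or 50ms into a number and a unit whether or not a space separates them, is sketched here in Rust for illustration only. The helper split_value_unit() and the set of latency units accepted by this to_ms() are assumptions and do not mirror the benchmark script.

```rust
/// Split "123MiB/s" or "123 MiB/s" into (123.0, "MiB/s").
/// Returns None when the number or the unit is missing.
fn split_value_unit(raw: &str) -> Option<(f64, &str)> {
    let s = raw.trim();
    // Byte index of the first character that cannot belong to a plain decimal number.
    let split_at = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(s.len());
    let (number, unit) = s.split_at(split_at);
    let value = number.parse::<f64>().ok()?;
    let unit = unit.trim();
    if unit.is_empty() {
        None
    } else {
        Some((value, unit))
    }
}

/// Convert a latency string to milliseconds.
/// The accepted units (us, ms, s) are an assumption for this sketch.
fn to_ms(raw: &str) -> Option<f64> {
    let (value, unit) = split_value_unit(raw)?;
    match unit {
        "us" => Some(value / 1000.0),
        "ms" => Some(value),
        "s" => Some(value * 1000.0),
        _ => None,
    }
}
```

Splitting on the first non-numeric character, rather than on whitespace, is what makes the spaced and unspaced forms parse identically.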
15 KiB
Executable File