rustfs/CLAUDE.md
Marcelo Bartsch f00898d070 fix(tier): recover (#3182)
* fix(tier): stop sending nil/garbage versionId to warm backend S3

Three bugs caused NoSuchVersion errors when reading tiered objects:

1. warm_backend_s3sdk: GET and DELETE ignored rv/range opts entirely —
   fixed to forward version_id and byte-range to the SDK request.

2. version.rs (MetaObject + MetaDeleteMarker): transition_version_id was
   parsed with unwrap_or_default(), turning invalid/wrong-length bytes
   into Uuid::nil(). The nil UUID was then serialized and sent as
   ?versionId=00000000-... to the tier backend -> NoSuchVersion.
   Fixed: .and_then(.ok()).filter(!is_nil()) so only valid non-nil UUIDs
   are forwarded as versionId.

3. bucket_lifecycle_ops: add debug/error logs in
   get_transitioned_object_reader to record tier, tier_object, and
   tier_version_id before and on failure of the tier GET.

Also adds tier transition fields to dump_fileinfo example for offline
xl.meta inspection, and fixes Docker build (cargo path + entrypoint).
Adds CLAUDE.md with tier architecture and debugging notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* more fixes for versionId

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Marcelo Bartsch <marcelo@bartsch.cl>

* remove branch

* Add tests and fix cargo path, add load to build-docker

* update documentation (CLAUDE.md)

* more fixes for recover

* More fixes to ILM recover

* final fix

* chore: add missing-shard first-scene diagnostics (#3213)

chore(ecstore): add missing-shard first-scene diagnostics

Log rename_data quorum context behind RUSTFS_ISSUE3031_DIAG_ENABLE so partial-disk success can be correlated with later missing shard reads.

Also log put_object commit success and tmp cleanup boundaries to capture when successful quorum writes are followed by tmp_dir cleanup.

* fix test anmd fmt

* fix cargo path
fix test

* fix(tier): format copy_object self-copy guard

---------

Signed-off-by: Marcelo Bartsch <marcelo@bartsch.cl>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: 安正超 <anzhengchao@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: houseme <housemecn@gmail.com>
Co-authored-by: cxymds <Cxymds@qq.com>
Co-authored-by: loverustfs <hello@rustfs.com>
2026-06-07 09:38:28 +00:00


RustFS — CLAUDE.md

S3-compatible object store in Rust, derived from MinIO. Erasure-coded, multi-pool, supports ILM tiering/lifecycle.

Commands

cargo build --release --bin rustfs        # production binary
cargo build                               # dev build
cargo check -p <crate>                    # fast type-check one crate
cargo test -p <crate>                     # test one crate
cargo fmt --all                           # format (required before PR)
make pre-commit                           # full pre-PR gate (fmt + clippy + test)
make build-docker BUILD_OS=ubuntu22.04    # Docker cross-build

Docker build note: buildx build without --load keeps the image in the buildx cache only — docker run will use a stale local image. The Makefile already includes --load; if you suspect a stale binary, add --no-cache to the buildx build invocation inside .config/make/build-docker.mak.

Agent/PR rules: see .github/copilot-instructions.md. Crate membership: Cargo.toml [workspace].members. CI gates: .github/workflows/ci.yml.

Workspace layout

rustfs/src/main.rs                          # binary entry point
crates/ecstore/src/
  set_disk.rs                               # ErasureSet: transition_object, restore_transitioned_object
  store.rs / store_api/                     # ECStore trait + ObjectInfo / TransitionedObject types
  bucket/lifecycle/
    bucket_lifecycle_ops.rs                 # ILM actions: transition_object, expire_transitioned_object,
                                            #   get_transitioned_object_reader, gen_transition_objname
    tier_sweeper.rs                         # background sweep: delete_object_from_remote_tier
  tier/
    warm_backend.rs                         # WarmBackend trait (put/get/remove/in_use)
    warm_backend_s3.rs                      # HTTP-client based (TransitionClient) — used for S3/MinIO
    warm_backend_s3sdk.rs                   # aws-sdk-s3 based — alternative S3 backend
    warm_backend_minio.rs / _rustfs.rs / …  # per-provider wrappers (all delegate to _s3 or _s3sdk)
    tier.rs                                 # TierConfigMgr, new_warm_backend dispatch
  client/transition_api.rs                  # TransitionClient HTTP plumbing; UploadInfo, to_object_info
  client/api_put_object_streaming.rs        # put_object_do → UploadInfo (version_id from x-amz-version-id)
crates/filemeta/src/
  filemeta.rs                               # FileMeta (xl.meta top-level), is_skip_meta_key
  filemeta/version.rs                       # FileMetaVersion, MetaObject, MetaDeleteMarker
                                            #   → to_fileinfo() reads transition_version_id
                                            #   → set_transition() writes raw UUID bytes
                                            #   → From<FileInfo> for MetaObject writes all meta
  fileinfo.rs                               # FileInfo struct (transition_version_id: Option<Uuid>)
  examples/
    dump_fileinfo.rs                        # CLI: parse xl.meta, print transition fields + metadata
    dump_versions.rs                        # CLI: list all versions in xl.meta
crates/utils/src/http/metadata_compat.rs    # SUFFIX_* constants, insert_bytes/get_bytes (dual RustFS+MinIO keys)

Metadata key conventions

Internal metadata is stored under both x-rustfs-internal-<suffix> and x-minio-internal-<suffix> for MinIO interoperability. get_bytes prefers the RustFS key with MinIO fallback.
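The dual-key convention can be sketched as follows. This is a minimal std-only illustration, assuming the metadata lives in a plain `HashMap<String, Vec<u8>>`; the real `insert_bytes`/`get_bytes` in metadata_compat.rs may use a different map type and signature.

```rust
use std::collections::HashMap;

// Prefixes taken from the key names quoted above.
const RUSTFS_PREFIX: &str = "x-rustfs-internal-";
const MINIO_PREFIX: &str = "x-minio-internal-";

/// Store `value` under both the RustFS and the MinIO internal key.
/// Simplified stand-in for the real helper, not its actual signature.
fn insert_bytes(meta: &mut HashMap<String, Vec<u8>>, suffix: &str, value: Vec<u8>) {
    meta.insert(format!("{}{}", RUSTFS_PREFIX, suffix), value.clone());
    meta.insert(format!("{}{}", MINIO_PREFIX, suffix), value);
}

/// Read a value, preferring the RustFS key and falling back to the MinIO key.
fn get_bytes(meta: &HashMap<String, Vec<u8>>, suffix: &str) -> Option<Vec<u8>> {
    meta.get(&format!("{}{}", RUSTFS_PREFIX, suffix))
        .or_else(|| meta.get(&format!("{}{}", MINIO_PREFIX, suffix)))
        .cloned()
}
```

With this shape, metadata written by MinIO (only the x-minio-internal key present) is still readable, and metadata written by RustFS is readable by both.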

Key suffixes (from metadata_compat.rs):

Suffix                   Meaning
transition-status        "complete" when tiered
transitioned-object      tier key path (without prefix)
transitioned-versionID   S3 version_id returned by tier PUT (16 raw UUID bytes, or absent)
transition-tier          tier name
tier-free-versionID      delete-marker version for free-version sweep

Tier / ILM transition architecture

Transition flow (hot → cold)

  1. transition_object (lifecycle_ops) → ECStore::transition_object → set_disk.rs
  2. gen_transition_objname(bucket) → {sha256_hash[0..16]}/{uuid[0..2]}/{uuid[2..4]}/{uuid} (unique per object version)
  3. tgt_client.put_with_meta(dest_obj, …) → returns rv: String (remote S3 version_id, or "")
  4. fi.transition_version_id = if rv.is_empty() { None } else { Some(Uuid::parse_str(&rv)?) }
  5. fi.transitioned_objname = dest_obj (without tier prefix)
  6. Written to xl.meta via MetaObject::from(FileInfo) → insert_bytes(SUFFIX_TRANSITIONED_VERSION_ID, uuid.as_bytes()) (16 raw bytes)
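The object-name layout from step 2 can be sketched like this. It assumes the hash is available as a hex string and the UUID as a hyphenated string; `tier_object_name` is a hypothetical helper for illustration, not the real `gen_transition_objname`, and hashing and UUID generation are left out.

```rust
/// Build `{hash[0..16]}/{uuid[0..2]}/{uuid[2..4]}/{uuid}` from pre-computed inputs.
/// Returns None when an input is too short to slice.
/// Hypothetical helper: the real function derives the hash and UUID itself.
fn tier_object_name(sha256_hex: &str, uuid: &str) -> Option<String> {
    let hash_prefix = sha256_hex.get(0..16)?; // first 16 characters of the hash
    let level1 = uuid.get(0..2)?;             // first fan-out directory
    let level2 = uuid.get(2..4)?;             // second fan-out directory
    Some(format!("{}/{}/{}/{}", hash_prefix, level1, level2, uuid))
}
```

The two short UUID-derived directory levels spread objects across many prefixes on the tier, while the full UUID keeps each transitioned version unique.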

Tier GET flow (restore/read)

get_transitioned_object_reader (lifecycle_ops):

  • reads oi.transitioned_object.name (= fi.transitioned_objname)
  • reads oi.transitioned_object.version_id (= fi.transition_version_id.to_string() or "")
  • calls warm_backend.get(name, version_id, opts)
  • warm_backend_s3.rs::get: adds ?versionId=… only when rv != ""
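The versionId rule in the last bullet amounts to the following. This is a simplified sketch: `tier_get_target` is a hypothetical name, and URL encoding, signing, and the rest of the request are omitted.

```rust
/// Build the request target for a tier GET.
/// `rv` is the remote version id; an empty string means "send no versionId".
/// Hypothetical helper; no URL encoding is applied in this sketch.
fn tier_get_target(object: &str, rv: &str) -> String {
    if rv.is_empty() {
        // Non-versioned tier bucket: plain GET on the key.
        format!("/{}", object)
    } else {
        // Versioned tier bucket: pin the exact remote version.
        format!("/{}?versionId={}", object, rv)
    }
}
```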

Tier prefix handling

WarmBackendS3::get_dest(object) prepends self.prefix to the object name. transitioned_objname is stored without the prefix — get_dest adds it on every call.
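A minimal sketch of the prefix handling, assuming the prefix is joined with a single `/` and skipped when empty. Both details are assumptions; check the real `get_dest` before relying on them.

```rust
/// Simplified stand-in for the backend; only the prefix matters here.
struct WarmBackendSketch {
    prefix: String,
}

impl WarmBackendSketch {
    /// Prepend the tier prefix to a stored object name.
    /// Assumption: "/" separator, and no separator when the prefix is empty.
    fn get_dest(&self, object: &str) -> String {
        if self.prefix.is_empty() {
            object.to_string()
        } else {
            format!("{}/{}", self.prefix, object)
        }
    }
}
```

Because the stored name never contains the prefix, any code path that bypasses `get_dest` would look up the wrong key on the tier.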

xl.meta on disk

Path: {disk}/{bucket}/{object}/xl.meta — one per erasure shard disk. All shards should be identical for a healthy object.
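To compare shards by hand, something like the following can enumerate the xl.meta path on each disk and check that a field extracted from each copy agrees. Both helpers are hypothetical; which field to compare (for example the `transition_ver_id` printed by dump_fileinfo) is left to the reader.

```rust
use std::path::{Path, PathBuf};

/// Build `{disk}/{bucket}/{object}/xl.meta` for every disk root.
fn xl_meta_paths(disks: &[&Path], bucket: &str, object: &str) -> Vec<PathBuf> {
    disks
        .iter()
        .map(|disk| disk.join(bucket).join(object).join("xl.meta"))
        .collect()
}

/// True when every value equals its neighbour (also true for 0 or 1 values).
fn all_equal<T: PartialEq>(values: &[T]) -> bool {
    values.windows(2).all(|pair| pair[0] == pair[1])
}
```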

Known bugs & fixes

Bug 1: NoSuchVersion on tier GET — nil UUID sent as versionId

Root cause: transitioned-versionID metadata key exists with empty string value (0 bytes). Old reading code:

// OLD — unwrap_or_default() converts 0-byte or wrong-length slice to Uuid::nil()
get_bytes().map(|v| Uuid::from_slice(v.as_slice()).unwrap_or_default())
// → Some(Uuid::nil()) → sends ?versionId=00000000-… → NoSuchVersion

Fix (version.rs, MetaObject::to_fileinfo + MetaDeleteMarker::to_fileinfo):

get_bytes()
    .and_then(|v| Uuid::from_slice(v.as_slice()).ok())  // None for wrong-length bytes
    .filter(|u| !u.is_nil())                            // None for nil UUID (old write-back)

Regression tests (crates/filemeta/src/filemeta/version.rs mod tests): 6 tests cover absent key, empty bytes, nil UUID, and valid UUID round-trip for both MetaObject and MetaDeleteMarker paths.
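The difference between the old and new parsing can be reproduced without the uuid crate. In this std-only sketch a `[u8; 16]` stands in for `Uuid`, `try_from` stands in for `Uuid::from_slice`, and an all-zero array stands in for `Uuid::nil()`; the function names are illustrative only.

```rust
use std::convert::TryFrom;

/// Stand-in for `Uuid`: 16 raw bytes. All zeros plays the role of the nil UUID.
type RawUuid = [u8; 16];

/// Old behaviour: any stored value, even empty or wrong-length, becomes Some(..).
fn old_parse(raw: Option<&[u8]>) -> Option<RawUuid> {
    raw.map(|v| RawUuid::try_from(v).unwrap_or_default())
}

/// New behaviour: wrong-length bytes and the nil value both become None.
fn new_parse(raw: Option<&[u8]>) -> Option<RawUuid> {
    raw.and_then(|v| RawUuid::try_from(v).ok())
        .filter(|u| u.iter().any(|b| *b != 0))
}
```

For an empty stored value, `old_parse` yields `Some` of the nil value, which is what ended up in `?versionId=00000000-…`, while `new_parse` yields `None`.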

Bug 2: warm_backend_s3sdk.rs ignored rv and range opts

Fix: added req.version_id(rv) and req.range(…) to GET; req.version_id(rv) to DELETE.
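The intent of the fix can be sketched independently of the SDK. `ByteRange`, `TierGet`, and `build_tier_get` below are hypothetical types for illustration, not the aws-sdk-s3 builder or the real RustFS opts. The sketch also assumes an empty rv means no version id is set, matching the rule described for warm_backend_s3.rs.

```rust
/// Hypothetical byte range requested by the caller.
#[derive(Debug, Clone, Copy)]
struct ByteRange {
    offset: u64,
    length: u64,
}

/// Hypothetical, SDK-agnostic view of what a tier GET should carry.
#[derive(Debug, PartialEq)]
struct TierGet {
    key: String,
    version_id: Option<String>,
    range: Option<String>,
}

fn build_tier_get(key: &str, rv: &str, range: Option<ByteRange>) -> TierGet {
    TierGet {
        key: key.to_string(),
        // Forward the remote version only when one was recorded.
        version_id: if rv.is_empty() { None } else { Some(rv.to_string()) },
        // HTTP Range uses an inclusive end offset: bytes=start-end.
        range: range
            .filter(|r| r.length > 0)
            .map(|r| format!("bytes={}-{}", r.offset, r.offset + r.length - 1)),
    }
}
```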

Bug 4: set_disk::copy_object returns 501 for tiered objects (storage class restore)

Root cause: set_disk::copy_object immediately returns StorageError::NotImplemented when src_info.metadata_only = false. For tiered objects, metadata_only is never set to true (guarded by transitioned_object.tier.is_empty()), so mc cp --storage-class STANDARD obj obj on a tiered object always returns 501.

Fix (crates/ecstore/src/set_disk.rs, copy_object):

if !src_info.metadata_only {
    if path_join_buf(&[src_bucket, src_object]) == path_join_buf(&[dst_bucket, dst_object]) {
        if let Some(mut put_reader) = src_info.put_object_reader.take() {
            return self.put_object(dst_bucket, dst_object, &mut put_reader, dst_opts).await;
        }
    }
    return Err(StorageError::NotImplemented);
}

When a self-copy carries a put_object_reader (data already fetched from the tier in execute_copy_object), copy_object writes it back locally via put_object, effectively de-tiering the object.

How mc cp --storage-class STANDARD flows:

  1. mc sends PUT /bucket/key with x-amz-copy-source, x-amz-metadata-directive: REPLACE, x-amz-storage-class: STANDARD
  2. execute_copy_object → get_object_reader fetches data from tier backend → stores in src_info.put_object_reader
  3. store.copy_object(...) → now calls put_object with tier data and STANDARD storage class in dst_opts
  4. New xl.meta written locally with STANDARD class, no tier metadata → object de-tiered

Bug 3 (open): race in expire_transitioned_object

Order is: delete remote tier version → delete local object. A concurrent GET between those two steps fetches a valid stored version_id but the tier version is already gone → NoSuchVersion. The proper fix is to delete local metadata first (making the object unreachable) before deleting the remote tier version.
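The race and the proposed ordering can be modelled in a few lines. This is a toy in-memory model, not RustFS code: `TierModel` and its methods are hypothetical, and a GET issued "between the two steps" is simulated by calling `get` after only the first delete.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum GetError {
    NotFound,      // local metadata is gone: clean miss
    NoSuchVersion, // local metadata points at a deleted tier version
}

/// Toy model: local metadata maps object -> tier version id;
/// `remote` holds the version ids that still exist on the tier.
struct TierModel {
    local: HashMap<String, String>,
    remote: HashSet<String>,
}

impl TierModel {
    fn with_object(object: &str, rv: &str) -> Self {
        let mut local = HashMap::new();
        local.insert(object.to_string(), rv.to_string());
        let mut remote = HashSet::new();
        remote.insert(rv.to_string());
        TierModel { local, remote }
    }

    fn get(&self, object: &str) -> Result<(), GetError> {
        let rv = self.local.get(object).ok_or(GetError::NotFound)?;
        if self.remote.contains(rv) {
            Ok(())
        } else {
            Err(GetError::NoSuchVersion)
        }
    }

    /// Remove local metadata and hand back the tier version id for later cleanup.
    fn delete_local(&mut self, object: &str) -> Option<String> {
        self.local.remove(object)
    }

    fn delete_remote(&mut self, rv: &str) {
        self.remote.remove(rv);
    }
}
```

With the current order (remote first), a `get` between the steps returns `NoSuchVersion`. With local-first, the same `get` returns `NotFound`, and the remote version is removed afterwards using the id returned by `delete_local`.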

Debugging tier issues

Inspect xl.meta directly

cargo build -p rustfs-filemeta --example dump_fileinfo
./target/debug/examples/dump_fileinfo /srv/rustfs/data/disk0/{bucket}/{object}/xl.meta
# Shows: transition_status, transition_tier, transitioned_obj, transition_ver_id

transition_ver_id: <none> → no versionId will be sent to tier (correct for non-versioned tier bucket).
transition_ver_id: <uuid> → that UUID will be sent as ?versionId=<uuid>.

Check what versionId is being sent at runtime

Enable debug logging:

RUST_LOG=rustfs_ecstore::bucket::lifecycle=debug rustfs …

Log line: fetching transitioned object from tier (DEBUG before request).
Log line: tier GET failed (ERROR on failure, includes tier_version_id).

Metadata key to watch

x-minio-internal-transitioned-versionID=   ← empty string = will cause NoSuchVersion with old code
x-rustfs-internal-transitioned-versionID=  ← same

If both are empty strings, the object was transitioned to a non-versioned tier bucket. The versionId should NOT be sent in that case; the Bug 1 fix above ensures it is not.

Common patterns

Writing internal metadata (binary values)

insert_bytes(&mut meta_sys, SUFFIX_TRANSITIONED_VERSION_ID, uuid.as_bytes().to_vec());
// stores under both x-rustfs-internal-* and x-minio-internal-* keys

Reading internal metadata (binary values)

get_bytes(&self.meta_sys, SUFFIX_TRANSITIONED_VERSION_ID)
    .and_then(|v| Uuid::from_slice(v.as_slice()).ok())
    .filter(|u| !u.is_nil())
// Returns None for: absent, wrong-length bytes, nil UUID

WarmBackend trait

put_with_meta(object, reader, length, meta) -> Result<String>  // returns S3 version_id or ""
put(object, reader, length) -> Result<String>
get(object, rv, opts) -> Result<ReadCloser>  // rv="" means no versionId
remove(object, rv) -> Result<()>
in_use() -> Result<bool>

rv is the remote version ID; always pass an empty string when transition_version_id is None.
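Deriving rv from an optional version id might look like this. It is a std-only sketch in which `[u8; 16]` stands in for `Uuid` and `hyphenated` mimics the lowercase 8-4-4-4-12 text form; in real code `Uuid::to_string()` would be used, and `rv_for` is a hypothetical helper name.

```rust
/// Format 16 raw bytes in the lowercase hyphenated 8-4-4-4-12 form.
fn hyphenated(id: &[u8; 16]) -> String {
    let hex: String = id.iter().map(|b| format!("{:02x}", b)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// None becomes "" (no versionId is sent); Some(id) becomes its text form.
fn rv_for(transition_version_id: Option<&[u8; 16]>) -> String {
    transition_version_id.map(hyphenated).unwrap_or_default()
}
```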