mirror of https://github.com/rustfs/rustfs.git synced 2026-06-21 07:35:59 +08:00


Henry Guo 5d9fee5c0c feat(table-catalog): expose backing migration contract (#3574)

* feat(table-catalog): expose backing migration contract

* fix(table-catalog): require register auth for external sync

---------

Co-authored-by: Henry Guo <marshawcoco@users.noreply.github.com>

2026-06-18 21:14:40 +08:00


RustFS S3 Tables Client Conformance

This directory contains repeatable client-facing checks for the RustFS S3 Tables Iceberg REST Catalog surface. The goal is to keep S3 Tables compatibility claims grounded in runnable scripts or explicit unsupported entries.

PyIceberg Smoke Test

Install the client dependencies:

python3 -m pip install 'pyiceberg[pyarrow]' boto3

Start RustFS locally, then run:

python3 scripts/table-catalog/pyiceberg_smoke.py \
  --endpoint http://127.0.0.1:9000 \
  --access-key rustfsadmin \
  --secret-key rustfsadmin \
  --bucket rustfs-s3table-smoke \
  --replace \
  --cleanup

The smoke test covers:

create or reuse the S3 bucket
enable the RustFS table bucket
load the PyIceberg REST catalog
create namespace and table
append two rows through PyIceberg
reload and scan the table
probe direct REST catalog endpoints for metadata-location, table refs, Iceberg views, maintenance config, metadata maintenance, worker run, and catalog diagnostics
optionally drop the table and namespace
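The steps above can be sketched with the public PyIceberg API. This is a minimal illustration, not the code of `pyiceberg_smoke.py`: the function names, the namespace and table names, and the default region are assumptions, while the property keys are PyIceberg's documented REST catalog and S3 FileIO settings.

```python
"""Hedged sketch of the PyIceberg flow described above (not the real script)."""


def catalog_properties(endpoint, access_key, secret_key, bucket,
                       catalog_path="/iceberg", signing_name="s3",
                       region="us-east-1"):
    """Build the property dict passed to pyiceberg.catalog.load_catalog."""
    endpoint = endpoint.rstrip("/")
    return {
        "type": "rest",
        "uri": endpoint + catalog_path,          # canonical RustFS catalog URI
        "warehouse": bucket,
        "rest.sigv4-enabled": "true",            # sign REST calls with SigV4
        "rest.signing-name": signing_name,
        "rest.signing-region": region,           # assumed default region
        "s3.endpoint": endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "s3.region": region,
    }


def run_smoke(props, namespace="smoke_ns", table_name="smoke_table"):
    """Create a namespace and table, append two rows, reload, and scan."""
    # Imported lazily so catalog_properties works without the client installed.
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("rustfs", **props)
    catalog.create_namespace(namespace)
    rows = pa.table({
        "id": pa.array([1, 2], type=pa.int64()),
        "name": ["a", "b"],
    })
    identifier = f"{namespace}.{table_name}"
    table = catalog.create_table(identifier, schema=rows.schema)
    table.append(rows)
    # Reload through the catalog so the scan reads committed metadata.
    reloaded = catalog.load_table(identifier)
    return reloaded.scan().to_arrow().num_rows
```

Calling `run_smoke(catalog_properties("http://127.0.0.1:9000", "rustfsadmin", "rustfsadmin", "rustfs-s3table-smoke"))` against a running RustFS with an enabled table bucket should return the number of appended rows; the bucket setup and the direct REST probes are left out of this sketch.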

The default profile uses the canonical RustFS catalog URI:

http://127.0.0.1:9000/iceberg

To exercise the MinIO AIStor-style alias exposed by RustFS:

python3 scripts/table-catalog/pyiceberg_smoke.py \
  --profile rustfs-compat \
  --endpoint http://127.0.0.1:9000 \
  --access-key rustfsadmin \
  --secret-key rustfsadmin \
  --bucket rustfs-s3table-smoke \
  --replace \
  --cleanup

rustfs-compat uses:

catalog URI: http://127.0.0.1:9000/_iceberg
REST signing name: s3tables

If the local deployment still requires the standard S3 signing name for the alias path, override it explicitly:

python3 scripts/table-catalog/pyiceberg_smoke.py \
  --profile rustfs-compat \
  --rest-signing-name s3

To verify catalog-vended table credentials, enable credential vending on the RustFS server (the environment variable below is read by the server, not by the script), then run the vended-credentials profile:

RUSTFS_TABLE_CATALOG_CREDENTIAL_VENDING=enabled

python3 scripts/table-catalog/pyiceberg_smoke.py \
  --profile rustfs-vended-credentials \
  --endpoint http://127.0.0.1:9000 \
  --access-key rustfsadmin \
  --secret-key rustfsadmin \
  --bucket rustfs-s3table-smoke \
  --replace \
  --cleanup

This profile uses the configured principal to create the bucket, enable the table bucket, and create the table. After the table exists, it calls the REST credentials endpoint and reloads the PyIceberg catalog with the returned table-scoped S3 access key, secret key, and session token before append, reload, and scan operations.

Before the PyIceberg append, the profile also checks that the returned credential prefix exactly matches the created table warehouse location after canonical S3 URI normalization, including percent-decoding equivalent path encodings. It then runs a direct S3 data-plane scope probe with the returned temporary credentials:

PutObject, HeadObject, GetObject, and DeleteObject must work inside the returned table warehouse prefix.
PutObject and GetObject to the same bucket outside that prefix must be rejected.
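The prefix check and the inside/outside classification used by the scope probe can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the helper names are hypothetical, and "canonical normalization" is assumed here to mean percent-decoding the path and ignoring leading and trailing slashes.

```python
from urllib.parse import unquote, urlsplit


def normalize_s3_uri(uri):
    """Return (bucket, key_prefix) for an s3:// URI.

    Assumed normalization: percent-decode the path and strip leading and
    trailing slashes. Interior slashes are kept because S3 treats
    "a//b" and "a/b" as different keys.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parts.netloc, unquote(parts.path).strip("/")


def prefix_matches_warehouse(credential_prefix, warehouse_location):
    """Exact match after normalization; a longer or shorter prefix fails."""
    return normalize_s3_uri(credential_prefix) == normalize_s3_uri(warehouse_location)


def key_in_scope(bucket, key, credential_prefix):
    """True when an object key lies inside the vended table prefix."""
    prefix_bucket, prefix_key = normalize_s3_uri(credential_prefix)
    if bucket != prefix_bucket:
        return False
    if not prefix_key:
        return True
    # Require a "/" boundary so "ns/t1" does not cover "ns/t10/...".
    return key.startswith(prefix_key + "/")
```

A probe built on these helpers would expect object operations to succeed for keys where `key_in_scope` is true and to be rejected for keys in the same bucket where it is false.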

The direct REST catalog probes run by default after the PyIceberg append and scan. For deployments that intentionally expose only the core Iceberg REST Catalog table path, skip those probes explicitly:

python3 scripts/table-catalog/pyiceberg_smoke.py \
  --skip-catalog-api-probes

Machine-Readable Inventories

The script can print the current conformance inventories without importing PyIceberg, PyArrow, or boto3:

python3 scripts/table-catalog/pyiceberg_smoke.py --print-client-matrix
python3 scripts/table-catalog/pyiceberg_smoke.py --print-vendor-profiles
python3 scripts/table-catalog/pyiceberg_smoke.py --print-unsupported-inventory
python3 scripts/table-catalog/pyiceberg_smoke.py --print-production-readiness

Use these outputs when updating release notes, PR descriptions, or follow-up work items. They are intentionally conservative: only PyIceberg is automated by this script today; other engines are documented only, until a repeatable harness is added for them.

The smoke test also probes catalog-backed advanced Iceberg surfaces:

table refs can be listed, created or replaced, and deleted through catalog commits; refs with explicit retention policy require a forced delete, and main cannot be deleted
Iceberg views support basic create, list, load, replace, existence check, and drop routes with persisted view metadata and view-scoped authorization
metadata maintenance supports safe dry-run planning and controlled worker execution checks
catalog diagnostics exposes the table recovery and consistency state used by operators
catalog export and diagnostics expose the current catalog backing manifest, recoverable commit-log WAL state, strong backing migration target, single active writer HA policy, and scale validation matrix

Client Matrix

| Client | Current status | Claim |
| --- | --- | --- |
| PyIceberg | Automated smoke target | create namespace, create table, append, reload, scan, metadata-location, refs, views, maintenance, diagnostics, optional catalog-vended table credentials with exact-prefix data-plane scope probe |
| Spark Iceberg REST catalog | Manual-ready | create/load/append/reload should be verified against a running RustFS endpoint |
| Trino Iceberg REST catalog | Documented, not automated | no write compatibility claim yet |
| DuckDB Iceberg | Documented, not automated | read-path reference only |
| Databend | Documented, not automated | S3 data-plane reference only; Iceberg REST catalog integration is not claimed |
| Snowflake/Open Catalog integrations | Documented, not automated | reference only |

Vendor Profile References

| Profile | Catalog shape | Signing name | Credential model | RustFS claim |
| --- | --- | --- | --- | --- |
| `rustfs` | `{endpoint}/iceberg` | `s3` | static S3 credentials | automated smoke target |
| `rustfs-compat` | `{endpoint}/_iceberg` | `s3tables` by default | static S3 credentials | compatibility smoke target |
| `rustfs-vended-credentials` | `{endpoint}/iceberg` | `s3` | catalog-vended table credentials after table creation | automated credential smoke target when server vending is enabled |
| `aws-s3tables` | `https://s3tables.{region}.amazonaws.com/iceberg` | `s3tables` | AWS IAM/session credentials | reference only |
| `minio-aistor` | `{endpoint}/_iceberg` | `s3tables` | policy-scoped S3 credentials | reference only |
| `cloudflare-r2-data-catalog` | catalog URI returned by R2 | `s3` | catalog-vended credentials | reference only |
| `oss-tables` | provider REST endpoint | `s3` | SigV4 S3FileIO credentials | reference only |
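As an illustration of how the three RustFS profiles in this table could be resolved into a catalog URI and signing name, here is a small sketch. The `Profile` type, the `PROFILES` mapping, and `resolve` are hypothetical names for this example and are not taken from the script; only the paths and signing names come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    catalog_path: str       # appended to the RustFS endpoint
    signing_name: str       # default SigV4 signing name for REST calls
    credential_model: str   # "static" or "vended" (labels assumed here)


# Only the RustFS-hosted profiles; the vendor rows are reference-only.
PROFILES = {
    "rustfs": Profile("/iceberg", "s3", "static"),
    "rustfs-compat": Profile("/_iceberg", "s3tables", "static"),
    "rustfs-vended-credentials": Profile("/iceberg", "s3", "vended"),
}


def resolve(profile_name, endpoint, signing_name_override=None):
    """Combine a profile with an endpoint and an optional signing override."""
    profile = PROFILES[profile_name]
    return {
        "uri": endpoint.rstrip("/") + profile.catalog_path,
        # Mirrors the effect of --rest-signing-name: an explicit value wins.
        "signing_name": signing_name_override or profile.signing_name,
        "credential_model": profile.credential_model,
    }
```

For example, `resolve("rustfs-compat", "http://127.0.0.1:9000", "s3")` keeps the `/_iceberg` alias path but signs with the standard S3 signing name.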

Unsupported Inventory

Unsupported behavior is documented instead of hidden behind internal errors. The current unsupported inventory is:

credential vending: automated after table bootstrap with exact-prefix validation and a data-plane scope probe; full no-long-term-data-credential bootstrap is not claimed
background maintenance worker: controlled run-once and heartbeat endpoints are registered; continuous in-process scheduling is not claimed
manifest/data reachability cleanup: metadata maintenance reads manifest-list and manifest Avro references, reports manifest/data/delete reachability, and deletes only unreferenced table objects that pass the safety window
snapshot expiration dry-run planning and manual catalog commit: supported through metadata maintenance reports
automatic maintenance scheduling: external scheduler hook supported through the worker run endpoint; built-in periodic scheduling is not claimed
compaction rewrite: controlled run-once support for unpartitioned Parquet binpack through metadata maintenance; built-in periodic scheduling, sort compaction, delete-file rewrite, and row-level compaction are not claimed
row-level delete/update/merge commits: standard catalog commit validates append, overwrite, delete, and replace snapshot manifests for table-warehouse scope, referenced object existence, current-live-file deletes, and stale add/delete conflicts; end-to-end SQL DML client coverage remains a compatibility validation item
external catalog bridges: metadata import/register and operator-supplied metadata pointer sync are supported for Polaris/Glue/DLF/Hive identity boundaries; online vendor SDK polling and policy mirroring are not claimed
multi-table transactions: not a short-term production claim

Credential Boundary

RustFS advertises table credential scope metadata without returning reusable storage secrets by default. loadTable includes the table warehouse prefix in the response config, and the standard credentials endpoint is registered:

GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/credentials

The endpoint returns an empty storage-credentials list unless table catalog credential vending is explicitly enabled. When enabled, RustFS issues temporary table-scoped S3 credentials through the credentials endpoint. Those credentials are constrained to the table warehouse prefix and include a session token and expiration.
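A client-side view of this endpoint might look like the sketch below. It assumes the Iceberg REST convention of joining multi-level namespaces with the `0x1F` unit separator, and it assumes the vended `config` uses the usual S3FileIO property names (`s3.access-key-id`, `s3.secret-access-key`, `s3.session-token`); neither detail is confirmed by this README, and the helper names are hypothetical.

```python
from urllib.parse import quote


def credentials_url(catalog_uri, prefix, namespace_levels, table):
    """Build the URL for the table credentials endpoint.

    namespace_levels is a list such as ["analytics", "daily"]; levels are
    joined with the 0x1F unit separator before percent-encoding (assumed).
    """
    namespace = "\x1f".join(namespace_levels)
    return "{}/v1/{}/namespaces/{}/tables/{}/credentials".format(
        catalog_uri.rstrip("/"),
        quote(prefix, safe=""),
        quote(namespace, safe=""),
        quote(table, safe=""),
    )


def extract_s3_credentials(body):
    """Return the first vended credential, or None when the list is empty.

    An empty "storage-credentials" list is what the README describes for a
    server with vending disabled.
    """
    entries = body.get("storage-credentials") or []
    if not entries:
        return None
    first = entries[0]
    config = first.get("config", {})
    return {
        "prefix": first.get("prefix"),
        # Property names below are an assumption about the response config.
        "access_key": config.get("s3.access-key-id"),
        "secret_key": config.get("s3.secret-access-key"),
        "session_token": config.get("s3.session-token"),
    }
```

The request itself still has to be SigV4-signed with the configured principal, which this sketch does not cover.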

The rustfs-vended-credentials profile verifies the client handoff from the catalog principal to the table-scoped temporary credentials. It still uses the configured principal for setup and REST request signing; the vended credentials are first checked against the created table warehouse location, then checked with a direct S3 scope probe, and finally applied to PyIceberg S3 data-plane access after the table has been created.

Enablement is server-side and fail-closed:

RUSTFS_TABLE_CATALOG_CREDENTIAL_VENDING=enabled
RUSTFS_TABLE_CATALOG_CREDENTIAL_TTL_SECONDS=900

The TTL is clamped to the supported short-lived range by the server.
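The fail-closed switch and TTL clamping can be pictured with the sketch below. It is not the server's code: the README does not state the supported range or how other values of the switch are parsed, so the bounds are explicit parameters and the exact-match rule for `enabled` is an assumption.

```python
VENDING_VAR = "RUSTFS_TABLE_CATALOG_CREDENTIAL_VENDING"


def vending_enabled(env):
    """Fail closed: anything other than the literal "enabled" means off.

    Treating values such as "true" or "1" as disabled is an assumption
    made for this sketch.
    """
    return env.get(VENDING_VAR, "") == "enabled"


def clamp_ttl(requested_seconds, min_seconds, max_seconds):
    """Clamp a requested TTL into [min_seconds, max_seconds].

    The real bounds are chosen by the server and are not documented here,
    so callers of this sketch must supply hypothetical ones.
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    return max(min_seconds, min(requested_seconds, max_seconds))
```

With hypothetical bounds of 300 and 3600 seconds, a configured TTL of 900 would pass through unchanged, while 10 would be raised to 300 and 99999 lowered to 3600.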

Spark Manual Baseline

Spark validation should use the same RustFS endpoint and warehouse bucket as the PyIceberg smoke test. Record the exact Spark and Iceberg package versions in the client matrix after each run.

Minimum configuration shape:

spark.sql.catalog.rustfs=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rustfs.type=rest
spark.sql.catalog.rustfs.uri=http://127.0.0.1:9000/iceberg
spark.sql.catalog.rustfs.warehouse=rustfs-s3table-smoke
spark.sql.catalog.rustfs.io-impl=org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.rustfs.s3.endpoint=http://127.0.0.1:9000
spark.sql.catalog.rustfs.s3.path-style-access=true
spark.sql.catalog.rustfs.rest.sigv4-enabled=true
spark.sql.catalog.rustfs.rest.signing-name=s3
spark.sql.catalog.rustfs.rest.signing-region=us-east-1
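To keep a manual Spark run aligned with the smoke test parameters, the configuration above could be generated rather than hand-typed. The sketch below only assembles the same keys into `--conf` arguments; the function names are hypothetical, and storage credentials and Iceberg package coordinates are deliberately left out because the README does not specify them.

```python
def spark_catalog_conf(endpoint, bucket, catalog="rustfs",
                       signing_name="s3", region="us-east-1"):
    """Return the minimum Spark configuration shape as an ordered dict."""
    endpoint = endpoint.rstrip("/")
    base = f"spark.sql.catalog.{catalog}"
    return {
        base: "org.apache.iceberg.spark.SparkCatalog",
        f"{base}.type": "rest",
        f"{base}.uri": f"{endpoint}/iceberg",
        f"{base}.warehouse": bucket,
        f"{base}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{base}.s3.endpoint": endpoint,
        f"{base}.s3.path-style-access": "true",
        f"{base}.rest.sigv4-enabled": "true",
        f"{base}.rest.signing-name": signing_name,
        f"{base}.rest.signing-region": region,
    }


def to_submit_args(conf):
    """Flatten a configuration dict into spark-submit style arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list can be appended to a `spark-sql` or `spark-submit` command line, alongside whatever Iceberg runtime package the run records in the client matrix.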

Until Spark is automated, do not claim Spark support beyond a manually verified run with the exact Spark and Iceberg versions recorded.