Mirror of https://github.com/shiyu-coder/Kronos.git, synced 2026-06-20 16:16:04 +08:00
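If cached GPU memory is not released, memory usage keeps growing during autoregressive prediction, which leads to a significant increase in memory consumption and can end in an out-of-memory (OOM) error. Calling torch.cuda.empty_cache() returns the unused blocks held by PyTorch's caching allocator to the GPU and prevents this accumulation. It does not free memory that is still referenced by live tensors.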
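
The pattern described above can be sketched as follows. This is a minimal illustration, not the repository's actual prediction code: the function name autoregressive_predict, the step_fn callable, and the empty_cache_every parameter are all hypothetical and introduced only for this example. The only real API calls used are torch.no_grad(), torch.cat(), torch.cuda.is_available(), and torch.cuda.empty_cache().

```python
import torch


def autoregressive_predict(step_fn, context, num_steps, empty_cache_every=1):
    """Hypothetical autoregressive loop that releases cached GPU memory.

    step_fn: assumed callable mapping the current context tensor to the
             next-step tensor (last dimension of size 1).
    context: initial input tensor of shape (batch, length).
    """
    outputs = []
    with torch.no_grad():  # inference only, so no autograd graph is kept
        for i in range(num_steps):
            next_step = step_fn(context)
            # Keep results on the CPU so they do not pin GPU memory.
            outputs.append(next_step.cpu())
            # Feed the prediction back in as input for the next step.
            context = torch.cat([context, next_step], dim=-1)
            # Return unused cached blocks to the GPU every few steps.
            if torch.cuda.is_available() and (i + 1) % empty_cache_every == 0:
                torch.cuda.empty_cache()
    return torch.cat(outputs, dim=-1)
```

Calling torch.cuda.empty_cache() on every step adds some overhead, because PyTorch has to request memory from the driver again afterwards. The hypothetical empty_cache_every parameter lets the caller trade memory headroom against speed.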