Enable 2-4 column comparisons with per-variant version/model selection, persisted layout/results, and evaluation actions aligned to output.