Don't (Just) Train For Performance

Jacob Mitchell Springer   research training adaptability

tl;dr: Extended pre-training improves base model performance but can hurt adaptability: the ability to fine-tune effectively while retaining capabilities. This calls for research exploring methods that explicitly pre-train for adaptability.

Weights could be helpful for interpretation

Ziqian Zhong   research interpretability

tl;dr: We demonstrate how analyzing weight differences between pre- and post-fine-tuned models can detect backdoors and suspicious behavior without needing trigger examples.
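The core signal here is the weight delta between checkpoints. A minimal sketch of how one might surface suspicious layers, assuming both checkpoints are loaded as name-to-array dicts (`weight_diff_norms` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def weight_diff_norms(base_state, tuned_state):
    """Per-parameter L2 norm of the weight delta between a base and a
    fine-tuned checkpoint. Parameters with unusually large deltas are
    candidates for closer inspection (e.g. possible backdoor edits)."""
    return {
        name: float(np.linalg.norm(tuned_state[name] - w_base))
        for name, w_base in base_state.items()
        if name in tuned_state
    }
```

For example, sorting the returned dict by value highlights which layers fine-tuning changed most, without needing any trigger inputs.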