Jacob Mitchell Springer • research, training, adaptability
tl;dr: Extended pre-training improves base model performance but can hurt adaptability—the ability to fine-tune effectively while retaining capabilities. This calls for research exploring methods that explicitly pre-train for adaptability.
tl;dr: We demonstrate how analyzing the weight differences between a model before and after fine-tuning can detect backdoors and other suspicious behavior without needing examples of the trigger.
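As a rough illustration of the idea (not the post's actual method), a minimal sketch of weight-diff inspection might rank parameters by how much fine-tuning moved them; the checkpoint filenames and the relative-norm heuristic below are assumptions for the example.

```python
import torch

def rank_weight_deltas(base_state_dict, finetuned_state_dict, top_k=10):
    """Rank parameters by how much fine-tuning moved them.

    Unusually large, concentrated updates relative to the rest of the model
    are one signal worth inspecting for backdoor-style edits. This heuristic
    is illustrative only, not the method described in the post.
    """
    deltas = []
    for name, base_w in base_state_dict.items():
        tuned_w = finetuned_state_dict.get(name)
        if tuned_w is None or tuned_w.shape != base_w.shape:
            continue  # skip parameters that were added, removed, or reshaped
        diff = tuned_w.float() - base_w.float()
        # Relative change: update magnitude normalized by the base weight norm.
        rel_change = diff.norm().item() / (base_w.float().norm().item() + 1e-12)
        deltas.append((name, rel_change))
    deltas.sort(key=lambda x: x[1], reverse=True)
    return deltas[:top_k]

# Hypothetical usage with two checkpoints on disk:
# base = torch.load("base_model.pt", map_location="cpu")
# tuned = torch.load("finetuned_model.pt", map_location="cpu")
# for name, score in rank_weight_deltas(base, tuned):
#     print(f"{score:.4f}  {name}")
```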