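[Figure: forgetting on task A under standard vs. sparse fine-tuning. State shown: before training on task B.]

standard fine-tuning
  task A (original): 88%
  task B (new): 0%
  task A change: −50pp (degradation)

sparse fine-tuning
  task A (original): 88%
  task B (new): 0%
  task A change: −6pp (89% less forgetting)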
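
The "less forgetting" figure can be read as the relative reduction in task A's accuracy drop, comparing sparse with standard fine-tuning. The sketch below assumes that definition, which the figure does not state, and the function name is hypothetical. With the rounded drops shown (50pp and 6pp) it gives 88%. The 89% in the figure presumably comes from unrounded values, which is also an assumption.

```python
def relative_forgetting_reduction(baseline_drop_pp: float, method_drop_pp: float) -> float:
    """Fraction of the baseline accuracy drop that the method avoids.

    Hypothetical helper: assumes "less forgetting" means
    1 - (method drop / baseline drop), with both drops in percentage points.
    """
    if baseline_drop_pp <= 0:
        raise ValueError("baseline drop must be positive")
    return 1.0 - method_drop_pp / baseline_drop_pp


# Rounded task A drops as shown in the figure.
standard_drop_pp = 50.0  # standard fine-tuning
sparse_drop_pp = 6.0     # sparse fine-tuning

reduction = relative_forgetting_reduction(standard_drop_pp, sparse_drop_pp)
print(f"{reduction:.0%} less forgetting")
```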