Task Singular Vectors: Reducing Task Interference in Model Merging
Mike’s Daily Deep Learning Paper Review: 05.06.25
Today’s review is short and light.
The paper addresses the problem of merging several models, each fine-tuned from the same base model on a different task, into a single model that performs well across all tasks. Each fine-tuned model carries its own task-specific weight delta relative to the base (obtained, e.g., via LoRA, though not necessarily).
A common baseline approach is to average the weight deltas and add the result to the base model. However, the authors point out that this naive strategy often performs poorly even when the tasks are similar. To address this, they propose an intuitive method designed to reduce interference between the delta matrices of different tasks.
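To make that baseline concrete, here is a minimal sketch assuming PyTorch state dicts with matching keys; the function and variable names are mine, not the paper's:

```python
import torch

def average_merge(base_state, finetuned_states):
    """Naive baseline: average the per-task weight deltas and add the result
    to the base model. base_state and finetuned_states are state dicts
    sharing the same parameter names."""
    merged = {}
    for name, base_w in base_state.items():
        deltas = [ft[name] - base_w for ft in finetuned_states]
        merged[name] = base_w + torch.stack(deltas).mean(dim=0)
    return merged
```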
What’s the approach?
First, they observe that the delta matrices are typically low-rank. They apply Singular Value Decomposition (SVD) to each delta matrix, yielding orthogonal matrices U_i and V_i and a diagonal matrix D_i of singular values. Then, for each delta matrix, they retain only a small number of top singular vectors, much as one would keep the leading components in PCA.
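A sketch of that truncation step (PyTorch; the rank k is a hyperparameter, and the names are mine):

```python
import torch

def truncated_task_svd(delta, k):
    """SVD of one per-layer task delta, keeping only the top-k singular
    directions -- analogous to keeping the leading components in PCA."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :k], S[:k], Vh[:k, :]  # low-rank factors U_i, D_i, V_i^T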
In the second step, they aim to decorrelate the update directions across tasks. To do this, they concatenate the U_i’s and V_i’s from all tasks into large matrices and compute whitening (decorrelation) transformations for each. This involves standard linear algebra techniques, including the Moore-Penrose pseudoinverse.
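One common way to realize such a whitening is sketched below, under the assumption that "decorrelate" means making the concatenated columns orthonormal via M(M^T M)^{-1/2}; the paper's exact construction may differ in details:

```python
import torch

def whiten_columns(M, eps=1e-8):
    """Whiten the columns of M (e.g., the U_i's or V_i's of all tasks
    concatenated side by side) via M (M^T M)^{-1/2}. Eigenvalues below eps
    are dropped, which effectively uses a pseudoinverse when the Gram
    matrix is rank-deficient."""
    gram = M.T @ M
    evals, evecs = torch.linalg.eigh(gram)
    keep = evals > eps
    inv_sqrt = evecs[:, keep] @ torch.diag(evals[keep].rsqrt()) @ evecs[:, keep].T
    return M @ inv_sqrt
```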
Finally, rather than summing the deltas directly, they apply the whitening transforms and construct a weighted combination of the update matrices. The goal is to reduce interference while preserving task-specific structure.
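Putting the pieces together, a hedged per-layer sketch of how the whitened factors could be recombined into a single merged update; alpha is an illustrative scaling coefficient, not necessarily the paper's weighting scheme:

```python
import torch

def merge_layer(base_w, task_factors, alpha=1.0):
    """task_factors: list of (U_i, S_i, Vh_i) tuples after truncation and
    whitening. Reconstruct each task's low-rank update, sum them, then scale
    and add to the base weights. The weighting here is a simple illustration."""
    merged_delta = sum(U @ torch.diag(S) @ Vh for U, S, Vh in task_factors)
    return base_w + alpha * merged_delta
```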
The method is applied separately to each layer, though it's unclear whether this per-layer treatment is itself novel.
https://arxiv.org/abs/2412.00081