
# Awesome-Knowledge-Fusion

If you have any questions about this repository, please feel free to contact us. Email: [email protected]


A comprehensive list of papers about 'Knowledge Fusion: A Comprehensive Survey'.

## Abstract

As the capabilities of large foundation models rapidly improve, similar general abilities are emerging across different models, making capability transfer and fusion between them increasingly feasible. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more powerful model through efficient methods such as knowledge distillation, model merging, mixture of experts, and parameter-efficient fine-tuning (PEFT), thereby reducing the need for costly LLM development and adaptation. We provide a comprehensive overview of model merging methods and theories, covering their applications across various fields and scenarios, including LLMs, MLLMs, image generation, model compression, continual learning, and more. Finally, we highlight the challenges of knowledge fusion and explore future research directions.



## Framework


## 1. Connectivity and Alignment

### 1.1 Model Connectivity

| Paper Title | Code | Publication & Date |
|---|---|---|
| Rethink Model Re-Basin and the Linear Mode Connectivity | rethink | ArXiv 24.02 |
| Layerwise linear mode connectivity | Layerwise | ICLR 2024 |
| Proving linear mode connectivity of neural networks via optimal transport | OT_LMC | AISTATS 2024 |
| Re-basin via implicit Sinkhorn differentiation | Re-Basin | CVPR 2023 |
| Git Re-Basin: Merging Models modulo Permutation Symmetries | Git Re-Basin | ICLR 2023 |
| Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks | - | ICLR 2023 |
| Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization | - | ICLR 2023 |
| Going beyond linear mode connectivity: The layerwise linear feature connectivity | LLFC | NeurIPS 2023 |
| The role of permutation invariance in linear mode connectivity of neural networks | PI | ICLR 2022 |
| What can linear interpolation of neural network loss landscapes tell us? | - | ICML 2022 |
| Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling | LSS | ICML 2021 |
| Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes | - | ICML 2021 |
| Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances | - | ICML 2021 |
| Linear Mode Connectivity and the Lottery Ticket Hypothesis | - | ICML 2020 |
| Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs | DNN | NeurIPS 2018 |
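
Most of the works above quantify connectivity via the loss barrier along the straight line between two sets of weights. As a rough illustration only (not taken from any single paper), here is a minimal PyTorch sketch that estimates this barrier; the `eval_loss` callback and the two state dicts are assumed inputs, not part of any library:

```python
import torch

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, eval_loss, steps=11):
    """Estimate the loss barrier on the linear path between two solutions.

    eval_loss(model) -> float is assumed to evaluate on a fixed batch.
    A barrier near zero means the two solutions are linearly mode connected.
    """
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        # Elementwise interpolation; non-float buffers are copied from sd_a.
        merged = {
            k: torch.lerp(sd_a[k], sd_b[k], alpha)
            if sd_a[k].is_floating_point() else sd_a[k]
            for k in sd_a
        }
        model.load_state_dict(merged)
        losses.append(eval_loss(model))
    # Worst loss along the path, relative to the mean of the endpoint losses.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```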

### 1.2 Weight Alignment

| Paper Title | Code | Publication & Date |
|---|---|---|
| Equivariant Deep Weight Space Alignment | EDWSA | ICML 2024 |
| Harmony in diversity: Merging neural networks with canonical correlation analysis | CCA Merge | ICML 2024 |
| Transformer fusion with optimal transport | TF | ICLR 2024 |
| ZipIt! Merging Models From Different Tasks Without Training | ZipIt | ICLR 2024 |
| Training-Free Pretrained Model Merging | TFPMM | CVPR 2024 |
| Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | - | ArXiv 24.09 |
| C2M3: Cycle-Consistent Multi Model Merging | CCM | ArXiv 24.05 |
| REPAIR: REnormalizing Permuted Activations for Interpolation Repair | REPAIR | ICLR 2023 |
| Optimizing mode connectivity via neuron alignment | Neu-Align | NeurIPS 2020 |
| Model fusion via optimal transport | otfusion | NeurIPS 2020 |
| Uniform convergence may be unable to explain generalization in deep learning | - | NeurIPS 2019 |
| Explaining landscape connectivity of low-cost solutions for multilayer nets | - | NeurIPS 2019 |
| Essentially no barriers in neural network energy landscape | AutoNEB | ICML 2018 |
| Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | FedExp | ArXiv 24.08 |
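
A common primitive in this line of work is matching the hidden units of one network to another before averaging, since neurons can be permuted without changing the function. A minimal sketch of weight matching for a single linear layer, in the spirit of Git Re-Basin (the full algorithms iterate over all layers and propagate each permutation to the next layer's input dimension):

```python
import torch
from scipy.optimize import linear_sum_assignment

@torch.no_grad()
def match_units(w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    """Permute the output units (rows) of w_b to best align with w_a.

    Unit-to-unit similarity is the inner product of weight vectors; the
    best one-to-one matching is a linear assignment (Hungarian) problem.
    """
    sim = (w_a @ w_b.T).cpu().numpy()          # (units_a, units_b)
    _, col = linear_sum_assignment(sim, maximize=True)
    perm = torch.as_tensor(col, device=w_b.device)
    return w_b[perm]                            # rows reordered to match w_a
```

After permuting layer l's rows, the same permutation must be applied to the columns of layer l+1 so the network's function is unchanged; methods such as REPAIR additionally renormalize activations to reduce the remaining barrier.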

## 2. Parameter Merging

### 2.1 Merging Methods

#### Optimization based

| Paper Title | Code | Publication & Date |
|---|---|---|
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | XFT | ACL 2024 |
| Model Merging by Uncertainty-Based Gradient Matching | code | ICLR 2024 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | MC-SMoE | ICLR 2024 |
| Representation Surgery for Multi-Task Model Merging | Rep-Surgery | ICML 2024 |
| Erasure Coded Neural Network Inference via Fisher Averaging | - | ISIT 2024 |
| Fisher Mask Nodes for Language Model Merging | Fisher-nodes | LREC-COLING 2024 |
| Merging by Matching Models in Task Subspaces | Mats | TMLR 2024 |
| Soft merging of experts with adaptive routing | Smear | TMLR 2024 |
| SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | SMILE | ArXiv 24.08 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | - | ArXiv 24.06 |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | - | ArXiv 24.03 |
| Dataless Knowledge Fusion by Merging Weights of Language Models | RegMean | ICLR 2023 |
| Merging models with fisher-weighted averaging | Fisher | NeurIPS 2022 |
| Model fusion via optimal transport | Otfusion | NeurIPS 2020 |
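
Several of these methods weight each parameter by an importance estimate when averaging. A minimal sketch of diagonal Fisher-weighted averaging in the spirit of 'Merging models with fisher-weighted averaging'; the per-parameter Fisher values are assumed to be estimated separately (e.g., as averaged squared gradients of the log-likelihood):

```python
import torch

@torch.no_grad()
def fisher_merge(state_dicts, fishers, eps=1e-8):
    """Per-element weighted average of several models' parameters.

    state_dicts: list of state dicts with identical keys and shapes.
    fishers: matching list of diagonal Fisher estimates (same shapes).
    Parameters a model is 'confident' about (high Fisher) dominate the merge.
    """
    merged = {}
    for k in state_dicts[0]:
        num = sum(f[k] * sd[k] for sd, f in zip(state_dicts, fishers))
        den = sum(f[k] for f in fishers) + eps
        merged[k] = num / den
    return merged
```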

#### Task Vector based

| Paper Title | Code | Publication & Date |
|---|---|---|
| Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities | - | ACL 2024 |
| AdaMerging: Adaptive Model Merging for Multi-Task Learning | AdaMerging | ICLR 2024 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | DARE | ICML 2024 |
| Localizing Task Information for Improved Model Merging and Compression | Tall_masks | ICML 2024 |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | WEMoE | ICML 2024 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Phatgoose | ICML 2024 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | - | ICML 2024 |
| Parameter Competition Balancing for Model Merging | PCB-Merging | NeurIPS 2024 |
| EMR-Merging: Tuning-Free High-Performance Model Merging | EMR_Merging | NeurIPS 2024 |
| Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | Localize-and-Stitch | ArXiv 24.08 |
| Activated Parameter Locating via Causal Intervention for Model Merging | - | ArXiv 24.08 |
| Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | aTLAS | ArXiv 24.07 |
| PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning | - | ArXiv 24.06 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | DELLA | ArXiv 24.06 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Twin-Merging | ArXiv 24.06 |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | - | ArXiv 24.06 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Pareto_set | ArXiv 24.06 |
| RE-Adapt: Reverse Engineered Adaptation of Large Language Models | - | ArXiv 24.05 |
| Evolutionary optimization of model merging recipes | EvoLLM | ArXiv 24.03 |
| DPPA: Pruning Method for Large Language Model to Model Merging | DPPA | ArXiv 24.03 |
| Editing models with task arithmetic | Task_vectors | ICLR 2023 |
| Task-Specific Skill Localization in Fine-tuned Language Models | Grafting | ICML 2023 |
| Composing parameter-efficient modules with arithmetic operation | PEM_composition | NeurIPS 2023 |
| TIES-MERGING: Resolving Interference When Merging Models | TIES-Merging | NeurIPS 2023 |
| Model breadcrumbs: Scaling multi-task model merging with sparse masks | Breadcrumbs | ArXiv 23.12 |
| Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion | Subspace | ArXiv 23.12 |
| Effective and Parameter Efficient Reusing Fine-Tuned Models | - | ArXiv 23.10 |
| Patching open-vocabulary models by interpolating weights | Patching | NeurIPS 2022 |
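
The unifying idea in this family is the task vector of 'Editing models with task arithmetic': the weight delta induced by fine-tuning, which can be added, negated, and scaled. A minimal sketch over state dicts; the scaling coefficient `lam` is a tunable assumption, and methods such as TIES and DARE add sparsification and sign-conflict resolution on top of this:

```python
import torch

@torch.no_grad()
def task_vector(pretrained, finetuned):
    """tau = theta_ft - theta_pre, computed per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

@torch.no_grad()
def merge_with_task_vectors(pretrained, task_vectors, lam=0.3):
    """theta_merged = theta_pre + lam * sum_i tau_i (task arithmetic)."""
    return {
        k: pretrained[k] + lam * sum(tv[k] for tv in task_vectors)
        for k in pretrained
    }
```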

### 2.2 During or After Training

#### During Training

| Paper Title | Code | Publication & Date |
|---|---|---|
| Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | - | ArXiv 24.07 |
| DEM: Distribution Edited Model for Training with Mixed Data Distributions | - | ArXiv 24.06 |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | - | ArXiv 24.03 |
| Warm: On the benefits of weight averaged reward models | - | ICML 2024 |
| ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | ColD-Fusion | ACL 2023 |
| Model ratatouille: Recycling diverse models for out-of-distribution generalization | Ratatouille | ICML 2023 |
| Early Weight Averaging meets High Learning Rates for LLM Pre-training | code | NeurIPS_W 2023 |
| Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | LAWA | NeurIPS_W 2022 |
| Stochastic weight averaging revisited | PSWA | ArXiv 22.09 |
| Fusing finetuned models for better pretraining | - | ArXiv 22.04 |
| Lookahead optimizer: k steps forward, 1 step back | Lookahead | NeurIPS 2019 |
| Averaging weights leads to wider optima and better generalization | SWA | UAI 2018 |
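
The common thread here is averaging checkpoints produced along a single training run. A minimal sketch of the running mean used by stochastic weight averaging (SWA); PyTorch also ships a ready-made version as `torch.optim.swa_utils.AveragedModel`:

```python
import torch

class RunningWeightAverage:
    """Incremental mean of model weights: avg += (w - avg) / n.

    Typically updated once per epoch late in training, after the
    learning rate has been decayed or cycled.
    """
    def __init__(self):
        self.avg, self.n = None, 0

    @torch.no_grad()
    def update(self, model):
        self.n += 1
        if self.avg is None:
            self.avg = {k: v.detach().clone().float()
                        for k, v in model.state_dict().items()}
            return
        for k, v in model.state_dict().items():
            if v.is_floating_point():
                self.avg[k] += (v.float() - self.avg[k]) / self.n
```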

#### After Training

| Paper Title | Code | Publication & Date |
|---|---|---|
| Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | LCSC | ArXiv 24.04 |
| AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models | - | EACL 2023 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | Model-soups | ICML 2022 |
| Diverse weight averaging for out-of-distribution generalization | Diwa | NeurIPS 2022 |
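
For intuition, a minimal sketch of the uniform soup from 'Model soups': the elementwise mean of several fine-tuned checkpoints of the same architecture. The greedy variant in that paper instead adds checkpoints one at a time and keeps each only if held-out accuracy improves:

```python
import torch

@torch.no_grad()
def uniform_soup(state_dicts):
    """Elementwise mean of several checkpoints.

    Assumes identical keys and shapes and floating-point parameters.
    """
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```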

### 2.3 For LLMs and MLLMs

#### For LLMs

| Paper Title | Code | Publication & Date |
|---|---|---|
| Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | code | ArXiv 24.09 |
| FuseChat: Knowledge Fusion of Chat Models | FuseChat | ArXiv 24.08 |
| Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | MergeLLM | ArXiv 24.08 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | DELLA | ArXiv 24.06 |
| Mitigating Social Biases in Language Models through Unlearning | code | ArXiv 24.06 |
| Weak-to-strong extrapolation expedites alignment | Expo | ArXiv 24.04 |
| Parameter Competition Balancing for Model Merging | PCB-Merging | NeurIPS 2024 |
| Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Resta | ArXiv 24.02 |
| Towards Safer Large Language Models through Machine Unlearning | SKU | ACL 2024 |
| Lm-cocktail: Resilient tuning of language models via model merging | - | ACL Findings 2024 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | DARE | ICML 2024 |
| Controlled Text Generation via Language Model Arithmetic | code | ICML 2024 |
| Strong Copyright Protection for Language Models via Adaptive Model Fusion | - | ICML 2024 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | - | ICML 2024 |
| Knowledge fusion of large language models | FuseLLM | ICLR 2024 |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Ext-Sub | AAAI 2024 |
| LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | lorahub | COLM 2024 |
| Composing parameter-efficient modules with arithmetic operation | PEM_Composition | NeurIPS 2023 |
| Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards | Rewarded-Soups | NeurIPS 2023 |
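
As one concrete example from this table, DARE ('Language Models are Super Mario') prepares fine-tuning deltas for merging by randomly dropping most of their elements and rescaling the survivors so the expected update is unchanged. A minimal sketch over a dict of delta tensors (theta_ft - theta_pre); the drop rate `p` is a tunable assumption:

```python
import torch

@torch.no_grad()
def dare_drop_and_rescale(delta, p=0.9):
    """Zero out each delta element with probability p, then rescale the
    survivors by 1 / (1 - p) so the expected delta is preserved."""
    out = {}
    for k, d in delta.items():
        keep = torch.rand_like(d) >= p   # keep each element w.p. 1 - p
        out[k] = d * keep / (1.0 - p)
    return out
```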

#### For Multimodal Language Models

| Paper Title | Code | Publication & Date |
|---|---|---|
| Model Composition for Multimodal Large Language Models | ModelCompose | ACL 2024 |
| Jointly training large autoregressive multimodal models | code | ICLR 2024 |
| Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | - | ICASSP_W 2024 |
| An Empirical Study of Multimodal Model Merging | Vl-merging | EMNLP 2023 |
| [π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation](https://arxiv.org/pdf/2404.02241) | π-Tuning | ICML 2023 |
| UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | UnIVAL | TMLR 2023 |

## 3. Model Ensemble

### 3.1 Ensemble Methods

#### Weighted Averaging

#### Routing

| Paper Title | Code | Publication & Date |
|---|---|---|
| Soft merging of experts with adaptive routing | - | TMLR 2024 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | DeepSeekMoE | ArXiv 24.01 |
| Multiple Expert Brainstorming for Domain Adaptive Person Re-identification | MEB-Net | ECCV 2020 |
| Merging Vision Transformers from Different Tasks and Domains | - | ArXiv 23.12 |
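
The routing-based ensembles above share the sparse gating pattern popularized by mixture-of-experts models: a learned gate scores experts per token and only the top-k are evaluated. A minimal sketch of that selection step (the surrounding expert networks and load-balancing losses are omitted; this is a generic illustration, not any one paper's router):

```python
import torch
import torch.nn.functional as F

def top_k_routing(gate_logits: torch.Tensor, k: int = 2):
    """gate_logits: (num_tokens, num_experts).

    Returns, per token, the indices of the k selected experts and their
    renormalized mixing weights; unselected experts contribute nothing.
    """
    probs = F.softmax(gate_logits, dim=-1)
    top_p, top_i = probs.topk(k, dim=-1)
    weights = top_p / top_p.sum(dim=-1, keepdim=True)
    return top_i, weights
```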

### 3.2 Ensemble Object

#### Entire Model

| Paper Title | Code | Publication & Date |
|---|---|---|
| Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM | ChaiML | ArXiv 24.01 |
| LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | LLM-Blender | ACL 2023 |
| Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning | GAME | ICML 2022 |
| BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning | - | ICLR 2020 |
| Diverse Ensemble Evolution: Curriculum Data-Model Marriage | - | NeurIPS 2018 |

#### Adapter

| Paper Title | Code | Publication & Date |
|---|---|---|
| SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | SMILE | ArXiv 24.08 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | - | ArXiv 24.06 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | - | ArXiv 24.06 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Twin-Merging | NeurIPS 2024 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | - | ArXiv 24.03 |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | WEMoE | ICML 2024 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | - | ICML 2024 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | - | ICLR 2024 |
| Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories | code | ACL 2023 |
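
Many adapter-level ensembles reduce to weighted combinations of low-rank updates. A minimal LoraHub-style sketch for a single layer: each LoRA module contributes B @ A, and the mixing weights are what methods in this table learn or search for (the `loras` and `weights` inputs are assumptions for illustration):

```python
import torch

@torch.no_grad()
def compose_loras(loras, weights):
    """Weighted sum of low-rank updates for one layer.

    loras: list of (A, B) pairs, A: (r, d_in), B: (d_out, r).
    weights: list of scalars, one per LoRA module.
    Returns the merged delta to add onto the frozen base weight W.
    """
    return sum(w * (B @ A) for (A, B), w in zip(loras, weights))
```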

## 4. Decouple

### 4.1 Reprogramming

| Paper Title | Code | Publication & Date |
|---|---|---|
| Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning | - | AAAI 2024 |
| Towards Efficient Task-Driven Model Reprogramming with Foundation Models | - | ArXiv 23.06 |
| Deep Graph Reprogramming | ycjing | CVPR 2023 |
| From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition | - | ICASSP 2023 |
| Fairness Reprogramming | USBC-NLP | NeurIPS 2022 |
| Voice2Series: Reprogramming Acoustic Models for Time Series Classification | V2S | ICML 2021 |

### 4.2 Mask

| Paper Title | Code | Publication & Date |
|---|---|---|
| EMR-Merging: Tuning-Free High-Performance Model Merging | EMR_Merging | NeurIPS 2024 Spotlight |
| Model Composition for Multimodal Large Language Models | THUNLP | ACL 2024 |
| Localizing Task Information for Improved Model Merging and Compression | tall_masks | ICML 2024 |
| Adapting a Single Network to Multiple Tasks by Learning to Mask Weights | Piggyback | ECCV 2018 |

## 5. Distillation

### 5.1 Transformer

| Paper Title | Code | Publication & Date |
|---|---|---|
| Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | FuseChat | ArXiv 24.02 |
| Sam-clip: Merging vision foundation models towards semantic and spatial understanding | - | CVPR 2024 |
| Knowledge fusion of large language models | FuseAI | ICLR 2024 |
| Seeking Neural Nuggets: Knowledge Transfer In Large Language Models From A Parametric Perspective | ParaKnowTransfer | ICLR 2024 |
| One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation | OFAKD | NeurIPS 2023 |
| Knowledge Amalgamation for Object Detection With Transformers | - | TIP 2023 |
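
Distillation-based fusion builds on the classic temperature-scaled objective: the student matches the teacher's softened output distribution. A minimal sketch of that loss term; in practice it is combined with the ordinary task loss, and the amalgamation methods below aggregate several teachers:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures, following Hinton et al.'s convention.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T**2
```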

### 5.2 CNN

| Paper Title | Code | Publication & Date |
|---|---|---|
| Factorizing Knowledge in Neural Networks | KnowledgeFactor | ECCV 2022 |
| Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework | Spatial_Ensemble | NeurIPS 2021 |
| Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning | - | ECCV 2020 |
| Multiple Expert Brainstorming for Domain Adaptive Person Re-identification | MEB-Net | ECCV 2020 |
| Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN | - | CVPR 2020 |
| Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation | - | ICCV 2019 |
| Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers | - | IJCAI 2019 |
| Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning | code | IJCAI 2019 |
| Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More | KAmalEngine | CVPR 2019 |
| Amalgamating Knowledge towards Comprehensive Classification | - | AAAI 2019 |

### 5.3 GNN

| Paper Title | Code | Publication & Date |
|---|---|---|
| Amalgamating Knowledge From Heterogeneous Graph Neural Networks | ycjing | CVPR 2021 |

## 6. Model Reuse

### 6.1 Model Reassembly

| Paper Title | Code | Publication & Date |
|---|---|---|
| Advances in Robust Federated Learning: Heterogeneity Considerations | - | ArXiv 24.05 |
| Towards Personalized Federated Learning via Heterogeneous Model Reassembly | pFedHR | NeurIPS 2023 |
| Stitchable Neural Networks | snnet | CVPR 2023 Highlight |
| Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models | instant_soup | ICML 2023 |
| Deep Incubation: Training Large Models by Divide-and-Conquering | Deep-Incubation | ArXiv 22.12 |
| Deep Model Reassembly | DeRy | NeurIPS 2022 |
| GAN Cocktail: Mixing GANs without Dataset Access | GAN-cocktail | ECCV 2022 |

### 6.2 Model Evolution

| Paper Title | Code | Publication & Date |
|---|---|---|
| It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | - | ArXiv 24.07 |
| Evolutionary Optimization of Model Merging Recipes | EvoLLM | ArXiv 24.03 |
| Knowledge Fusion By Evolving Weights of Language Models | Model_Evolver | ACL 2024 |
| Population-based evolutionary gaming for unsupervised person re-identification | - | IJCV 2023 |

## 7. Others

### 7.1 External Data Retrieval

| Paper Title | Code | Publication & Date |
|---|---|---|
| Evaluating the External and Parametric Knowledge Fusion of Large Language Models | - | ArXiv 24.05 |
| Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering | - | ArXiv 20.04 |

### 7.2 Multi-Objective Optimization

| Paper Title | Code | Publication & Date |
|---|---|---|
| You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | - | ArXiv 24.08 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | - | ArXiv 24.06 |
| MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | - | ArXiv 24.06 |

### 7.3 Others

| Paper Title | Code | Publication & Date |
|---|---|---|
| Adaptive Discovering and Merging for Incremental Novel Class Discovery | - | AAAI 2024 |
| Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering | - | ArXiv 20.04 |

### 7.4 Other Surveys

| Paper Title | Code | Publication & Date |
|---|---|---|
| A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning | - | ArXiv 24.08 |
| Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Yang | ArXiv 24.08 |
| Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | - | ArXiv 24.07 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | MergeKit | ArXiv 24.03 |
| Learn From Model Beyond Fine-Tuning: A Survey | LFM | ArXiv 23.10 |
| Deep Model Fusion: A Survey | - | ArXiv 23.09 |
| A curated paper list of Model Merging methods | ycjing | GitHub |

## Contributors

Junlin Lee; Qi Tang; Runhua Jiang.

## Star History

Star History Chart


## Contact

We invite all researchers to contribute to this repository, 'Knowledge Fusion: The Integration of Model Capabilities'. If you have any questions, please feel free to contact us.

Email: [email protected]