
# Awesome-Knowledge-Fusion

If you have any questions about this repository, please feel free to contact us. Email: [email protected]


A comprehensive list of papers about 'Knowledge Fusion: A Comprehensive Survey'.

## Abstract

As the capabilities of large foundation models rapidly improve, similar general abilities are emerging across different models, making capability transfer and fusion between them increasingly feasible. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more powerful model through efficient methods such as knowledge distillation, model merging, mixture of experts, and parameter-efficient fine-tuning (PEFT), thereby reducing the need for costly LLM development and adaptation. We provide a comprehensive overview of model merging methods and theories, covering their applications across various fields and scenarios, including LLMs, MLLMs, image generation, model compression, continual learning, and more. Finally, we highlight the challenges of knowledge fusion and explore future research directions.



## Framework


## 1. Connectivity and Alignment

### 1.1 Model Connectivity

| Paper Title | Code | Publication & Date |
|---|---|---|
| Rethink Model Re-Basin and the Linear Mode Connectivity | rethink | ArXiv 24.02 |
| Layerwise linear mode connectivity | Layerwise | ICLR 2024 |
| Proving linear mode connectivity of neural networks via optimal transport | OT_LMC | AISTATS 2024 |
| Re-basin via implicit Sinkhorn differentiation | Re-Basin | CVPR 2023 |
| Git Re-Basin: Merging Models modulo Permutation Symmetries | Git Re-Basin | ICLR 2023 |
| Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks | - | ICLR 2023 |
| Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization | - | ICLR 2023 |
| Going beyond linear mode connectivity: The layerwise linear feature connectivity | LLFC | NeurIPS 2023 |
| The role of permutation invariance in linear mode connectivity of neural networks | PI | ICLR 2022 |
| What can linear interpolation of neural network loss landscapes tell us? | - | ICML 2022 |
| Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling | LSS | ICML 2021 |
| Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes | - | ICML 2021 |
| Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances | - | ICML 2021 |
| Linear Mode Connectivity and the Lottery Ticket Hypothesis | - | ICML 2020 |
| Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs | DNN | NeurIPS 2018 |
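
Most of the works above quantify connectivity via the loss barrier along the straight line between two sets of weights. As a rough illustration only (not taken from any single paper), here is a minimal PyTorch sketch that estimates this barrier; the `eval_loss` callback and the two state dicts are assumed inputs, not part of any library:

```python
import torch

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, eval_loss, steps=11):
    """Estimate the loss barrier on the linear path between two solutions.

    eval_loss(model) -> float is assumed to evaluate on a fixed batch.
    A barrier near zero means the two solutions are linearly mode connected.
    """
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        # Elementwise interpolation; non-float buffers are copied from sd_a.
        merged = {
            k: torch.lerp(sd_a[k], sd_b[k], alpha)
            if sd_a[k].is_floating_point() else sd_a[k]
            for k in sd_a
        }
        model.load_state_dict(merged)
        losses.append(eval_loss(model))
    # Worst loss along the path, relative to the mean of the endpoint losses.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```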

### 1.2 Weight Alignment

| Paper Title | Code | Publication & Date |
|---|---|---|
| Equivariant Deep Weight Space Alignment | EDWSA | ICML 2024 |
| Harmony in diversity: Merging neural networks with canonical correlation analysis | CCA Merge | ICML 2024 |
| Transformer fusion with optimal transport | TF | ICLR 2024 |
| ZipIt! Merging Models From Different Tasks Without Training | ZipIt | ICLR 2024 |
| Training-Free Pretrained Model Merging | TFPMM | CVPR 2024 |
| Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | - | ArXiv 24.09 |
| C2M3: Cycle-Consistent Multi Model Merging | CCM | ArXiv 24.05 |
| REPAIR: REnormalizing Permuted Activations for Interpolation Repair | REPAIR | ICLR 2023 |
| Optimizing mode connectivity via neuron alignment | Neu-Align | NeurIPS 2020 |
| Model fusion via optimal transport | otfusion | NeurIPS 2020 |
| Uniform convergence may be unable to explain generalization in deep learning | - | NeurIPS 2019 |
| Explaining landscape connectivity of low-cost solutions for multilayer nets | - | NeurIPS 2019 |
| Essentially no barriers in neural network energy landscape | AutoNEB | ICML 2018 |
| Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | FedExp | ArXiv 24.08 |
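
A common primitive in this line of work is matching the hidden units of one network to another before averaging, since neurons can be permuted without changing the function. A minimal sketch of weight matching for a single linear layer, in the spirit of Git Re-Basin (the full algorithms iterate over all layers and propagate each permutation to the next layer's input dimension):

```python
import torch
from scipy.optimize import linear_sum_assignment

@torch.no_grad()
def match_units(w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    """Permute the output units (rows) of w_b to best align with w_a.

    Unit-to-unit similarity is the inner product of weight vectors; the
    best one-to-one matching is a linear assignment (Hungarian) problem.
    """
    sim = (w_a @ w_b.T).cpu().numpy()          # (units_a, units_b)
    _, col = linear_sum_assignment(sim, maximize=True)
    perm = torch.as_tensor(col, device=w_b.device)
    return w_b[perm]                            # rows reordered to match w_a
```

After permuting layer l's rows, the same permutation must be applied to the columns of layer l+1 so the network's function is unchanged; methods such as REPAIR additionally renormalize activations to reduce the remaining barrier.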

## 2. Parameter Merging

### 2.1 Merging Methods

#### Optimization based

| Paper Title | Code | Publication & Date |
|---|---|---|
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | XFT | ACL 2024 |
| Model Merging by Uncertainty-Based Gradient Matching | code | ICLR 2024 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | MC-SMoE | ICLR 2024 |
| Representation Surgery for Multi-Task Model Merging | Rep-Surgery | ICML 2024 |
| Erasure Coded Neural Network Inference via Fisher Averaging | - | ISIT 2024 |
| Fisher Mask Nodes for Language Model Merging | Fisher-nodes | LREC-COLING 2024 |
| Merging by Matching Models in Task Subspaces | Mats | TMLR 2024 |
| Soft merging of experts with adaptive routing | Smear | TMLR 2024 |
| SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | SMILE | ArXiv 24.08 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | - | ArXiv 24.06 |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | - | ArXiv 24.03 |
| Dataless Knowledge Fusion by Merging Weights of Language Models | RegMean | ICLR 2023 |
| Merging models with fisher-weighted averaging | Fisher | NeurIPS 2022 |
| Model fusion via optimal transport | Otfusion | NeurIPS 2020 |
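
Several of these methods weight each parameter by an importance estimate when averaging. A minimal sketch of diagonal Fisher-weighted averaging in the spirit of 'Merging models with fisher-weighted averaging'; the per-parameter Fisher values are assumed to be estimated separately (e.g., as averaged squared gradients of the log-likelihood):

```python
import torch

@torch.no_grad()
def fisher_merge(state_dicts, fishers, eps=1e-8):
    """Per-element weighted average of several models' parameters.

    state_dicts: list of state dicts with identical keys and shapes.
    fishers: matching list of diagonal Fisher estimates (same shapes).
    Parameters a model is 'confident' about (high Fisher) dominate the merge.
    """
    merged = {}
    for k in state_dicts[0]:
        num = sum(f[k] * sd[k] for sd, f in zip(state_dicts, fishers))
        den = sum(f[k] for f in fishers) + eps
        merged[k] = num / den
    return merged
```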

#### Task Vector based

| Paper Title | Code | Publication & Date |
|---|---|---|
| Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities | - | ACL 2024 |
| AdaMerging: Adaptive Model Merging for Multi-Task Learning | AdaMerging | ICLR 2024 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | DARE | ICML 2024 |
| Localizing Task Information for Improved Model Merging and Compression | Tall_masks | ICML 2024 |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | WEMoE | ICML 2024 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Phatgoose | ICML 2024 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | - | ICML 2024 |
| Parameter Competition Balancing for Model Merging | PCB-Merging | NeurIPS 2024 |
| EMR-Merging: Tuning-Free High-Performance Model Merging | EMR_Merging | NeurIPS 2024 |
| Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | Localize-and-Stitch | ArXiv 24.08 |
| Activated Parameter Locating via Causal Intervention for Model Merging | - | ArXiv 24.08 |
| Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | aTLAS | ArXiv 24.07 |
| PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning | - | ArXiv 24.06 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | DELLA | ArXiv 24.06 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Twin-Merging | ArXiv 24.06 |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | - | ArXiv 24.06 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Pareto_set | ArXiv 24.06 |
| RE-Adapt: Reverse Engineered Adaptation of Large Language Models | - | ArXiv 24.05 |
| Evolutionary optimization of model merging recipes | EvoLLM | ArXiv 24.03 |
| DPPA: Pruning Method for Large Language Model to Model Merging | DPPA | ArXiv 24.03 |
| Editing models with task arithmetic | Task_vectors | ICLR 2023 |
| Task-Specific Skill Localization in Fine-tuned Language Models | Grafting | ICML 2023 |
| Composing parameter-efficient modules with arithmetic operation | PEM_composition | NeurIPS 2023 |
| TIES-MERGING: Resolving Interference When Merging Models | TIES-Merging | NeurIPS 2023 |
| Model breadcrumbs: Scaling multi-task model merging with sparse masks | Breadcrumbs | ArXiv 23.12 |
| Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion | Subspace | ArXiv 23.12 |
| Effective and Parameter Efficient Reusing Fine-Tuned Models | - | ArXiv 23.10 |
| Patching open-vocabulary models by interpolating weights | Patching | NeurIPS 2022 |
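
The unifying idea in this family is the task vector of 'Editing models with task arithmetic': the weight delta induced by fine-tuning, which can be added, negated, and scaled. A minimal sketch over state dicts; the scaling coefficient `lam` is a tunable assumption, and methods such as TIES and DARE add sparsification and sign-conflict resolution on top of this:

```python
import torch

@torch.no_grad()
def task_vector(pretrained, finetuned):
    """tau = theta_ft - theta_pre, computed per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

@torch.no_grad()
def merge_with_task_vectors(pretrained, task_vectors, lam=0.3):
    """theta_merged = theta_pre + lam * sum_i tau_i (task arithmetic)."""
    return {
        k: pretrained[k] + lam * sum(tv[k] for tv in task_vectors)
        for k in pretrained
    }
```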

### 2.2 During or After Training

#### During Training

| Paper Title | Code | Publication & Date |
|---|---|---|
| Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | - | ArXiv 24.07 |
| DEM: Distribution Edited Model for Training with Mixed Data Distributions | - | ArXiv 24.06 |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | - | ArXiv 24.03 |
| Warm: On the benefits of weight averaged reward models | - | ICML 2024 |
| ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | ColD-Fusion | ACL 2023 |
| Model ratatouille: Recycling diverse models for out-of-distribution generalization | Ratatouille | ICML 2023 |
| Early Weight Averaging meets High Learning Rates for LLM Pre-training | code | NeurIPS_W 2023 |
| Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | LAWA | NeurIPS_W 2022 |
| Stochastic weight averaging revisited | PSWA | ArXiv 22.09 |
| Fusing finetuned models for better pretraining | - | ArXiv 22.04 |
| Lookahead optimizer: k steps forward, 1 step back | Lookahead | NeurIPS 2019 |
| Averaging weights leads to wider optima and better generalization | SWA | UAI 2018 |
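
The common thread here is averaging checkpoints produced along a single training run. A minimal sketch of the running mean used by stochastic weight averaging (SWA); PyTorch also ships a ready-made version as `torch.optim.swa_utils.AveragedModel`:

```python
import torch

class RunningWeightAverage:
    """Incremental mean of model weights: avg += (w - avg) / n.

    Typically updated once per epoch late in training, after the
    learning rate has been decayed or cycled.
    """
    def __init__(self):
        self.avg, self.n = None, 0

    @torch.no_grad()
    def update(self, model):
        self.n += 1
        if self.avg is None:
            self.avg = {k: v.detach().clone().float()
                        for k, v in model.state_dict().items()}
            return
        for k, v in model.state_dict().items():
            if v.is_floating_point():
                self.avg[k] += (v.float() - self.avg[k]) / self.n
```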

#### After Training

| Paper Title | Code | Publication & Date |
|---|---|---|
| Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | LCSC | ArXiv 24.04 |
| AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models | - | EACL 2023 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | Model-soups | ICML 2022 |
| Diverse weight averaging for out-of-distribution generalization | Diwa | NeurIPS 2022 |
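
For intuition, a minimal sketch of the uniform soup from 'Model soups': the elementwise mean of several fine-tuned checkpoints of the same architecture. The greedy variant in that paper instead adds checkpoints one at a time and keeps each only if held-out accuracy improves:

```python
import torch

@torch.no_grad()
def uniform_soup(state_dicts):
    """Elementwise mean of several checkpoints.

    Assumes identical keys and shapes and floating-point parameters.
    """
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```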

### 2.3 For LLMs and MLLMs

#### For LLMs

| Paper Title | Code | Publication & Date |
|---|---|---|
| Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | code | ArXiv 24.09 |
| FuseChat: Knowledge Fusion of Chat Models | FuseChat | ArXiv 24.08 |
| Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | MergeLLM | ArXiv 24.08 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | DELLA | ArXiv 24.06 |
| Mitigating Social Biases in Language Models through Unlearning | code | ArXiv 24.06 |
| Weak-to-strong extrapolation expedites alignment | Expo | ArXiv 24.04 |
| Parameter Competition Balancing for Model Merging | PCB-Merging | NeurIPS 2024 |
| Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Resta | ArXiv 24.02 |
| Towards Safer Large Language Models through Machine Unlearning | SKU | ACL 2024 |
| Lm-cocktail: Resilient tuning of language models via model merging | - | ACL Findings 2024 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | DARE | ICML 2024 |
| Controlled Text Generation via Language Model Arithmetic | code | ICML 2024 |
| Strong Copyright Protection for Language Models via Adaptive Model Fusion | - | ICML 2024 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | - | ICML 2024 |
| Knowledge fusion of large language models | FuseLLM | ICLR 2024 |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Ext-Sub | AAAI 2024 |
| LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | lorahub | COLM 2024 |
| Composing parameter-efficient modules with arithmetic operation | PEM_Composition | NeurIPS 2023 |
| Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards | Rewarded-Soups | NeurIPS 2023 |
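
As one concrete example from this table, DARE ('Language Models are Super Mario') prepares fine-tuning deltas for merging by randomly dropping most of their elements and rescaling the survivors so the expected update is unchanged. A minimal sketch over a dict of delta tensors (theta_ft - theta_pre); the drop rate `p` is a tunable assumption:

```python
import torch

@torch.no_grad()
def dare_drop_and_rescale(delta, p=0.9):
    """Zero out each delta element with probability p, then rescale the
    survivors by 1 / (1 - p) so the expected delta is preserved."""
    out = {}
    for k, d in delta.items():
        keep = torch.rand_like(d) >= p   # keep each element w.p. 1 - p
        out[k] = d * keep / (1.0 - p)
    return out
```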

#### For Multimodal Language Models

| Paper Title | Code | Publication & Date |
|---|---|---|
| Model Composition for Multimodal Large Language Models | ModelCompose | ACL 2024 |
| Jointly training large autoregressive multimodal models | code | ICLR 2024 |
| Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | - | ICASSP_W 2024 |
| An Empirical Study of Multimodal Model Merging | Vl-merging | EMNLP 2023 |
| [π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation](https://arxiv.org/pdf/2404.02241) | π-Tuning | ICML 2023 |
| UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | UnIVAL | TMLR 2023 |

## 3. Model Ensemble

### 3.1 Ensemble Methods

#### Weighted Averaging

#### Routing

| Paper Title | Code | Publication & Date |
|---|---|---|
| Soft merging of experts with adaptive routing | - | TMLR 2024 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | DeepSeekMoE | ArXiv 24.01 |
| Multiple Expert Brainstorming for Domain Adaptive Person Re-identification | MEB-Net | ECCV 2020 |
| Merging Vision Transformers from Different Tasks and Domains | - | ArXiv 23.12 |
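
The routing-based ensembles above share the sparse gating pattern popularized by mixture-of-experts models: a learned gate scores experts per token and only the top-k are evaluated. A minimal sketch of that selection step (the surrounding expert networks and load-balancing losses are omitted; this is a generic illustration, not any one paper's router):

```python
import torch
import torch.nn.functional as F

def top_k_routing(gate_logits: torch.Tensor, k: int = 2):
    """gate_logits: (num_tokens, num_experts).

    Returns, per token, the indices of the k selected experts and their
    renormalized mixing weights; unselected experts contribute nothing.
    """
    probs = F.softmax(gate_logits, dim=-1)
    top_p, top_i = probs.topk(k, dim=-1)
    weights = top_p / top_p.sum(dim=-1, keepdim=True)
    return top_i, weights
```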

### 3.2 Ensemble Object

#### Entire Model

| Paper Title | Code | Publication & Date |
|---|---|---|
| Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM | ChaiML | ArXiv 24.01 |
| LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | LLM-Blender | ACL 2023 |
| Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning | GAME | ICML 2022 |
| BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning | - | ICLR 2020 |
| Diverse Ensemble Evolution: Curriculum Data-Model Marriage | - | NeurIPS 2018 |

#### Adapter

| Paper Title | Code | Publication & Date |
|---|---|---|
| SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | SMILE | ArXiv 24.08 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | - | ArXiv 24.06 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | - | ArXiv 24.06 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | Twin-Merging | NeurIPS 2024 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | - | ArXiv 24.03 |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | WEMoE | ICML 2024 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | - | ICML 2024 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | - | ICLR 2024 |
| Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories | code | ACL 2023 |
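
Many adapter-level ensembles reduce to weighted combinations of low-rank updates. A minimal LoraHub-style sketch for a single layer: each LoRA module contributes B @ A, and the mixing weights are what methods in this table learn or search for (the `loras` and `weights` inputs are assumptions for illustration):

```python
import torch

@torch.no_grad()
def compose_loras(loras, weights):
    """Weighted sum of low-rank updates for one layer.

    loras: list of (A, B) pairs, A: (r, d_in), B: (d_out, r).
    weights: list of scalars, one per LoRA module.
    Returns the merged delta to add onto the frozen base weight W.
    """
    return sum(w * (B @ A) for (A, B), w in zip(loras, weights))
```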

## 4. Decouple

### 4.1 Reprogramming

| Paper Title | Code | Publication & Date |
|---|---|---|
| Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning | - | AAAI 2024 |
| Towards Efficient Task-Driven Model Reprogramming with Foundation Models | - | ArXiv 23.06 |
| Deep Graph Reprogramming | ycjing | CVPR 2023 |
| From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition | - | ICASSP 2023 |
| Fairness Reprogramming | USBC-NLP | NeurIPS 2022 |
| Voice2Series: Reprogramming Acoustic Models for Time Series Classification | V2S | ICML 2021 |

### 4.2 Mask

| Paper Title | Code | Publication & Date |
|---|---|---|
| EMR-Merging: Tuning-Free High-Performance Model Merging | EMR_Merging | NeurIPS 2024 Spotlight |
| Model Composition for Multimodal Large Language Models | THUNLP | ACL 2024 |
| Localizing Task Information for Improved Model Merging and Compression | tall_masks | ICML 2024 |
| Adapting a Single Network to Multiple Tasks by Learning to Mask Weights | Piggyback | ECCV 2018 |

## 5. Distillation

### 5.1 Transformer

| Paper Title | Code | Publication & Date |
|---|---|---|
| Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | FuseChat | ArXiv 24.02 |
| Sam-clip: Merging vision foundation models towards semantic and spatial understanding | - | CVPR 2024 |
| Knowledge fusion of large language models | FuseAI | ICLR 2024 |
| Seeking Neural Nuggets: Knowledge Transfer In Large Language Models From A Parametric Perspective | ParaKnowTransfer | ICLR 2024 |
| One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation | OFAKD | NeurIPS 2023 |
| Knowledge Amalgamation for Object Detection With Transformers | - | TIP 2023 |
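
Distillation-based fusion builds on the classic temperature-scaled objective: the student matches the teacher's softened output distribution. A minimal sketch of that loss term; in practice it is combined with the ordinary task loss, and the amalgamation methods below aggregate several teachers:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures, following Hinton et al.'s convention.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T**2
```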

### 5.2 CNN

| Paper Title | Code | Publication & Date |
|---|---|---|
| Factorizing Knowledge in Neural Networks | KnowledgeFactor | ECCV 2022 |
| Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework | Spatial_Ensemble | NeurIPS 2021 |
| Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning | - | ECCV 2020 |
| Multiple Expert Brainstorming for Domain Adaptive Person Re-identification | MEB-Net | ECCV 2020 |
| Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN | - | CVPR 2020 |
| Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation | - | ICCV 2019 |
| Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers | - | IJCAI 2019 |
| Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning | code | IJCAI 2019 |
| Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More | KAmalEngine | CVPR 2019 |
| Amalgamating Knowledge towards Comprehensive Classification | - | AAAI 2019 |

### 5.3 GNN

| Paper Title | Code | Publication & Date |
|---|---|---|
| Amalgamating Knowledge From Heterogeneous Graph Neural Networks | ycjing | CVPR 2021 |

## 6. Model Reuse

### 6.1 Model Reassembly

| Paper Title | Code | Publication & Date |
|---|---|---|
| Advances in Robust Federated Learning: Heterogeneity Considerations | - | ArXiv 24.05 |
| Towards Personalized Federated Learning via Heterogeneous Model Reassembly | pFedHR | NeurIPS 2023 |
| Stitchable Neural Networks | snnet | CVPR 2023 Highlight |
| Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models | instant_soup | ICML 2023 |
| Deep Incubation: Training Large Models by Divide-and-Conquering | Deep-Incubation | ArXiv 22.12 |
| Deep Model Reassembly | DeRy | NeurIPS 2022 |
| GAN Cocktail: Mixing GANs without Dataset Access | GAN-cocktail | ECCV 2022 |

### 6.2 Model Evolution

| Paper Title | Code | Publication & Date |
|---|---|---|
| It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | - | ArXiv 24.07 |
| Evolutionary Optimization of Model Merging Recipes | EvoLLM | ArXiv 24.03 |
| Knowledge Fusion By Evolving Weights of Language Models | Model_Evolver | ACL 2024 |
| Population-based evolutionary gaming for unsupervised person re-identification | - | IJCV 2023 |

## 7. Others

### 7.1 External Data Retrieval

| Paper Title | Code | Publication & Date |
|---|---|---|
| Evaluating the External and Parametric Knowledge Fusion of Large Language Models | - | ArXiv 24.05 |
| Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering | - | ArXiv 20.04 |

### 7.2 Multi-Objective Optimization

| Paper Title | Code | Publication & Date |
|---|---|---|
| You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | - | ArXiv 24.08 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | - | ArXiv 24.06 |
| MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | - | ArXiv 24.06 |

### 7.3 Others

| Paper Title | Code | Publication & Date |
|---|---|---|
| Adaptive Discovering and Merging for Incremental Novel Class Discovery | - | AAAI 2024 |
| Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering | - | ArXiv 20.04 |

### 7.4 Other Surveys

| Paper Title | Code | Publication & Date |
|---|---|---|
| A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning | - | ArXiv 24.08 |
| Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Yang | ArXiv 24.08 |
| Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | - | ArXiv 24.07 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | MergeKit | ArXiv 24.03 |
| Learn From Model Beyond Fine-Tuning: A Survey | LFM | ArXiv 23.10 |
| Deep Model Fusion: A Survey | - | ArXiv 23.09 |
| A curated paper list of Model Merging methods | ycjing | GitHub |

## Contributors

Junlin Lee; Qi Tang; Runhua Jiang.

## Star History

Star History Chart


## Contact

We invite all researchers to contribute to this repository, 'Knowledge Fusion: The Integration of Model Capabilities'. If you have any questions, please feel free to contact us.

Email: [email protected]