
ZhiJian (执简): A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

[Paper] [Code] [Documentation]

English | 中文

ZhiJian (执简驭繁) is a model-reuse toolkit built on PyTorch. It provides a comprehensive and unified solution for reusing a wide range of foundation pre-trained models, as well as models already trained on existing tasks, fully extracting the knowledge they contain to boost learning on the target task.

The rapid progress of artificial intelligence in recent years has produced many open-source pre-trained models (PTMs); platforms such as PyTorch, TensorFlow, and HuggingFace Transformers host a large number of model resources. Model reuse exploits these pre-trained models by adapting network architectures, customizing learning procedures, and optimizing inference strategies, thereby accelerating and strengthening learning on the target task and continuously contributing value to the machine learning community.

[Overview figure]

To cover the various model-reuse strategies comprehensively yet concisely, ZhiJian groups reuse methods into three main modules: Architect (构建者), Tuner (微调者), and Merger (融合者), which correspond to the model-preparation, learning, and inference stages of target-task deployment, respectively. The interfaces and methods provided by the ZhiJian toolkit include:

Architect (构建者) module [click to expand]

The Architect module modifies the pre-trained model to fit the target task, introducing new learnable parameters with task-specific structures while deciding which parts of the pre-trained model to reuse (see the sketch after this list).

  • Linear Probing & Partial-k, How transferable are features in deep neural networks? In: NeurIPS'14. [Paper] [Code]
  • Adapter, Parameter-Efficient Transfer Learning for NLP. In: ICML'19. [Paper] [Code]
  • Diff Pruning, Parameter-Efficient Transfer Learning with Diff Pruning. In: ACL'21. [Paper] [Code]
  • LoRA, LoRA: Low-Rank Adaptation of Large Language Models. In: ICLR'22. [Paper] [Code]
  • Visual Prompt Tuning / Prefix, Visual Prompt Tuning. In: ECCV'22. [Paper] [Code]
  • Scaling & Shifting, Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning. In: NeurIPS'22. [Paper] [Code]
  • AdaptFormer, AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In: NeurIPS'22. [Paper] [Code]
  • BitFit, BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In: ACL'22. [Paper] [Code]
  • Convpass, Convolutional Bypasses Are Better Vision Transformer Adapters. In: Tech Report 07-2022. [Paper] [Code]
  • Fact-Tuning, FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer. In: AAAI'23. [Paper] [Code]
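
To make the Architect idea concrete, here is a minimal, self-contained PyTorch sketch of a LoRA-style low-rank add-in wrapped around a frozen pre-trained linear layer. It illustrates the general technique only and is not ZhiJian's actual interface; the class name `LoRALinear` and the `rank`/`alpha` arguments are chosen for this example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pre-trained linear layer with a trainable low-rank update
    (illustrative sketch of the LoRA idea, not ZhiJian's actual API)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # reuse the pre-trained weights as-is
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen pre-trained path + lightweight task-specific path
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: only the low-rank matrices are updated on the target task.
layer = LoRALinear(nn.Linear(768, 768), rank=4)
trainable = [p for p in layer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # far fewer than 768 * 768
```

Only the two low-rank matrices are trained on the target task, while the pre-trained weights stay frozen and fully reusable.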
Tuner (微调者) module [click to expand]

The Tuner module focuses on training the target model under the guidance of pre-trained model knowledge to accelerate optimization, for example by adjusting the training objective, the optimizer, or the regularizer (see the sketch after this list).

  • Knowledge Transfer, NeC4.5: neural ensemble based C4.5. In: IEEE Trans. Knowl. Data Eng. 2004. [Paper] [Code]
  • FitNet, FitNets: Hints for Thin Deep Nets. In: ICLR'15. [Paper] [Code]
  • LwF, Learning without Forgetting. In: ECCV'16. [Paper] [Code]
  • FSP, A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In: CVPR'17. [Paper] [Code]
  • NST, Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. In: CVPR'17. [Paper] [Code]
  • RKD, Relational Knowledge Distillation. In: CVPR'19. [Paper] [Code]
  • SPKD, Similarity-Preserving Knowledge Distillation. In: ICCV'19. [Paper] [Code]
  • CRD, Contrastive Representation Distillation. In: ICLR'20. [Paper] [Code]
  • REFILLED, Distilling Cross-Task Knowledge via Relationship Matching. In: CVPR'20. [Paper] [Code]
  • WiSE-FT, Robust fine-tuning of zero-shot models. In: CVPR'22. [Paper] [Code]
  • L2 penalty / L2-SP, Explicit Inductive Bias for Transfer Learning with Convolutional Networks. In: ICML'18. [Paper] [Code]
  • Spectral Norm, Spectral Normalization for Generative Adversarial Networks. In: ICLR'18. [Paper] [Code]
  • BSS, Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning. In: NeurIPS'19. [Paper] [Code]
  • DELTA, DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks. In: ICLR'19. [Paper] [Code]
  • DeiT, Training data-efficient image transformers & distillation through attention. In: ICML'21. [Paper] [Code]
  • DIST, Knowledge Distillation from A Stronger Teacher. In: NeurIPS'22. [Paper] [Code]
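
As a concrete instance of guiding target-task training with pre-trained knowledge, below is a minimal PyTorch sketch of the classic temperature-scaled logit-distillation loss. It is a generic illustration rather than ZhiJian's implementation; the function name `distillation_loss` and the `T`/`alpha` hyper-parameters are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the hard-label loss with a soft-label KL term from the teacher
    (generic logit-distillation sketch, not ZhiJian's actual interface)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # rescale the soft term to keep gradient magnitudes comparable
    return alpha * hard + (1.0 - alpha) * soft

# Example with random tensors standing in for student/teacher outputs.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```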
Merger (融合者) module [click to expand]

The Merger module reuses pre-trained features or fuses the outputs of adapted pre-trained models at inference time to obtain stronger generalization (see the sketch after this list).

  • Nearest Class Mean, Generalizing to new classes at near-zero cost. In: TPAMI'13. [Paper] [Code]
  • SimpleShot, SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning. In: CVPR'19. [Paper] [Code]
  • Head2Toe, Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning. In: ICML'22. [Paper] [Code]
  • VQT, Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning. In: CVPR'23. [Paper] [Code]
  • via Optimal Transport, Model Fusion via Optimal Transport. In: NeurIPS'20. [Paper] [Code]
  • Model Soup, Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: ICML'22. [Paper] [Code]
  • Fisher Merging, Merging Models with Fisher-Weighted Averaging. In: NeurIPS'22. [Paper] [Code]
  • Deep Model Reassembly, Deep Model Reassembly. In: NeurIPS'22. [Paper] [Code]
  • REPAIR, REPAIR: REnormalizing Permuted Activations for Interpolation Repair. In: ICLR'23. [Paper] [Code]
  • Git Re-Basin, Git Re-Basin: Merging Models modulo Permutation Symmetries. In: ICLR'23. [Paper] [Code]
  • ZipIt, ZipIt! Merging Models from Different Tasks without Training. In: ICLR'23. [Paper] [Code]
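
For intuition about weight-level fusion, here is a minimal PyTorch sketch of uniform parameter averaging in the spirit of Model Soups. It is a generic illustration, not ZhiJian's interface; the checkpoint file names are placeholders.

```python
import torch

def uniform_soup(state_dicts):
    """Average several state dicts of the same architecture parameter-wise
    (uniform 'model soup' sketch, not ZhiJian's actual API)."""
    soup = {}
    for key in state_dicts[0]:
        # cast to float so integer buffers can also be averaged
        soup[key] = torch.stack(
            [sd[key].float() for sd in state_dicts], dim=0
        ).mean(dim=0)
    return soup

# Example usage with placeholder checkpoint paths:
# state_dicts = [torch.load(p, map_location="cpu") for p in ["ft_run1.pt", "ft_run2.pt"]]
# model.load_state_dict(uniform_soup(state_dicts))
```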

💡 ZhiJian also comes with the following features:

  • Support for reusing many open-source pre-trained model hubs, including:
  • Minimal setup and flexible customization:
    • Get started within 10 minutes Open In Colab
    • Customize datasets and pre-trained models step by step Open In Colab
    • Create your own model-reuse methods as you wish Open In Colab
  • A concise codebase that gets big things done:
    • Only about 5k lines of core code; adding a method is as easy as stacking building blocks
    • State-of-the-art results on the VTAB-M benchmark, backed by more than 10k experiments [here]
    • User-friendly guidelines and comprehensive documentation for customizing datasets and pre-trained models [here]

"执简驭繁" means harnessing the complex with the simple. The "繁" (complexity) refers to the fact that existing pre-trained models and reuse methods are numerous, diverse, and hard to deploy; the toolkit is named "执简" (ZhiJian) because it lets users easily master model-reuse methods: easy to start with, quick to reuse, and stable in accuracy, awakening the knowledge in pre-trained models to the greatest possible extent.

 

🕹️ Quick Start

  1. Set up a Python 3.7+ environment with conda, venv, or virtualenv

  2. Install ZhiJian using pip:

    $ pip install zhijian
    • [Optional] Install the latest version from GitHub:
      $ pip install git+https://github.com/ZhangYikaii/LAMDA-ZhiJian.git@main --upgrade
  3. Open a Python console and run:

    import zhijian
    print(zhijian.__version__)

    If no error is raised, ZhiJian has been installed successfully.

 

Documentation

📚 For tutorials and the API documentation, please visit ZhiJian.readthedocs.io

 

Why Use ZhiJian?

[Architecture figure]

| Related Library | GitHub Stars | # of Alg. (1) | # of Model (1) | # of Dataset (1) | # of Fields (2) | LLM Supp. | Docs. | Last Update |
|---|---|---|---|---|---|---|---|---|
| PEFT | GitHub stars | 6 | ~15 | (3) | 1 (a) | ✔️ | ✔️ | GitHub last commit |
| adapter-transformers | GitHub stars | 10 | ~15 | (3) | 1 (a) | | ✔️ | GitHub last commit |
| LLaMA-Efficient-Tuning | GitHub stars | 4 | 5 | ~20 | 1 (a) | ✔️ | | GitHub last commit |
| Knowledge-Distillation-Zoo | GitHub stars | 20 | 2 | 2 | 1 (b) | | | GitHub last commit |
| Easy Few-Shot Learning | GitHub stars | 10 | 3 | 2 | 1 (b) | | | GitHub last commit |
| Model soups | GitHub stars | 3 | 3 | 5 | 1 (c) | | | GitHub last commit |
| Git Re-Basin | GitHub stars | 3 | 5 | 4 | 1 (c) | | | GitHub last commit |
| ZhiJian 🙌 | | 30+ | ~50 | 19 | 3 (a,b,c) | ✔️ | ✔️ | GitHub last commit |

(1): updated on 2023-08-05. (2): fields covered: (a) Architect module; (b) Tuner module; (c) Merger module.

📦 Reproducing SoTA Results

ZhiJian fixes random seeds to ensure reproducible results, with only minor differences across devices (a typical seed-fixing pattern is sketched below).
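
A typical seed-fixing pattern in a PyTorch pipeline looks roughly like the sketch below; it illustrates standard practice rather than ZhiJian's internal code. The remaining cross-device differences usually come from non-deterministic CUDA kernels.

```python
import random
import numpy as np
import torch

def fix_seed(seed: int = 42):
    """Seed Python, NumPy, and PyTorch RNGs and prefer deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

fix_seed(42)
```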

 

How to Contribute

ZhiJian is under active development, and contributions in any form are welcome. Whether you have insights about pre-trained models, target data, or innovative reuse methods, we warmly invite you to join us in making ZhiJian even better. To submit a contribution, please click here.

 

Citing ZhiJian

@misc{zhang2023zhijian,
  title={ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse}, 
  author={Yi-Kai Zhang and Lu Ren and Chao Yi and Qi-Wei Wang and De-Chuan Zhan and Han-Jia Ye},
  year={2023},
  eprint={2308.09158},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{zhijian2023,
  author = {ZhiJian Contributors},
  title = {LAMDA-ZhiJian},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zhangyikaii/LAMDA-ZhiJian}}
}