huggingface / trl Public

generated from fastai/nbdev_template

Fork 1.5k
Star 11k

Pull requests: huggingface/trl

57 Open 1,253 Closed

Fix device placement for GRPO attention mask in compute_loss

#2747 opened Feb 3, 2025 by tgaddair

allow ref_model to be set in trainer to interface parity with other trainers

#2746 opened Feb 3, 2025 by winglian • Draft

5 tasks

Update unsloth_integration.md

#2742 opened Feb 2, 2025 by AdeebAldkheel

5 tasks

📖Nit Fix in Documentation

#2740 opened Feb 2, 2025 by ParagEkbote

1 task done

feat: Add cliprange to GRPO loss

#2739 opened Feb 2, 2025 by joey00072 • Draft

1 of 5 tasks

feat: Add vLLM dtype configuration for GRPO trainer

#2738 opened Feb 2, 2025 by joey00072

5 tasks

Dynamically load LoRA weights when using vLLM

#2730 opened Feb 1, 2025 by tgaddair

GRPO: Expose vllm_init_kwargs to enable vllm configuration

#2728 opened Feb 1, 2025 by mirceapricop

5 tasks

⚡ Fix GRPO PEFT

#2725 opened Jan 31, 2025 by qgallouedec • Draft

5 tasks

WIP: RLOOV2

#2724 opened Jan 31, 2025 by mnoukhov • Draft

3 tasks

Update ppo_trainer.md documentation

#2720 opened Jan 31, 2025 by JohnConnor123

5 tasks

🔁 🦈 Support iterative GRPO

#2700 opened Jan 30, 2025 by shirinyamani

4 of 5 tasks

[GRPO] add reward weight in multi-reward settings

#2676 opened Jan 28, 2025 by hesamsheikh

1 task

🔧 Optimize GRPO VRAM Usage by Computing Prompt Tokens Just Once

#2669 opened Jan 27, 2025 by andyl98

2 of 5 tasks

share parameters between model and ref model

#2668 opened Jan 27, 2025 by GeeeekExplorer

2 of 5 tasks

Add Optional ZeRO-3 Weight Gathering for GRPO in Sequence Generation

#2667 opened Jan 27, 2025 by SeungyounShin

5 tasks done

Add special token to PRM vocabulary if not present

#2646 opened Jan 24, 2025 by plaguss • Draft

5 tasks

[SFT] add token accuracy metric

#2597 opened Jan 21, 2025 by kashif

5 tasks

[Not meant to be merged] Support branch for Trainer refactor

#2594 opened Jan 20, 2025 by qgallouedec • Draft

5 tasks

🐍 Support Python 3.13

#2593 opened Jan 20, 2025 by qgallouedec • Draft

5 tasks

[WIP] [Liger] liger JSD support

#2573 opened Jan 16, 2025 by Mecoli1219 • Draft

5 tasks

Reduce memory consumption when training with PPO

#2571 opened Jan 15, 2025 by summerspringwei

5 tasks

[Liger] liger DPO support

#2568 opened Jan 14, 2025 by kashif

Add _compute_score method to PPOTrainer

#2560 opened Jan 11, 2025 by oliveiraeliel • Draft

2 of 5 tasks

Add generation caching in TextEnvironment and fix bugs in TextEnvironment

#2556 opened Jan 10, 2025 by konrad-gerlach
