vault backup: 2023-08-20 - 3 files
Affected files:
Resources/Finetuning.md
stub notes/AGENTS.md
stub notes/Mixture of Experts.md
swyx committed Aug 20, 2023
1 parent 0ea25de commit 31d60c1
Showing 3 changed files with 21 additions and 2 deletions.
6 changes: 6 additions & 0 deletions Resources/Finetuning.md
@@ -0,0 +1,6 @@

- https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehensive-case-study-for-tailoring-models-to-unique-applications

Code samples
- https://colab.research.google.com/drive/1mV9sAY4QBKLmS58dpFGHgwCXQKRASR31?usp=sharing#scrollTo=Way3_PuPpIuE
- https://github.com/mshumer/gpt-llm-trainer
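
A minimal sketch of what a LoRA-style fine-tune of Llama 2 can look like on the Hugging Face transformers + peft stack (my own illustration, not taken from the linked notebook or gpt-llm-trainer; the checkpoint name, train.jsonl, and hyperparameters are placeholder assumptions):

```python
# Hypothetical LoRA fine-tune of Llama 2 with transformers + peft + datasets.
# Assumes access to the meta-llama/Llama-2-7b-hf checkpoint and a local train.jsonl
# with a "text" field; all hyperparameters are illustrative only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Low-rank adapters: only the injected A/B matrices are trained, not the 7B base weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for the causal LM loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="llama2-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, fp16=True, logging_steps=10),
)
trainer.train()
model.save_pretrained("llama2-lora")  # writes only the small adapter weights
```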
8 changes: 7 additions & 1 deletion stub notes/AGENTS.md
@@ -46,7 +46,7 @@ Agent Duos: collaborative problem-solving
Stacking Agents: task delegation & collaboration


## agent
## major agents projects

- AutoGPT
- https://twitter.com/karpathy/status/1642598890573819905
@@ -143,6 +143,12 @@ thread on GPT3 agents https://twitter.com/GrantSlatton/status/160089024345213747
https://jmcdonnell.substack.com/p/the-near-future-of-ai-is-action-driven?sd=pf # The Near Future of AI is Action-Driven …and it will look a lot like AGI


## low code agents

- https://magicloops.dev/
- https://flowiseai.com/
-

## OSS agent implementations

- Wove https://github.com/zckly/wove long running workflows
9 changes: 8 additions & 1 deletion stub notes/Mixture of Experts.md
@@ -7,4 +7,11 @@ Switch Transformers
- https://twitter.com/TheTuringPost/status/1670793833964097537?s=20

TAPAS:
- https://twitter.com/TheTuringPost/status/1670793846303645697
- https://twitter.com/TheTuringPost/status/1670793846303645697


https://twitter.com/amanrsanger/status/1690072804161650690?s=20
- MOE allows models to scale parameter count and performance without scaling inference or training costs. This means I could serve an MOE model significantly faster and cheaper than a quality-equivalent dense model [1]. (2/7)
- Why is this bad for on-device inference? On-device inference is extremely memory limited. Apple’s M2 Mac has just 24GB of GPU RAM. Even with 4-bit quantization we can barely fit a 48B param model. And that model would see latency of <6 tok/s [2] (3/7)
- An A100 has 80GB of memory. We can serve a quality-equivalent MOE model with 100B parameters, taking up 50GB of RAM. It would likely cost around $0.2/1M generated tokens and less than $0.1/1M tokens running at 26 tok/s. If just maximizing speed, we could hit 173 tok/s! [3] (4/7)
- You can also serve massive MOE models by splitting model weights across GPUs. On a single machine, using 640 GB of GPU RAM, you can easily serve a >1T parameter MOE model (i.e. near GPT-4 level) with 4-bit quant. And it would cost within a factor of two of serving GPT-3 [
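
Back-of-the-envelope check of the memory figures in the thread (my own arithmetic, not from the tweets):

```python
# Rough weight-memory math behind the thread above: params * bits / 8 bytes,
# ignoring activations and KV cache.

def weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(weight_memory_gb(48))    # 24.0 GB  -> a 48B dense model just fits in a 24GB M2 Mac
print(weight_memory_gb(100))   # 50.0 GB  -> a 100B-param MoE fits in one 80GB A100
print(weight_memory_gb(1000))  # 500.0 GB -> a ~1T-param MoE fits in 640GB of GPU RAM

# The MoE trade: memory scales with *total* parameters, but per-token compute (and so
# latency/cost) scales with the *active* experts, a small fraction of the total.
```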
