vault backup: 2023-08-20 - 3 files
Affected files:
Resources/Finetuning.md
stub notes/AGENTS.md
stub notes/Mixture of Experts.md
swyx committed Aug 20, 2023
1 parent 0ea25de commit 31d60c1
Showing 3 changed files with 21 additions and 2 deletions.
6 changes: 6 additions & 0 deletions Resources/Finetuning.md
@@ -0,0 +1,6 @@

- https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehensive-case-study-for-tailoring-models-to-unique-applications

Code samples
- https://colab.research.google.com/drive/1mV9sAY4QBKLmS58dpFGHgwCXQKRASR31?usp=sharing#scrollTo=Way3_PuPpIuE
- https://github.com/mshumer/gpt-llm-trainer
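
A minimal sketch of what a LoRA-style fine-tune of Llama 2 can look like on the Hugging Face transformers + peft stack (my own illustration, not taken from the linked notebook or gpt-llm-trainer; the checkpoint name, train.jsonl, and hyperparameters are placeholder assumptions):

```python
# Hypothetical LoRA fine-tune of Llama 2 with transformers + peft + datasets.
# Assumes access to the meta-llama/Llama-2-7b-hf checkpoint and a local train.jsonl
# with a "text" field; all hyperparameters are illustrative only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Low-rank adapters: only the injected A/B matrices are trained, not the 7B base weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for the causal LM loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="llama2-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, fp16=True, logging_steps=10),
)
trainer.train()
model.save_pretrained("llama2-lora")  # writes only the small adapter weights
```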
8 changes: 7 additions & 1 deletion stub notes/AGENTS.md
@@ -46,7 +46,7 @@ Agent Duos: collaborative problem-solving
Stacking Agents: task delegation & collaboration


## agent
## major agents projects

- AutoGPT
- https://twitter.com/karpathy/status/1642598890573819905
@@ -143,6 +143,12 @@ thread on GPT3 agents https://twitter.com/GrantSlatton/status/160089024345213747
https://jmcdonnell.substack.com/p/the-near-future-of-ai-is-action-driven?sd=pf # The Near Future of AI is Action-Driven …and it will look a lot like AGI


## low code agents

- https://magicloops.dev/
- https://flowiseai.com/
-

## OSS agent implementations

- Wove https://github.com/zckly/wove long running workflows
9 changes: 8 additions & 1 deletion stub notes/Mixture of Experts.md
@@ -7,4 +7,11 @@ Switch Transformers
- https://twitter.com/TheTuringPost/status/1670793833964097537?s=20

TAPAS:
- https://twitter.com/TheTuringPost/status/1670793846303645697
- https://twitter.com/TheTuringPost/status/1670793846303645697


https://twitter.com/amanrsanger/status/1690072804161650690?s=20
- MOE allows models to scale parameter count and performance without scaling inference or training costs. This means I could serve an MOE model significantly faster and cheaper than a quality-equivalent dense model [1]. (2/7)
- Why is this bad for on-device inference? On-device inference is extremely memory limited. Apple’s M2 Mac has just 24GB of GPU RAM. Even with 4-bit quantization we can barely fit a 48B param model. And that model would see latency of <6 tok/s [2] (3/7)
- An A100 has 80GB of memory. We can serve a quality-equivalent MOE model with 100B parameters, taking up 50GB of RAM. It would likely cost around $0.2/1M generated tokens and less than $0.1/1M tokens running at 26 tok/s. If just maximizing speed, we could hit 173 tok/s! [3] (4/7)
- You can also serve massive MOE models by splitting model weights across GPUs. On a single machine, using 640 GB of GPU RAM, you can easily serve a >1T parameter MOE model (i.e. near GPT-4 level) with 4-bit quant. And it would cost within a factor of two of serving GPT-3 [
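
Back-of-the-envelope check of the memory figures in the thread (my own arithmetic, not from the tweets):

```python
# Rough weight-memory math behind the thread above: params * bits / 8 bytes,
# ignoring activations and KV cache.

def weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(weight_memory_gb(48))    # 24.0 GB  -> a 48B dense model just fits in a 24GB M2 Mac
print(weight_memory_gb(100))   # 50.0 GB  -> a 100B-param MoE fits in one 80GB A100
print(weight_memory_gb(1000))  # 500.0 GB -> a ~1T-param MoE fits in 640GB of GPU RAM

# The MoE trade: memory scales with *total* parameters, but per-token compute (and so
# latency/cost) scales with the *active* experts, a small fraction of the total.
```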
