This repo is an attempt to reproduce the results of Anthropic's paper on Constitutional AI. The paper can be found here. In particular, I am using the Hugging Face method described here.
In short, I will attempt the following:
- Create a dataset using Mistral-7B-Instruct-v0.1 from some of Anthropic's red-teaming prompts
- Fine-tune the model on this dataset
- Evaluate the model on its ability to generate text that is aligned with the constitution
I'm going to attempt to do as much as possible in TypeScript, as I think it is a wholly superior language to Python. 😜
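As a rough illustration of the dataset-creation step, here is a minimal sketch of the critique-and-revise loop that Constitutional AI uses to turn one red-teaming prompt into one training example. It is not the code in this repo: the `Principle`, `ChatMessage`, `Generate`, and `CaiExample` types and the `buildCaiExample` function are hypothetical names chosen for the example, and `Generate` stands in for whatever actually calls Mistral-7B-Instruct-v0.1.

```typescript
// One constitutional principle: a critique request paired with a revision request.
// (Hypothetical shape, assumed for this sketch.)
interface Principle {
  critique: string;
  revision: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical model interface: takes the chat so far, returns the next assistant reply.
// In practice this would wrap a call to the model; here it is left abstract.
type Generate = (messages: ChatMessage[]) => Promise<string>;

interface CaiExample {
  prompt: string;
  initialResponse: string;
  critique: string;
  revisedResponse: string;
}

// Runs the three model calls for one prompt:
// initial answer, critique of that answer, then a revised answer.
async function buildCaiExample(
  prompt: string,
  principle: Principle,
  generate: Generate,
): Promise<CaiExample> {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];

  // 1. Initial response to the red-teaming prompt.
  const initialResponse = await generate(history);
  history.push({ role: "assistant", content: initialResponse });

  // 2. Ask the model to critique its own response against the principle.
  history.push({ role: "user", content: principle.critique });
  const critique = await generate(history);
  history.push({ role: "assistant", content: critique });

  // 3. Ask the model to rewrite the response in light of the critique.
  history.push({ role: "user", content: principle.revision });
  const revisedResponse = await generate(history);

  return { prompt, initialResponse, critique, revisedResponse };
}
```

Taking `generate` as a parameter keeps the loop independent of how the model is served, so it can be exercised with a stub before any real inference is wired in. The prompt and revised response could then feed the fine-tuning step, though the exact dataset format is left open here.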