vanbujm/con-ai

Constitutional AI

This repo is an attempt to reproduce the results of Anthropic's paper on Constitutional AI. The paper can be found here. In particular, I am using the Hugging Face method described here.

In short, I will attempt the following:

  • Create a dataset using Mistral-7B-Instruct-v0.1 from some of Anthropic's red-teaming prompts
  • Fine-tune the model on this dataset
  • Evaluate the model on its ability to generate text that is aligned with the constitution
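
The dataset step above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: every name here (`ChatMessage`, `Generate`, `Principle`, `buildExample`, `toPreferencePair`) is hypothetical, and the `generate` function is an assumed stand-in for whatever client ends up calling Mistral-7B-Instruct-v0.1. The idea is to ask the model for an initial answer to a red-teaming prompt, ask it to critique that answer against one constitutional principle, and then ask it to revise. Pairing the revision (chosen) against the initial answer (rejected) for preference tuning is my reading of the Hugging Face write-up, so treat that part as an assumption too.

```typescript
// Hypothetical types for illustration only; none of these come from a library.
type ChatMessage = { role: "user" | "assistant"; content: string };

// Assumed stand-in for a call to the model (e.g. an HTTP request to an inference server).
type Generate = (messages: ChatMessage[]) => Promise<string>;

// One constitutional principle: a critique request and a matching revision request.
interface Principle {
  critique: string;
  revision: string;
}

interface CaiExample {
  prompt: string;
  initial: string;
  critique: string;
  revised: string;
}

// Run one prompt through the initial -> critique -> revision conversation.
async function buildExample(
  prompt: string,
  principle: Principle,
  generate: Generate,
): Promise<CaiExample> {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  const initial = await generate(messages);

  // Ask the model to critique its own answer against the principle.
  messages.push(
    { role: "assistant", content: initial },
    { role: "user", content: principle.critique },
  );
  const critique = await generate(messages);

  // Ask the model to rewrite the answer in light of the critique.
  messages.push(
    { role: "assistant", content: critique },
    { role: "user", content: principle.revision },
  );
  const revised = await generate(messages);

  return { prompt, initial, critique, revised };
}

// Assumed preference-pair shape: the revision is preferred over the initial answer.
function toPreferencePair(example: CaiExample) {
  return {
    prompt: example.prompt,
    chosen: example.revised,
    rejected: example.initial,
  };
}
```

Passing `generate` in as a parameter keeps the loop independent of any particular inference client, which also makes it easy to exercise with a stub before wiring up a real model.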

I'm going to attempt to do as much as possible in TypeScript, as I think it is a wholly superior language to Python. 😜