Support few-shot chain-of-thought in GPQA / MMLU #3088

Open
yifanmai opened this issue Oct 22, 2024 · 0 comments

Related: #3017 and #3018

Both GPQA and MMLU do few-shot chain-of-thought in a similar way, so we should share common infrastructure. We should also coordinate so we don't duplicate work.

Scenario

  1. When constructing instances, set instance.extra_data["chain_of_thought"] to the chain of thought in the dataset instance.
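
Untested sketch of what the scenario change could look like (this assumes Instance accepts an extra_data dict at construction time, and the dataset field names below are placeholders, not the real GPQA / MMLU column names):

```python
from helm.benchmark.scenarios.scenario import (
    CORRECT_TAG,
    TRAIN_SPLIT,
    Input,
    Instance,
    Output,
    Reference,
)


def make_cot_instance(row: dict) -> Instance:
    # `row` is a made-up dataset record with "question", "choices", "answer_index",
    # and "explanation" fields; the real GPQA / MMLU loaders will differ.
    references = [
        Reference(Output(text=choice), tags=[CORRECT_TAG] if i == row["answer_index"] else [])
        for i, choice in enumerate(row["choices"])
    ]
    return Instance(
        input=Input(text=row["question"]),
        references=references,
        split=TRAIN_SPLIT,
        # Key name from this issue; assumes extra_data can be passed to the constructor.
        extra_data={"chain_of_thought": row["explanation"]},
    )
```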

Run spec function

  1. Update the run spec function to take in a boolean parameter use_chain_of_thought.
  2. Update the function to use the new adapter and metric if and only if use_chain_of_thought is true (see the sketch below).
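
Roughly like this (untested; import paths are from memory, the scenario class_name is assumed, and the chain-of-thought method string and metric class are the hypothetical ones proposed in the Adapter and Metric sections below):

```python
from helm.benchmark.adaptation.adapter_spec import ADAPT_MULTIPLE_CHOICE_JOINT, AdapterSpec
from helm.benchmark.metrics.common_metric_specs import get_exact_match_metric_specs
from helm.benchmark.metrics.metric import MetricSpec
from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.scenarios.scenario import ScenarioSpec


@run_spec_function("gpqa")
def get_gpqa_spec(subset: str = "gpqa_main", use_chain_of_thought: bool = False) -> RunSpec:
    # Assumed scenario class path; should point at the scenario added in #3017 / #3018.
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.gpqa_scenario.GPQAScenario",
        args={"subset": subset},
    )

    if use_chain_of_thought:
        # Hypothetical new adapter method and metric (see the Adapter and Metric sections).
        method = "multiple_choice_joint_chain_of_thought"  # ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT
        metric_specs = [
            MetricSpec(
                class_name="helm.benchmark.metrics.chain_of_thought_metrics.ChainOfThoughtMetric",
                args={},
            )
        ]
    else:
        method = ADAPT_MULTIPLE_CHOICE_JOINT
        metric_specs = get_exact_match_metric_specs()

    adapter_spec = AdapterSpec(
        method=method,
        max_train_instances=5,
        # Chain of thought needs room to generate the reasoning before the answer letter.
        max_tokens=1000 if use_chain_of_thought else 5,
        temperature=0.0,
    )
    return RunSpec(
        name=f"gpqa:subset={subset},use_chain_of_thought={use_chain_of_thought}",
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=["gpqa"],
    )
```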

Adapter

  1. Create a subclass of MultipleChoiceJointAdapter that includes the chain of thought in the output of each in-context example (see the sketch after this list).
  2. Add a new enum value (similar to ADAPT_MULTIPLE_CHOICE_JOINT) for that subclass and add the enum value and the subclass to AdapterFactory.
  3. Set the AdapterSpec to use this enum value when constructing the run specs in the run spec functions.
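
Rough sketch of the subclass (untested; it assumes construct_example_prompt() is the right hook to override and that the adapter exposes self.adapter_spec, both of which should be double-checked against the current MultipleChoiceJointAdapter):

```python
from typing import Optional

from helm.benchmark.adaptation.adapters.multiple_choice_joint_adapter import MultipleChoiceJointAdapter
from helm.benchmark.scenarios.scenario import Instance


class MultipleChoiceJointChainOfThoughtAdapter(MultipleChoiceJointAdapter):
    """Same prompt as MultipleChoiceJointAdapter, but in-context examples also show the chain of thought."""

    def construct_example_prompt(
        self, instance: Instance, include_output: bool, reference_index: Optional[int]
    ) -> str:
        prompt = super().construct_example_prompt(instance, include_output, reference_index)
        chain_of_thought = (instance.extra_data or {}).get("chain_of_thought", "")
        if include_output and chain_of_thought and self.adapter_spec.output_prefix in prompt:
            # Illustrative splice: insert the reasoning between the question/options and
            # the final answer line produced by the parent class.
            before, answer = prompt.rsplit(self.adapter_spec.output_prefix, 1)
            prompt = (
                f"{before}{self.adapter_spec.output_prefix}"
                f"Let's think step by step. {chain_of_thought} "
                f"The answer is {answer}"
            )
        return prompt
```

Registering it would then just mean defining something like ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT = "multiple_choice_joint_chain_of_thought" next to the existing constants and adding the corresponding branch in AdapterFactory.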

Metric

  1. Create a new subclass of Metric in a new file chain_of_thought_metrics.py.
  2. Override evaluate_generation() to parse the model-generated output for an answer like "(A)" and output a Stat named chain_of_thought_exact_match (sketched below).
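
Something along these lines (untested; the evaluate_generation() signature and RequestState fields are from memory, and the "(A)" regex is just one reasonable parsing convention):

```python
import re
from typing import List, Optional

from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.adaptation.request_state import RequestState
from helm.benchmark.metrics.metric import Metric
from helm.benchmark.metrics.metric_name import MetricName
from helm.benchmark.metrics.metric_service import MetricService
from helm.benchmark.metrics.statistic import Stat
from helm.benchmark.scenarios.scenario import CORRECT_TAG


class ChainOfThoughtMetric(Metric):
    """Scores a chain-of-thought completion by the final "(X)" answer it contains."""

    def evaluate_generation(
        self,
        adapter_spec: AdapterSpec,
        request_state: RequestState,
        metric_service: MetricService,
        eval_cache_path: str,
    ) -> List[Stat]:
        # Look for the last "(A)"-style answer in the first completion.
        assert request_state.result is not None
        completion_text: str = request_state.result.completions[0].text
        matches = re.findall(r"\(([A-J])\)", completion_text)
        predicted: Optional[str] = matches[-1] if matches else None

        # Recover the gold letter via the output mapping built by the joint adapter.
        correct_reference = next(
            ref for ref in request_state.instance.references if CORRECT_TAG in ref.tags
        )
        gold: Optional[str] = None
        for letter, text in (request_state.output_mapping or {}).items():
            if text == correct_reference.output.text:
                gold = letter
                break

        stat = Stat(MetricName("chain_of_thought_exact_match"))
        stat.add(1.0 if predicted is not None and predicted == gold else 0.0)
        return [stat]
```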