Can you open source the training data? #22

Open
vinay-k12 opened this issue Nov 23, 2023 · 1 comment

Comments

@vinay-k12

Hi, I'm trying to replicate the experiment, but I couldn't reconstruct the exact training data used in the paper. Could you open-source the datasets as well?

@chaoyi-wu
Owner

chaoyi-wu commented Nov 24, 2023

Hello, the books we used are listed here: https://github.com/chaoyi-wu/PMC-LLaMA/blob/main/MedicalBook.xlsx. Because of licensing restrictions, I cannot share their contents with you, but you can collect them online. The other parts of the training data can be obtained from the following links:

  1. Papers: https://github.com/allenai/s2orc
  2. Instruction data: https://huggingface.co/datasets/axiong/pmc_llama_instructions
