This issue is to reconsider the work described in this paper, which focused on using MCAT multiple-choice questions for training and performance evaluation of ROBOKOP and several other Translator "Reasoners" during Phase I of the Translator program.
We used Khan Academy questions for the work reported in the paper. I had also looked into Kaplan. Then, there are the USMLE Step 1 questions, which proved a bit too challenging at the time and probably still are.
This morning, I stumbled on MedQA and MedMCQA, which look interesting, although I have not conducted a deep dive. MedQA is based on USMLE questions, so that may be too challenging. MedMCQA is based on Indian medical school entrance exams, which I am not familiar with, but they should be similar to MCAT questions.
Anyway, the main point of this issue is to consider whether we should use MCAT, USMLE, and/or other "ground truth" medical testing datasets for benchmarking.