Hi,

I'm trying to reproduce the BERT baseline from the SuperGLUE paper on the COPA task using just the `transformers` library. As I understand it, the baseline from the paper is implemented using `jiant`, so I'd like to check whether I have the details of your implementation right.
Each example in the task consists of:
- `idx`: self-explanatory
- `premise`: self-explanatory as well
- `question`: one of two values, `cause` or `effect`. The SuperGLUE version stores only this short value, and the baseline converts it back to the original "What was the CAUSE of this?" or "What happened as a RESULT?"
- `choice1`: the first reply to choose from
- `choice2`: the second reply to choose from
- `label`: 0 for the first choice, 1 for the second one
The baseline implementation concatenates the premise and the extended question (separated by a single space) into one string, so no `SEP` or `</s>` token is meant to appear between them. This string is the first "sentence" for sequence classification. The second "sentence" is one of the two choices. Effectively, this doubles the size of the dataset. The model is trained on these examples independently and is expected to predict a single scalar value for each one. At inference time, the two examples that come from the same original item are compared, and the one with the higher predicted value is chosen as the answer.
It's not entirely clear to me what values the labels are supposed to take when computing the loss. Just 0s and 1s?
Also, I'd like to confirm that my description of the training process is indeed accurate.
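
To make the description above concrete, here is a minimal sketch of the preprocessing and the inference-time comparison as I understand them. It is not taken from `jiant`; the helper names `expand_copa_example` and `pick_choice` are hypothetical, the record is only illustrative, and the 0/1 targets are an assumption (that is exactly the point I am asking about).

```python
from typing import Dict, List, Tuple

# Question texts quoted above, keyed by the SuperGLUE `question` value.
QUESTION_TEXT = {
    "cause": "What was the CAUSE of this?",
    "effect": "What happened as a RESULT?",
}


def expand_copa_example(example: Dict) -> List[Tuple[str, str, float]]:
    """Turn one COPA item into two (first_sentence, second_sentence, target) triples."""
    # Premise and extended question are joined by a single space, no separator token.
    first = f"{example['premise']} {QUESTION_TEXT[example['question']]}"
    triples = []
    for i, key in enumerate(("choice1", "choice2")):
        # ASSUMPTION: target is 1.0 for the correct choice and 0.0 otherwise.
        target = 1.0 if example["label"] == i else 0.0
        triples.append((first, example[key], target))
    return triples


def pick_choice(score_choice1: float, score_choice2: float) -> int:
    """Return the predicted label: the index of the choice with the higher score."""
    return 0 if score_choice1 >= score_choice2 else 1


# Illustrative record in the SuperGLUE COPA field layout.
example = {
    "idx": 0,
    "premise": "The man broke his toe.",
    "question": "cause",
    "choice1": "He got a hole in his sock.",
    "choice2": "He dropped a hammer on his foot.",
    "label": 1,
}

triples = expand_copa_example(example)
for first, second, target in triples:
    print(repr(first), "|", repr(second), "|", target)
```

Each `(first, second)` pair would then be passed to the tokenizer as a sentence pair, and the model's scalar outputs for the two triples of one item would go to `pick_choice`.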