I am currently having a problem while training a GPT-1-like model. Even though I train this model (~1.5M parameters) on a very small dataset (only 5 samples), it is unable to overfit the data.
This is the model I created:
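(Simplified here as a sketch; the hyperparameters below are placeholders chosen only to land near the ~1.5M-parameter range, not necessarily my exact values.)

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    # Small GPT-1-style decoder; all sizes are placeholders.
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):                      # idx: (batch, seq_len) token IDs
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal, so position i only sees positions <= i.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln(x))             # (batch, seq_len, vocab_size)
```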
I tried to train this model on my own data with this function:
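(A minimal version of that function is below; the optimizer choice, learning rate, and step count are assumptions for the sketch.)

```python
import torch
import torch.nn.functional as F

def train_overfit(model, inputs, targets, steps=500, lr=3e-4):
    # Sanity check: with only 5 samples, the loss should approach 0.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        logits = model(inputs)                              # (B, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```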
This is an example of an input tensor and its corresponding ground-truth tensor:
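(The token IDs below are hypothetical and only illustrate the shapes; only the first two values, 0 and 88, come from my actual data. The targets are the inputs shifted left by one, assuming a standard next-token language-modeling setup.)

```python
import torch

inputs  = torch.tensor([[ 0, 88, 17, 42,  5,  9]])   # (batch, seq_len) token IDs
targets = torch.tensor([[88, 17, 42,  5,  9,  1]])   # shifted by one; 1 = assumed end-of-sequence ID
```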
After a few training runs, I suspect the model only attends to the start of each sample and ignores the rest. For example, given an input like the one above, it seems the model uses only the first two numbers of the tensor (0 and 88) in its computation, which makes it unable to distinguish between two sentences that start the same way.
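One way to verify this (a sketch; the token IDs are made up): feed two inputs that share the same first two tokens but differ afterwards, and compare the logits at the last position.

```python
import torch

a = torch.tensor([[0, 88, 10, 11, 12]])
b = torch.tensor([[0, 88, 30, 31, 32]])   # same prefix, different tail
model.eval()
with torch.no_grad():
    diff = (model(a)[0, -1] - model(b)[0, -1]).abs().max()
print(diff)  # ~0 would confirm the tail is being ignored
```

If the difference really is near zero, that would point at something structural, e.g. a broken attention mask or inputs being truncated before they reach the model.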
Is there anything I'm missing here, and how can I get past this problem?
Thanks in advance.