In my experiments, training the old parameters simultaneously yields slightly better final performance, but training becomes more unstable and the loss is prone to exploding.
My intuition is that Tokenformer supports incremental training because the previously learned parameters can be retained, so past knowledge is preserved. But if the old parameters are also updated on a new task, I have trouble seeing how the model can scale up without losing capability. Could you clarify this point, please?
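For concreteness, here is a minimal sketch of the freezing approach, assuming a PyTorch-style implementation. The class name `Pattention`, the `grow` method, and all internals below are hypothetical illustrations, not this repo's actual API; the paper's modified Pattention normalization is also replaced by a plain scaled softmax for brevity. The idea: new zero-initialized parameter tokens are appended, only they receive gradients, and the old key-value tokens stay fixed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Pattention(nn.Module):
    """Attention over learnable key-value parameter tokens (Tokenformer-style sketch)."""
    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        self.key_tokens = nn.Parameter(torch.randn(num_param_tokens, dim) * 0.02)
        self.value_tokens = nn.Parameter(torch.randn(num_param_tokens, dim) * 0.02)

    def grow(self, extra_tokens: int, freeze_old: bool = True):
        """Append zero-initialized parameter tokens; optionally freeze the old ones."""
        dim = self.key_tokens.shape[1]
        if freeze_old:
            # Old tokens keep their learned values but receive no gradient.
            self.key_tokens.requires_grad_(False)
            self.value_tokens.requires_grad_(False)
        # New tokens start at zero. With the paper's modified normalization this
        # leaves the grown model's output unchanged; with the plain softmax used
        # here the old attention weights are merely rescaled, not preserved exactly.
        self.new_key_tokens = nn.Parameter(torch.zeros(extra_tokens, dim))
        self.new_value_tokens = nn.Parameter(torch.zeros(extra_tokens, dim))

    def forward(self, x):
        keys, values = self.key_tokens, self.value_tokens
        if hasattr(self, "new_key_tokens"):
            # Old (possibly frozen) and new tokens are concatenated; gradients
            # flow only into parameters with requires_grad=True.
            keys = torch.cat([keys, self.new_key_tokens], dim=0)
            values = torch.cat([values, self.new_value_tokens], dim=0)
        attn = F.softmax(x @ keys.T / keys.shape[1] ** 0.5, dim=-1)
        return attn @ values

# Usage: grow the layer, then optimize only the trainable (new) tokens.
layer = Pattention(dim=512, num_param_tokens=1024)
layer.grow(extra_tokens=1024, freeze_old=True)
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
```

If you instead want to update the old tokens jointly (as in the experiments above), one common way to tame the instability is to keep them trainable but put them in their own optimizer parameter group with a much smaller learning rate, rather than freezing them outright.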
Hi, are the old parameters kept fixed during incremental training?