Adding new inputs to pre-trained layers #421
Comments
You should add that to
Just check that. I think not.
Why not? Also, allowing something which was not allowed before cannot possibly break anything.
That's way too ugly. You should not depend on assuming a specific order of the axes for your heuristic to work. (Yes, you can already create strange edge cases where it will break depending on the order, but ignore those. Do not make it explicitly depend on such behavior.) I would add something like this to the first loop:
And in the second loop, in the
Also change
I'm quite sure this would break other cases. Maybe instead of `dict[int,int]`, it could be `dict[int,set[int]]`. However, I would also try not to make this too complicated. Note that there will always be cases which this does not fully cover, no matter what you do. I think it's fine for now to add one or two more heuristics to cover your case, but we should not make it too complicated. Maybe there is a better way to infer the …
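A minimal sketch of that `dict[int,set[int]]` idea, using the values from this issue (my reading of the suggestion, not actual RETURNN code):

```python
# Collect all candidate matches instead of overwriting: one old part dim
# (e.g. 10025) can then map to several possible new dims at once.
dim_diff = {}  # dict[int, set[int]]
for old_dim, new_dim in [(10025, 0), (500, 500), (10025, 10025)]:
    dim_diff.setdefault(old_dim, set()).add(new_dim)
assert dim_diff == {10025: {0, 10025}, 500: {500}}
```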
Is this resolved? Note that we also don't really need to invest so much energy into making the heuristic work correctly in all cases (which is anyway not possible). We could maybe instead invest some energy into doing it the right way, so that it is always correct, in a clean and predictable way, without relying on such a heuristic. This should certainly be possible, right? E.g. the problem this heuristic tries to solve is to recover the old shape split information, which is not available at this point anymore. Maybe we could just store this in the checkpoint or somewhere else?
Problem Statement
I would like to extend an already existing layer with an additional input. In my example I have trained an attention-based encoder-decoder model and now I would like to add an external LM to the inputs of the Softmax layer:
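For illustration, the extended network entry could look roughly like this (a sketch; the layer names `output_prob`, `readout`, and `lm_output` are assumptions, since the original snippet did not survive here):

```python
network = {
    # ... encoder/decoder layers as before ...
    # The softmax layer now also gets the external LM output as input:
    "output_prob": {"class": "softmax", "from": ["readout", "lm_output"], "target": "classes"},
}
```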
My RETURNN config for training the checkpoint was:
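The original snippet is likewise missing here; the relevant part presumably looked something like this (same assumed names, with the single 500-dim input and 10025-dim output that show up in the analysis below):

```python
network = {
    # ... encoder/decoder layers ...
    # Original softmax layer with only the readout as input:
    "output_prob": {"class": "softmax", "from": ["readout"], "target": "classes"},
}
```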
Now I continue training with this extended layer while importing the existing checkpoint with the `preload_from_files` mechanism:
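A minimal sketch of such a `preload_from_files` entry (the entry name, filename, and option choices are assumptions, not my actual config):

```python
preload_from_files = {
    "base": {
        "filename": "/path/to/old/training/checkpoint",  # assumed path
        "init_for_train": True,  # load once at the start of training
        "ignore_missing": True,  # newly added params are not in the old checkpoint
    }
}
```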
First Attempt
I am on commit face0c3, which I extended with the changes introduced in #412.
The relevant variables in `returnn/tf/util/basic.py`, in the function `transform_param_axes_split_info_to_new_shape`, are:
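The original variable dump did not survive here; reconstructed from the analysis below, they presumably were (an assumption):

```python
# The split info of the *extended* variable, and the shape the variable
# has in the *old* checkpoint, which the function has to map it onto:
axes_split_info = [[10025, 500], [10025]]  # first axis: LM input + readout input
new_shape = (500, 10025)                   # old checkpoint: 500 inputs, 10025 outputs
# Expected result: [[0, 500], [10025]] -- the 10025-dim LM part
# does not exist in the old checkpoint.
```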
Here is the stack trace I get:
Ok, so in the loop in `returnn/tf/util/basic.py` (line 155 in face0c3), the first `parts = [10025, 500]` hits neither condition, and only the second `parts = [10025]` sets `dim_diff = {10025: 10025}`, so that we end up at line 167 in face0c3, and then the heuristics fail.
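To make the failure mode concrete, here is a simplified sketch of such a `dim_diff`-based matching loop, reconstructed from the values above (an illustration, not the actual RETURNN code at face0c3):

```python
def transform_split_info_sketch(axes_split_info, new_shape):
    # First loop: collect dim_diff, mapping an old part dim to its new dim.
    dim_diff = {}
    for parts, new_dim in zip(axes_split_info, new_shape):
        if len(parts) == 1:
            dim_diff[parts[0]] = new_dim  # e.g. [10025] -> {10025: 10025}
        elif sum(parts) == new_dim:
            dim_diff.update({p: p for p in parts})  # axis unchanged
        # parts = [10025, 500] with target dim 500 hits neither condition.
    # Second loop: rewrite each axis via dim_diff.
    new_split = []
    for parts, new_dim in zip(axes_split_info, new_shape):
        new_parts = [dim_diff.get(p, p) for p in parts]
        assert sum(new_parts) == new_dim, "heuristics fail"  # as seen above
        new_split.append(new_parts)
    return new_split

# Reproduces the failure: the first axis stays [10025, 500] (sum 10525 != 500).
transform_split_info_sketch([[10025, 500], [10025]], (500, 10025))  # AssertionError
```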
Second Attempt
Then I added my case to the end of the loop in `returnn/tf/util/basic.py` (line 155 in face0c3); a reconstruction is sketched below.
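The patch itself was lost from this page; judging from the `dim_diff` values reported below, the added case presumably did something like this (an assumed reconstruction of the first loop with the extra condition):

```python
def first_loop_with_extra_case(axes_split_info, new_shape):
    dim_diff = {}
    for parts, new_dim in zip(axes_split_info, new_shape):
        if len(parts) == 1:
            dim_diff[parts[0]] = new_dim
        elif sum(parts) == new_dim:
            dim_diff.update({p: p for p in parts})
        elif sum(parts) > new_dim:  # the newly added third condition:
            # keep the parts that still fit, map surplus (new) parts to 0
            for p in parts:
                dim_diff[p] = p if p <= new_dim else 0
    return dim_diff

# First yields {10025: 0, 500: 500}; then the second axis overwrites the
# entry for 10025, exactly as described below:
assert first_loop_with_extra_case([[10025, 500], [10025]], (500, 10025)) == {10025: 10025, 500: 500}
```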
Now I get the following variables:
And here is the new stack trace:
Now the first `parts = [10025, 500]` hits my newly defined third condition and sets `dim_diff = {10025: 0, 500: 500}`; then the second `parts = [10025]` overwrites it to `dim_diff = {10025: 10025, 500: 500}`. `new_parts` for the first dim should have been `[0, 500]`, but due to the overwriting it is now `[10025, 500]`, and we trigger the assertion in `returnn/tf/util/basic.py` (line 175 in face0c3). With the overwriting avoided, we get the correct `new_parts = [0, 500]`, and now only the assertion `new_parts[0] > 0` fails, as expected.

Questions
- Is it OK if `new_parts[0] == 0`? For me yes: the logic in `copy_with_new_split_axes` will work as I expect, but maybe `transform_param_axes_split_info_to_new_shape` is used elsewhere.
- Would it make sense not to use `dim_diff`, but instead to infer the new shapes in order as we progress through the lists? This would alleviate the issues with square layers.
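A minimal sketch of that in-order alternative (an assumption of what is meant, not existing RETURNN code):

```python
def transform_in_order(axes_split_info, new_shape):
    # Walk the parts of each axis positionally instead of keying a global
    # dict on dim values, so identical dims on different axes cannot collide.
    new_split = []
    for parts, target in zip(axes_split_info, new_shape):
        remaining = target
        new_parts = []
        for p in parts:
            keep = p if p <= remaining else 0  # drop parts that no longer fit
            new_parts.append(keep)
            remaining -= keep
        assert remaining == 0, "could not match axis split to target dim"
        new_split.append(new_parts)
    return new_split

assert transform_in_order([[10025, 500], [10025]], (500, 10025)) == [[0, 500], [10025]]
```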