Hello,

I'm trying to implement your code in Keras and achieve the same results as you. I've mimicked the LSTM initialization and checked the math and constants to make them fit yours, but I'm still about 20% away from your results (using your input data). My guess is that the problem lies in the optimizer (it seems different from Keras's Adadelta), and I don't understand what's happening in https://github.com/aditya1503/Siamese-LSTM/blob/master/lstm.py#L289:
```python
gradi = tensor.grad(cost, wrt=tnewp.values())  # /bts
grads = []
l = len(gradi)
for i in range(0, l / 2):
    gravg = (gradi[i] + gradi[i + l / 2]) / (4.0)
    # print i, i+9
    grads.append(gravg)
for i in range(0, len(tnewp.keys()) / 2):
    grads.append(grads[i])
self.f_grad_shared, self.f_update = adadelta(lr, tnewp, grads, emb11, mask11, emb21, mask21, y, cost)
```
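As far as I can tell, the loop pairs each gradient with the gradient of the corresponding parameter in the other LSTM copy, combines them (with an extra factor of 2 in the denominator), and then duplicates the result so both copies receive the same combined gradient. Here is a toy NumPy illustration of my reading of it (not your actual code, and the numbers are made up):

```python
import numpy as np

# Toy stand-ins for the gradients of the two parameter copies (LSTM "A" and LSTM "B").
# In the real code these are symbolic Theano gradients, one entry per parameter.
gradi = [np.array([1.0]), np.array([2.0]), np.array([3.0]),      # copy A
         np.array([10.0]), np.array([20.0]), np.array([30.0])]   # copy B

l = len(gradi)
# Pair each parameter of copy A with its counterpart in copy B and combine them.
grads = [(gradi[i] + gradi[i + l // 2]) / 4.0 for i in range(l // 2)]
# Duplicate the combined gradients so both copies get the same update.
grads = grads + grads

print([g.item() for g in grads])  # [2.75, 5.5, 8.25, 2.75, 5.5, 8.25]
```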
I don't know where to implement this in Keras logic (I presume this code only runs once, when the network is defined), but I've tried:
```python
def get_updates(self, loss, params):
    gradi = self.get_gradients(loss, params)
    grads = []
    l = len(gradi)  # for 2 LSTMs, l = 6, 3 'weights' per each
    half_l = int(l / 2)
    print(half_l)
    for i in range(0, half_l):
        gravg = (gradi[i] + gradi[i + half_l]) / (4.0)
        grads.append(gravg)
    alt_half_l = int(len(params) / 2)
    print(alt_half_l)
    for i in range(0, alt_half_l):
        grads.append(grads[i])
    shapes = [K.int_shape(p) for p in params]
    ...
```
in my own optimizer (based on Keras's original Adadelta, again mimicking your constants).
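For completeness, I plug the custom optimizer in roughly like this (the class name is a placeholder for my Adadelta subclass, and the hyperparameter values and loss are just examples):

```python
# "SiameseAdadelta" is a placeholder name for my Optimizer subclass containing
# the get_updates() above; the values shown here are examples only.
model.compile(optimizer=SiameseAdadelta(rho=0.95, epsilon=1e-6),
              loss='mean_squared_error')
```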
However, the loss/cost per batch went from 0.08 (with a single LSTM applied to both inputs, as Keras suggests for the Siamese setup) to 0.4, so there must be a logic error somewhere.
My guess is that the gradient manipulation is being applied repeatedly in my code, e.g. once per batch (as Keras's logic would have it), while in your code it is applied once, when Adadelta is initialized/defined.
Can someone help me understand what's happening in the code above? What is it for, is it run per batch, and why not share a single LSTM between the two inputs, as Keras suggests in:
https://keras.io/getting-started/functional-api-guide/#shared-layers
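For context, this is roughly the shared-LSTM setup I had in mind from that guide (the dimensions and the exp(-L1) similarity are just my assumptions about your model, not taken from your code):

```python
from keras.layers import Input, LSTM, Lambda
from keras.models import Model
import keras.backend as K

# Placeholder dimensions: variable-length sequences of 300-d word vectors, 50 hidden units.
left_input = Input(shape=(None, 300))
right_input = Input(shape=(None, 300))

shared_lstm = LSTM(50)              # one set of weights, applied to both inputs
left_vec = shared_lstm(left_input)
right_vec = shared_lstm(right_input)

# exp(-L1) similarity between the two sentence encodings (my assumption about the model).
similarity = Lambda(
    lambda v: K.exp(-K.sum(K.abs(v[0] - v[1]), axis=1, keepdims=True))
)([left_vec, right_vec])

model = Model(inputs=[left_input, right_input], outputs=similarity)
model.compile(optimizer='adadelta', loss='mean_squared_error')
```

With this setup the two branches literally share one set of weights, so I would expect no manual gradient averaging to be needed, which is why I don't understand what the averaging in lstm.py buys.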
Best,
Pedro