Hi,
Thanks for your amazing work!
I found that the padding size used in the activation function is actually the same as padding=self.act_num, so may I ask why you use padding=(self.act_num*2+1)//2 instead of padding=self.act_num? It seems both would guarantee that the output size equals the input size after the DW-conv.
Best,
Yueyi
Thanks for the nice suggestion! We have fixed this.
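For reference, a minimal sketch of the equivalence being discussed (the values of dim and act_num here are hypothetical, and the depthwise conv setup is only an assumption about how the weight is applied):

```python
import torch
import torch.nn.functional as F

# Hypothetical values for illustration; `dim` and `act_num` follow the snippet below.
dim, act_num = 8, 3
kernel_size = act_num * 2 + 1             # 7
assert (act_num * 2 + 1) // 2 == act_num  # both padding expressions give the same number

x = torch.randn(2, dim, 14, 14)
weight = torch.randn(dim, 1, kernel_size, kernel_size)

# A depthwise conv with an odd kernel k and padding k // 2 (stride 1) preserves
# the spatial size, so either padding expression keeps output shape == input shape.
out = F.conv2d(x, weight, padding=(act_num * 2 + 1) // 2, groups=dim)
assert out.shape == x.shape
```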
Thanks for your reply! I also have a question about the Series Informed Activation Function. How does act_num in the following code snippet implement the number of stacked activation functions? It seems to only enlarge the kernel size, so how does it achieve the goal of stacking the activation function? Thanks for your patience!
self.weight = torch.nn.Parameter(torch.randn(dim, 1, act_num * 2 + 1, act_num * 2 + 1))
Thanks for your question! By enlarging the kernel size in the code, we are effectively aggregating outputs from the activation function at different positions. This essentially achieves the stacking of the activation function. Although it may seem counterintuitive at first, it is simply an implementation strategy to achieve our goal. Please feel free to ask if you have more questions!
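A minimal sketch of how this explanation could look in code, assuming a ReLU base activation followed by the depthwise conv built from the weight in the snippet above (the class name, the BatchNorm, and other details are illustrative assumptions, not necessarily the repo's exact implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesActivation(nn.Module):
    """Sketch of a series-informed activation: a base activation followed by a
    depthwise conv whose (act_num*2+1) x (act_num*2+1) kernel takes a weighted
    sum of the activation's outputs at neighbouring positions, which is how the
    "stacking" is realised through an enlarged kernel."""

    def __init__(self, dim, act_num=3):
        super().__init__()
        self.dim = dim
        self.act_num = act_num
        # Same parameter shape as in the snippet quoted above.
        self.weight = torch.nn.Parameter(
            torch.randn(dim, 1, act_num * 2 + 1, act_num * 2 + 1))
        self.bn = nn.BatchNorm2d(dim)

    def forward(self, x):
        x = F.relu(x)  # base activation applied at every position
        # Depthwise conv aggregates the activated values around each position,
        # i.e. a learned weighted sum of shifted activation outputs.
        x = F.conv2d(x, self.weight, padding=self.act_num, groups=self.dim)
        return self.bn(x)
```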