I found two errors in the duration-counting code in the file analyzer.py:
1. PyTorch executes GPU operations asynchronously. If the following code records the start and end times without synchronization, the measured duration will be far too short, because the end time is recorded before the GPU has finished the computation.
2. If a module in a CNN is used multiple times during forward propagation, the current code records only the duration of the last forward pass, not the duration of each pass.
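The second error can be illustrated without PyTorch: a scalar field is overwritten on every call and keeps only the last measurement, while a list keeps one entry per call. This is a minimal sketch; the class and field names here are illustrative, not taken from the torchstat source:

```python
import time

class ModuleStatsDemo:
    """Toy stand-in for a per-module stats record (hypothetical names)."""
    def __init__(self):
        self.last_duration = 0.0   # buggy pattern: scalar gets overwritten
        self.durations = []        # fixed pattern: one entry per forward pass

stats = ModuleStatsDemo()
for _ in range(3):                 # the same module is reused three times
    start = time.time()
    time.sleep(0.01)               # pretend this is the module's computation
    elapsed = time.time() - start
    stats.last_duration = elapsed      # keeps only the last call
    stats.durations.append(elapsed)    # keeps all three calls
```

After the loop, `stats.durations` holds three measurements, while `stats.last_duration` holds only the final one.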
```python
# tensorwatch\tensorwatch\model_graph\torchstat\analyzer.py
class ModuleStats:
    def __init__(self, name) -> None:
        # self.duration = 0.0
        self.duration = []

def _forward_pre_hook(module_stats: ModuleStats, module: nn.Module, input):
    assert not module_stats.done
    torch.cuda.synchronize()
    module_stats.start_time = time.time()

def _forward_post_hook(module_stats: ModuleStats, module: nn.Module, input, output):
    assert not module_stats.done
    torch.cuda.synchronize()
    module_stats.end_time = time.time()
    # Use a list to store the duration of each forward propagation.
    # module_stats.duration = module_stats.end_time - module_stats.start_time
    module_stats.duration.append(module_stats.end_time - module_stats.start_time)

# other code
```
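The synchronize-before-timing pattern above can be wrapped in a small standalone helper. This is a hedged sketch, not code from torchstat; `timed_forward` is a hypothetical name, and the CUDA guard lets it also run on CPU-only machines:

```python
import time
import torch

def timed_forward(module, x):
    """Time a forward call so time.time() measures completed GPU work,
    not just asynchronous kernel launches (sketch, hypothetical helper)."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # drain pending GPU work before starting the clock
    start = time.time()
    out = module(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # wait until this module's GPU work is done
    return out, time.time() - start
```

Usage: `out, dt = timed_forward(torch.nn.ReLU(), torch.randn(3))`. Without the second `synchronize()`, `dt` would measure only the time to enqueue the kernels, which is the first bug described above.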
I also provide a simple comparison. In the Bottleneck of the ResNet backbone, the same `relu` module is called three times, so there should be three corresponding durations. But in the TensorWatch statistics, only one `relu` record appears for the Bottleneck.
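The reuse pattern can be reproduced with a tiny model: one `ReLU` module applied three times per forward pass, with a forward hook counting its invocations. `TinyBottleneck` is a hypothetical sketch of the pattern, not code from mmdetection:

```python
import torch
import torch.nn as nn

class TinyBottleneck(nn.Module):
    """Minimal sketch (hypothetical) of the Bottleneck reuse pattern:
    one ReLU module applied three times per forward pass."""
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(x)
        x = self.relu(x)
        return self.relu(x)

calls = []
model = TinyBottleneck()
# The forward hook fires once per call; a scalar duration field would be
# overwritten twice, so only a list keeps all three measurements.
model.relu.register_forward_hook(lambda mod, inp, out: calls.append(1))
model(torch.randn(2, 4))
```

After one forward pass, `calls` has three entries, matching the three durations the modified code records.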
The relevant lines (at commit 142f83a) are:

- `tensorwatch/tensorwatch/model_graph/torchstat/analyzer.py`, line 96
- `tensorwatch/tensorwatch/model_graph/torchstat/analyzer.py`, line 101
- `tensorwatch/tensorwatch/model_graph/torchstat/analyzer.py`, line 102

My solution is shown in the code above.
For reference, here is the Bottleneck implementation in mmdetection: https://github.com/open-mmlab/mmdetection/blob/f07de13b82b746dde558202f720ec2225f276d73/mmdet/models/backbones/resnet.py#L260-L299
But using my modified code, the durations of all three calls to the `relu` function are recorded.