BasicVSR++ Memory footprint during inference #1214
-
Is there a way to reduce the memory footprint of the BasicVSR++ model? Config: basicvsr_plusplus_c64n7_8x1_600k_reds4.py
-
Hi @contentis, there are two approaches you can try to "resolve" this issue: (1) split each frame into smaller spatial patches and super-resolve them separately, or (2) split the video into shorter temporal segments and process each segment independently.
However, both approaches could lead to performance drops.
-
Wouldn't the first suggestion lead to tiling artefacts, and the second to temporal inconsistency?
-
Yes, they could. Both of them are just workarounds and may not be perfect. For the first one, you can use larger patches as inputs and center-crop smaller patches from the outputs to avoid boundary problems (see the sketch below). For the second one, you may try overlapping segments and see whether that alleviates the inconsistency a bit. The large memory footprint is one weakness of bidirectional approaches, and we definitely want to improve it.
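For illustration, here is a minimal sketch of the patch-based workaround with center cropping. It assumes a hypothetical 4x model callable as `model(clip)` on a `(1, T, C, H, W)` tensor; the function name, call signature, and default sizes are assumptions, not taken from the repo.

```python
import torch

def patch_inference(model, frames, patch=256, margin=32, scale=4):
    """frames: (T, C, H, W) low-res clip; returns the (T, C, sH, sW) result."""
    t, c, h, w = frames.shape
    out = frames.new_zeros(t, c, h * scale, w * scale)
    step = patch - 2 * margin  # stride between the kept center regions
    for top in range(0, h, step):
        for left in range(0, w, step):
            # Input window, extended by `margin` on each side where possible.
            t0, l0 = max(top - margin, 0), max(left - margin, 0)
            b1, r1 = min(top + step + margin, h), min(left + step + margin, w)
            with torch.no_grad():
                # Assumed call signature: (1, T, C, h, w) in, (1, T, C, sh, sw) out.
                sr = model(frames[:, :, t0:b1, l0:r1].unsqueeze(0))[0]
            # Keep only the center crop, discarding the patch boundaries.
            ct, cl = (top - t0) * scale, (left - l0) * scale
            kh, kw = min(step, h - top) * scale, min(step, w - left) * scale
            out[:, :, top * scale:top * scale + kh,
                left * scale:left * scale + kw] = sr[:, :, ct:ct + kh, cl:cl + kw]
    return out
```

Larger `margin` values discard more of each patch boundary and reduce visible seams, at the cost of more redundant computation per patch.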
-
What about converting the model to ONNX and reducing the precision to FP16?
-
Converting to FP16 could save some memory, but with too many frames it may still not be enough. Both propagation passes are required, so we cannot use only one direction during inference. Have you tried the `cpu_cache_length` argument? It keeps intermediate features on the CPU to save GPU memory, but it slows down inference due to the transfers between CPU and GPU.
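For reference, a minimal FP16 sketch in plain PyTorch using autocast (the exported-ONNX route would be analogous); `model` and `frames` are assumed to already exist and are not taken from the repo.

```python
import torch

# Assumed to exist: `model`, a built BasicVSR++ network, and `frames`,
# a (1, T, C, H, W) float32 tensor holding the low-res clip.
model = model.cuda().eval()
frames = frames.cuda()
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(frames)  # activations are computed in FP16 where safe
output = output.float()     # cast back to FP32 before saving to disk
```

Whether half precision degrades restoration quality should be verified on your own data before relying on it.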
-
Does this also mean that a sliding-window framework is not applicable to BasicVSR and BasicVSR++ when working with long sequences? With patching I am able to fit a 100-frame sequence, but a realistic video length would be far beyond 1000 frames. In theory it should already offload all further frames, since the default CPU cache length is set to 100, but the VRAM footprint still increases with the length of the sequence.
-
BasicVSR and BasicVSR++ follow the recurrent framework, which is different from the sliding-window framework that uses only short-term information. Do you mean the memory increases indefinitely, i.e. the required memory for 500 frames is more than that for 300 frames? For bidirectional propagation, it is hard to disregard the intermediate features. You could try training a unidirectional variant of BasicVSR; in that case you need only the current features and the features from the previous frame.
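To illustrate why the unidirectional variant is memory-friendly, here is a toy recurrent loop (not the actual BasicVSR architecture, and every layer and name here is made up for illustration): only the previous hidden state is carried across steps, so peak memory stays constant in the sequence length.

```python
import torch
import torch.nn as nn

class UniDirToy(nn.Module):
    """Toy unidirectional recurrent network, not the real BasicVSR."""
    def __init__(self, channels=64):
        super().__init__()
        self.channels = channels
        self.fuse = nn.Conv2d(3 + channels, channels, 3, padding=1)
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, frames):  # frames: (T, 3, H, W)
        t, _, h, w = frames.shape
        state = frames.new_zeros(1, self.channels, h, w)
        outputs = []
        for i in range(t):
            x = torch.cat([frames[i:i + 1], state], dim=1)
            state = torch.relu(self.fuse(x))  # overwrite, never accumulate
            outputs.append(self.to_rgb(state))
        return torch.cat(outputs, dim=0)

# Usage: under no_grad, memory depends on frame size, not on T.
with torch.no_grad():
    sr = UniDirToy()(torch.rand(10, 3, 64, 64))
```

Bidirectional propagation cannot offer this, because the backward pass must finish (and its features must be stored) before the forward pass can fuse them.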