BasicVSR++ Memory footprint during inference #1214
-
Is there a way to reduce the memory footprint of the BasicVSR++ model? Config: basicvsr_plusplus_c64n7_8x1_600k_reds4.py
-
Hi @contentis, there are two approaches you can try to "resolve" this issue: (1) split each frame into smaller spatial patches and super-resolve them separately, or (2) split the video into shorter temporal segments and process each segment independently.
However, both approaches could lead to performance drops.
-
Wouldn't the first suggestion lead to tiling artefacts, and the second to temporal inconsistency?
-
Yes, they could. Both of them are just workarounds and may not be perfect. For the first one, you can use larger patches as inputs and center-crop smaller patches from the outputs to avoid boundary problems (see the sketch below). For the second one, you may try overlapping segments and see whether that alleviates the inconsistency a bit. The large memory footprint is one weakness of bidirectional approaches, and we definitely want to improve it.
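For illustration, here is a minimal sketch of the patch-based workaround with center cropping. It assumes a hypothetical 4x model callable as `model(clip)` on a `(1, T, C, H, W)` tensor; the function name, call signature, and default sizes are assumptions, not taken from the repo.

```python
import torch

def patch_inference(model, frames, patch=256, margin=32, scale=4):
    """frames: (T, C, H, W) low-res clip; returns the (T, C, sH, sW) result."""
    t, c, h, w = frames.shape
    out = frames.new_zeros(t, c, h * scale, w * scale)
    step = patch - 2 * margin  # stride between the kept center regions
    for top in range(0, h, step):
        for left in range(0, w, step):
            # Input window, extended by `margin` on each side where possible.
            t0, l0 = max(top - margin, 0), max(left - margin, 0)
            b1, r1 = min(top + step + margin, h), min(left + step + margin, w)
            with torch.no_grad():
                # Assumed call signature: (1, T, C, h, w) in, (1, T, C, sh, sw) out.
                sr = model(frames[:, :, t0:b1, l0:r1].unsqueeze(0))[0]
            # Keep only the center crop, discarding the patch boundaries.
            ct, cl = (top - t0) * scale, (left - l0) * scale
            kh, kw = min(step, h - top) * scale, min(step, w - left) * scale
            out[:, :, top * scale:top * scale + kh,
                left * scale:left * scale + kw] = sr[:, :, ct:ct + kh, cl:cl + kw]
    return out
```

Larger `margin` values discard more of each patch boundary and reduce visible seams, at the cost of more redundant computation per patch.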
-
What about converting the model to ONNX and reducing the precision to FP16?
-
Converting to FP16 could save some memory, but with too many frames it may still not be enough. Both propagation passes are required, so we cannot use only one direction during inference. Have you tried the `cpu_cache_length` argument? It keeps intermediate features on the CPU to save GPU memory, but it slows down inference due to the transfers between CPU and GPU.
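For reference, a minimal FP16 sketch in plain PyTorch using autocast (the exported-ONNX route would be analogous); `model` and `frames` are assumed to already exist and are not taken from the repo.

```python
import torch

# Assumed to exist: `model`, a built BasicVSR++ network, and `frames`,
# a (1, T, C, H, W) float32 tensor holding the low-res clip.
model = model.cuda().eval()
frames = frames.cuda()
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(frames)  # activations are computed in FP16 where safe
output = output.float()     # cast back to FP32 before saving to disk
```

Whether half precision degrades restoration quality should be verified on your own data before relying on it.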
-
Does this also mean that a sliding-window framework is not applicable to BasicVSR and BasicVSR++ when working with long sequences? With patching I am able to fit a 100-frame sequence, but a realistic video length would be far beyond 1000 frames. In theory it should already offload all further frames, since the default CPU cache length is set to 100, but the VRAM footprint still increases with the length of the sequence.
-
BasicVSR and BasicVSR++ follow the recurrent framework, which is different from the sliding-window framework that uses only short-term information. Do you mean the memory increases indefinitely, i.e. the required memory for 500 frames is more than that for 300 frames? For bidirectional propagation, it is hard to disregard the intermediate features. You could try training a unidirectional variant of BasicVSR; in that case you need only the current features and the features from the previous frame.
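To illustrate why the unidirectional variant is memory-friendly, here is a toy recurrent loop (not the actual BasicVSR architecture, and every layer and name here is made up for illustration): only the previous hidden state is carried across steps, so peak memory stays constant in the sequence length.

```python
import torch
import torch.nn as nn

class UniDirToy(nn.Module):
    """Toy unidirectional recurrent network, not the real BasicVSR."""
    def __init__(self, channels=64):
        super().__init__()
        self.channels = channels
        self.fuse = nn.Conv2d(3 + channels, channels, 3, padding=1)
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, frames):  # frames: (T, 3, H, W)
        t, _, h, w = frames.shape
        state = frames.new_zeros(1, self.channels, h, w)
        outputs = []
        for i in range(t):
            x = torch.cat([frames[i:i + 1], state], dim=1)
            state = torch.relu(self.fuse(x))  # overwrite, never accumulate
            outputs.append(self.to_rgb(state))
        return torch.cat(outputs, dim=0)

# Usage: under no_grad, memory depends on frame size, not on T.
with torch.no_grad():
    sr = UniDirToy()(torch.rand(10, 3, 64, 64))
```

Bidirectional propagation cannot offer this, because the backward pass must finish (and its features must be stored) before the forward pass can fuse them.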