BasicVSR++: reproduce NTIRE decompression results on track 3 #1216
-
Hi, after testing, the Eval-PSNR is 30.0519 and the Eval-lq_PSNR is 28.3367, so I only gain a 1.71 dB improvement on track 3. When testing, I set num_input_frames to the length of each sequence so that the full video sequence is used as input. Can you give some advice?
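For reference, a minimal sketch of what that test-time setting might look like in an MMEditing-style config; the dataset type, folder paths, and exact keys are placeholders to check against your own config:

```python
# Hypothetical test-config fragment that feeds whole sequences to the
# recurrent model. Names below are placeholders, not the official config.
test_pipeline = []  # placeholder: your usual test-time transforms

data = dict(
    test=dict(
        type='SRFolderMultipleGTDataset',
        lq_folder='data/track3/lq',
        gt_folder='data/track3/gt',
        pipeline=test_pipeline,
        scale=1,  # track 3 is quality enhancement, so no upscaling
        num_input_frames=None,  # assumed: None -> use every frame
    ))
```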
-
We also used ensemble to further boost the performance. I am going to implement the ensemble in the following days (or weeks).
-
Out of curiosity, can someone explain to me what "ensemble" means in this context?
-
Here ensemble means flipping and rotating the images spatially. After rotating and flipping, you should have 8 copies of the original sequence. Then we do inference 8 times and take the average of the outputs.
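A minimal sketch of this x8 self-ensemble in PyTorch; the model interface and the (n, t, c, h, w) tensor layout are assumptions, so the official implementation may differ:

```python
import torch

def spatial_ensemble(model, lq):
    """x8 self-ensemble: average the outputs over the 8 spatial symmetries
    (4 rotations x with/without horizontal flip). The average is a plain
    per-pixel mean. A fully convolutional model is assumed, since 90/270
    degree rotations swap the height and width of non-square frames.
    """
    outputs = []
    for rot in range(4):            # 0/90/180/270 degree rotations
        for flip in (False, True):  # unmirrored and mirrored copies
            x = lq.flip(-1) if flip else lq
            x = torch.rot90(x, rot, dims=(-2, -1))
            with torch.no_grad():
                y = model(x)
            # Undo the transform on the output before averaging.
            y = torch.rot90(y, -rot, dims=(-2, -1))
            if flip:
                y = y.flip(-1)
            outputs.append(y)
    return torch.stack(outputs).mean(dim=0)
```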
-
Ah interesting, so four 90-degree rotated copies, each unmirrored and mirrored. Is the average a simple per-pixel calculation or something more complex? I'm currently working on a heavily compressed low-res clip, and my current steps for a quite good result go as:

Anyway, the ensemble route looks interesting. Looking at the compute time for my 1080 Ti as is... oh, it's going to cry ^^"
-
That is quite a lot of steps. I think there could be some better ways to go, but that would require more exploration.
-
Yup, but the ensemble way doesn't sound like fewer steps ^^ Currently experimenting with an AI-based deblock prepass. It seemed to improve things even more, but also seemed to have "smoothed" some details out a bit. Something I need to explore more.

EDIT: And now I'm wondering whether it's worth training my own model on the Vimeo-90K dataset, but using Cinepak as the degradation process to produce the LR images oO. Though my assumption is that this is going to take ages on my 1080 Ti, if the memory is even enough.
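A rough sketch of such a degradation round-trip, assuming ffmpeg with its Cinepak encoder is on PATH; the paths and frame patterns are just examples:

```python
import os
import subprocess

def cinepak_degrade(gt_pattern, lq_pattern, fps=24, tmp='degraded.avi'):
    """Round-trip clean frames through Cinepak to create degraded copies."""
    os.makedirs(os.path.dirname(lq_pattern), exist_ok=True)
    # Encode the clean frames with the Cinepak codec into an AVI...
    subprocess.run(['ffmpeg', '-y', '-framerate', str(fps), '-i', gt_pattern,
                    '-c:v', 'cinepak', tmp], check=True)
    # ...then decode back to individual frames as the low-quality inputs.
    subprocess.run(['ffmpeg', '-y', '-i', tmp, lq_pattern], check=True)

cinepak_degrade('gt/%08d.png', 'lq/%08d.png')
```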
-
I think Vimeo-90K is not a very good dataset if you want to use recurrent networks, since it contains only 7 frames for each sequence.
-
Oh okay. Do you have any suggestions for what would be a better fit?
-
If you can construct the "low quality" videos yourself, you can consider using the REDS dataset. It contains 100 or 500 high-quality frames per sequence, depending on which version you use.
-
Yup, I can (and even need to) do it. Thanks :)
-
Thanks for your great work.
-
The model in MMEditing currently does not include the ensemble. The ensemble code is still in a PR; we can do a comparison afterwards. As for your second question, I am not quite sure what you mean.
-
I mean that, in real-world use, the model is less effective on compressed video than on low-resolution but otherwise clean videos. Trying different combinations of down-then-up scaling may help, but are there any blind ways to improve effectiveness?
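As a sketch of the down-then-up idea; the scale factor and interpolation choices here are illustrative, not tuned values:

```python
import cv2

def down_up(frame, factor=0.5):
    """Shrink a frame and scale it back, attenuating block artifacts
    at the cost of some fine detail."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (int(w * factor), int(h * factor)),
                       interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)
```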
-
I assume that is due to how this model was pretrained. Training your own version on specific compression methods may give better results for specific cases.
-
Hi, I'm reproducing your paper. Could you release the ensemble test code?
-
Hello, you can refer to #585. The PR will be merged after review.
-
Thanks a lot, that's a great help.