Migrating from nvbuf_utils to NvUtils. #125

Open
wants to merge 2 commits into
base: master


@douo douo commented Feb 23, 2023

Refactors the code to migrate from nvbuf_utils to NvUtils, and includes fixes for issues #120, #123 and #115. The changes have been tested on Jetpack 5.1.

@douo
Author

douo commented Feb 24, 2023

It's not backwards compatible. I will keep the branch separate for Jetpack 5.x only.

@douo douo closed this Feb 24, 2023
@douo douo reopened this Feb 24, 2023
@douo
Author

douo commented Feb 24, 2023

I discovered a performance issue on Jetpack 5.1.

Devices (MAXN, jetson_clocks):

  • AGX Jetpack 5.1 L4T 35.2.1
  • Nano Jetpack 4.6.3 L4T 32.7.3

Command:

  • nvv4l2dec: ffmpeg -y -benchmark -c:v hevc_nvv4l2dec -i $input -f null -
  • gstreamer: gst-launch-1.0 filesrc location=$input ! h265parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
  • 00_video_decode: video_decode H265 --disable-rendering --stats $input
  • jetson_ffmpeg: ffmpeg -y -benchmark -c:v hevc_nvmpi -i $input -f null -

sample_3840x2160.hevc:

Stream #0:0: Video: hevc (Main), yuv420p(tv), 3840x2160, 23.98 fps, 23.98 tbr, 1200k tbn, 23.98 tbc
Decoder          AGX (fps)  Nano (fps)
00_video_decode       32.7       97.96
gstreamer           232.61       97.71
nvv4l2dec               24          63
nv_mpi                  28          70
ffmpeg cpu              25         7.3

sample_4k.h264:

Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 3840x2160, 25 fps, 25 tbr, 1200k tbn, 50 tbc
Decoder          AGX (fps)  Nano (fps)
00_video_decode       25.7       88.10
gstreamer           132.84       87.29
nvv4l2dec               25          63
nv_mpi                  23          69
ffmpeg cpu              64          19

sample_720.h264:

Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1280x544, 24.08 fps, 23.98 tbr, 1200k tbn, 47.95 tb
Decoder          AGX (fps)  Nano (fps)
00_video_decode      71.18      770.58
gstreamer           733.70      764.39
nvv4l2dec               47         409
nv_mpi                  46         490
ffmpeg cpu             676         270

@bmegli

bmegli commented Mar 7, 2023

@douo

Could you try adding setMaxPerfMode

	ret=ctx->dec->setMaxPerfMode(1);
	TEST_ERROR(ret < 0, "Error in setting decoder maximum performance mode", ret);

right after these lines in your code:

	ret=ctx->dec->setCapturePlaneFormat(format.fmt.pix_mp.pixelformat,format.fmt.pix_mp.width,format.fmt.pix_mp.height);
	TEST_ERROR(ret < 0, "Error in setting decoder capture plane format", ret);
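
The suggested ordering can be sketched as follows. This is a minimal illustration, not the code from the PR: `StubDecoder` and `configureCapturePlane` are hypothetical names invented here so the snippet compiles without the Jetson Multimedia API headers. Only the method names `setCapturePlaneFormat` and `setMaxPerfMode` come from the lines quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the real decoder class. It only records which
// configuration calls were made, and in what order.
struct StubDecoder {
    std::vector<std::string> calls;

    int setCapturePlaneFormat(uint32_t pixfmt, uint32_t width, uint32_t height) {
        (void)pixfmt; (void)width; (void)height;
        calls.push_back("setCapturePlaneFormat");
        return 0;
    }

    int setMaxPerfMode(int flag) {
        (void)flag;
        calls.push_back("setMaxPerfMode");
        return 0;
    }
};

// Set the capture plane format first, then request max performance mode.
// Returns 0 on success, or the first negative error code encountered.
template <typename Decoder>
int configureCapturePlane(Decoder& dec, uint32_t pixfmt, uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);  // sanity check on caller-supplied size

    int ret = dec.setCapturePlaneFormat(pixfmt, width, height);
    if (ret < 0) return ret;

    ret = dec.setMaxPerfMode(1);
    if (ret < 0) return ret;

    return 0;
}
```

With the real decoder, the object behind `ctx->dec` would take the place of the stub, and the early returns would map to the `TEST_ERROR` checks shown above.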

Context

I am using AGX Orin. Setting performance mode brings decoding roughly to GStreamer level.

dpkg-query --show nvidia-l4t-core
nvidia-l4t-core	34.1.1-20220516211757
apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 5.0.1-b118

Test

GStreamer

input=~/Downloads/iphone6s_4k.mov

gst-launch-1.0 filesrc location=$input ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 542, dropped: 0, current: 119,99, average: 119,32

your fork

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -

frame=  549 fps= 78 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.65x 

your fork + change mentioned by me

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -

frame=  551 fps=111 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=3.75x
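
To compare runs like the two above without reading the numbers by eye, the `fps=` field can be pulled out of ffmpeg's final progress line. This is a rough sketch that assumes the line format shown in this comment. `parseFfmpegFps` is a hypothetical helper, not part of ffmpeg or jetson-ffmpeg.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Extract the number that follows "fps=" in an ffmpeg progress line such as
//   frame=  549 fps= 78 q=-0.0 Lsize=N/A time=00:00:18.55 ...
// Returns a negative value when the line contains no fps field.
double parseFfmpegFps(const std::string& line) {
    // "fps=", optional padding spaces, then an integer or decimal number.
    static const std::regex pattern(R"re(fps=\s*([0-9]+(?:\.[0-9]+)?))re");

    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return -1.0;
    }
    return std::stod(match[1].str());
}
```

The negative sentinel keeps the sketch free of newer language features; a caller should check for it before using the value.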

@bmegli

bmegli commented Mar 7, 2023

Here is also an inspection of what the GStreamer nvvideo4linux2 plugin does when setting performance mode:

I did it for the encoder, but the decoder ends up in the same function call.

@douo
Author

douo commented Mar 8, 2023

Great. The fps has improved significantly, but there is still a considerable gap compared to gstreamer on AGX Xavier.

4K H.265: 28 -> 64 fps
4K H.264: 23 -> 57 fps
720p H.264: 46 -> 290 fps

@bmegli

bmegli commented Mar 16, 2023

@douo

After flashing AGX Jetpack 5.1 L4T 35.2.1 (on AGX Orin 32 GB)

cat /etc/nv_tegra_release 
# R35 (release), REVISION: 2.1, GCID: 32413640, BOARD: t186ref, EABI: aarch64, DATE: Tue Jan 24 23:38:33 UTC 2023
apt-cache show nvidia-l4t-core

Package: nvidia-l4t-core
Version: 35.2.1-20230124153320

I can confirm performance loss in H.264 decoding

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -

frame=  551 fps= 39 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=1.32x  

Citing my result with 34.1.1 (jetpack 5.0.1)

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -

frame=  551 fps=111 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=3.75x

So it dropped from 111 to 39 fps (both with max-perf-enable).

@Keylost

Keylost commented Mar 20, 2023

U can try my repo
https://github.com/Keylost/jetson-ffmpeg
AGX Xavier JP5.1 (Power Mode: MAXN) H264 3840x2160 decoding: ~140 fps
I will probably improve it a little more in the future.
p.s. I haven't added setMaxPerfMode yet. I'm not sure if it's right to forcefully set this mode...

@bmegli

bmegli commented Mar 21, 2023

> U can try my repo
> https://github.com/Keylost/jetson-ffmpeg
> AGX Xavier JP5.1 (Power Mode: MAXN) H264 3840x2160 decoding: ~140 fps

Awesome!

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -
# ...
frame=  551 fps=115 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=3.88x 

A quick test confirms that performance is restored on AGX Orin:

  • 111 fps with @douo variant + max-perf-enable in 34.1.1
  • 39 fps with @douo variant + max-perf-enable in 5.1 / 35.2.1
  • 115 fps with @Keylost variant without max-perf-enable in 5.1 / 35.2.1
  • 120 fps with gstreamer; max-perf-enable doesn't affect the result in 5.1 / 35.2.1
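
The comparison in the list above boils down to two ratios: each result as a share of the GStreamer baseline, and the relative change between L4T releases. Here is a small sketch with hypothetical helper names (`percentOfBaseline`, `percentChange`), meant to be fed the fps values reported in this thread:

```cpp
#include <cassert>

// Share of the baseline frame rate that a result reaches, in percent.
double percentOfBaseline(double fps, double baseline_fps) {
    assert(baseline_fps > 0.0);
    return 100.0 * fps / baseline_fps;
}

// Relative change from one frame rate to another, in percent.
// Negative values mean a slowdown.
double percentChange(double old_fps, double new_fps) {
    assert(old_fps > 0.0);
    return 100.0 * (new_fps - old_fps) / old_fps;
}
```

With the numbers above, `percentOfBaseline(115.0, 120.0)` is roughly 95.8 and `percentOfBaseline(39.0, 120.0)` is 32.5. `percentChange(111.0, 39.0)` is roughly -64.9, so the same @douo variant lost about two thirds of its throughput between 34.1.1 and 35.2.1.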

I haven't checked correctness of decoded video


> p.s. I haven't added setMaxPerfMode yet. I'm not sure if it's right to forcefully set this mode...

I guess it's more about

  • baseline as gstreamer with nvv4l2decoder enable-max-performance=1
  • performance loss of the same code in 5.1, including Nvidia examples
  • Xavier/Orin performing below Nano level

But it looks like you pinpointed the performance loss cause.

@Keylost

Keylost commented Mar 21, 2023

I'm glad to hear that the decoder performance restored on AGX Orin as well :)

Please note: in order to use the variant from my repository, you need to build ffmpeg with a patch from my repository.

@eusoubrasileiro

@Keylost, @douo and @bmegli, thanks a lot! And congratulations to you all on your continuous effort to improve this repository, which seems abandoned!

eusoubrasileiro added a commit to eusoubrasileiro/motion_server_nvr that referenced this pull request Jul 25, 2023
jetson-ffmpeg to ffmpeg version 6.0
Upgrade motion to version 4.5
Replace jetson-ffmpeg by @Keylost repository.
Although @douo and @bmegli have good ones to look at too.
See jocover/jetson-ffmpeg#125 for reference.
@geoncoder

This PR migrates the decoder to NvUtils. Is there another PR which also migrates the encoder? I get an error during cmake trying to find nvbuf_utils in the encoder.
