uavs3e crashes consistently when compiled with 10 bit support on amd64 Linux with AVX2 #53

Open
olokelo opened this issue Aug 5, 2024 · 0 comments


Hello,

When I compile uavs3e with 10-bit support, it crashes right after encoding starts. This happens for both 8-bit and 10-bit inputs. According to the backtrace, the crash is in the AVX2 sources.
I compiled uavs3e with clang 18 on amd64 Arch Linux. With gcc 14 it doesn't compile at all; that is also related to AVX2, but it is a separate issue.
When compiled without 10-bit support, it works fine.

Here's how I built uavs3e:

$ cmake -DCOMPILE_10BIT=1 -DCMAKE_CXX_FLAGS="-g3" -DCMAKE_C_FLAGS="-g3" ../..
-- The C compiler identification is Clang 18.1.8
-- The CXX compiler identification is Clang 18.1.8
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
  
                      GIT VERSION TOOLS
                    =====================
  
  get the code version number of remote & local repository.
  
       remote: 212
        local: 2
        SHA-1: e1ff0f37a8d67814e1d650f6b49d495c49dde946
  
  remote version 2 is added to file version.h, such as:
  
  #define VER_BUILD    2
  
  #define VERSION_SHA1 e1ff0f37a8d67814e1d650f6b49d495c49dde946
  
-- uavs3e version               : 1.3.2
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- INSTALL_INCLUDE_DIR          : /usr/local/include
-- INSTALL_LIB_DIR              : /usr/local/lib
-- INSTALL_PKGCONFIG_DIR        : /usr/local/lib/pkgconfig
-- BUILD_SHARED_LIBS            : false
-- compile 10bit
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/oloke/Downloads/Sources/uavs3e/build/linux

Here's the command and the gdb output:

$ gdb --args ~/Downloads/Sources/uavs3e/build/linux/uavs3enc -i matrix.yuv -w 1920 -h 1080 -d 8 --fps_num 24000 --fps_den 1001 --internal_bit_depth 8 --speed_level 2 --rc_type 1 -q 40 -p 128 -o matrix.avs3
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/oloke/Downloads/Sources/uavs3e/build/linux/uavs3enc...
(gdb) r
Starting program: /home/oloke/Downloads/Sources/uavs3e/build/linux/uavs3enc -i matrix.yuv -w 1920 -h 1080 -d 8 --fps_num 24000 --fps_den 1001 --internal_bit_depth 8 --speed_level 2 --rc_type 1 -q 40 -p 128 -o matrix.avs3

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) 
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Version: 1.3.2_release,  SHA-1: e1ff0f37a8d67814e1d650f6b49d495c49dde946
[New Thread 0x7ffff4e006c0 (LWP 126884)]
-----------------------------------------------------------------------------------------------------------------------------------
< Sequence's Info >
        resolution input         : 1920 x 1080
        resolution coding        : 1920 x 1080
        bitdepth input           : 8
        bitdepth coding          : 8
        frame rate               : 24000 / 1001
        intra picture period     : 128
        close_gop                : 0
        max b frames             : 15
        signature                : 0

< LookAhead Info >
        lookahead                : 40
        scenecut                 : 0
        schistogram              : 0
        adaptive_gop             : 0

< Parallel Info >
        WPP threads              : 1    (1-9)
        frame threads            : 1

< RC Info >
        RC type                  : 1 (0: CQP, 1: CRF, 2: ABR, 3: CBR)
        crf                      : 34
        max_bitrate              : 0
        qp range                 : 0-63
        qp_offset_cb             : 0
        qp_offset_cr             : 0

< CU split CFG >
        ctu_size:        128
        min_cu_size:     4
        max_part_ratio:  8
        max_split_times: 6
        min_qt_size:     8
        max_bt_size:     64
        max_eqt_size:    32
        max_dt_size:     64

< Tool CFG >
        Loop Filter:  deblock: 1, sao: 1, alf: 1, cross_patch: 1, 
        Inter: AMVR(1) HMVP_NUM(8) AFFINE(1) SMVD(1) UMVE(1) EMVR(1) 
        Intra: TSCPM(1) IPF(0) DT(1) 
        Transform: PBT(1) SECTrans(1) 
        Quant: WeightedQuant: 0 
        ENC-Side Tools: chroma_qp(1) AQ(0) 
        Speed_level: 2   (allowed 0, 1, 2, 3, 4)
-----------------------------------------------------------------------------------------------------------------------------------
  Input YUV file           : matrix.yuv 
  Output bitstream         : matrix.avs3 
-----------------------------------------------------------------------------------------------------------------------------------
    POC | QP |  PSNR-Y  PSNR-U  PSNR-V| SSIM-Y SSIM-U SSIM-V|   Bits |  Time |        Ref. List      | Ext_info
[New Thread 0x7fffea2006c0 (LWP 126885)]
***************************************************************************************************************************************
    0(I)|30.0| 45.6035 43.8366 43.2977| 0.9826 0.9652 0.9762|  344440|   9971|L0         |L1         |[layer:0 cost: 3.10 brTal:  8258.30kbps] 
***************
Thread 2 "uavs3enc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4e006c0 (LWP 126884)]
uavs3e_if_hor_ver_chroma_w4_avx2 (src=0x55555a44cb44, i_src=<optimized out>, dst=0x7fffe3d36580, i_dst=8, 
    width=<optimized out>, height=4, coef_x=0x555555671244 <com_tbl_mc_c_coeff_hp+116> "\377\a<\376", 
    coef_y=0x5555556711d8 <com_tbl_mc_c_coeff_hp+8> "\376>\004", max_val=255)
    at /home/oloke/Downloads/Sources/uavs3e/src/avx2/inter_pred_avx2.c:4753
4753            _mm_store_si128((__m128i *)(tmp), m0);
(gdb) where
#0  uavs3e_if_hor_ver_chroma_w4_avx2 (src=0x55555a44cb44, i_src=<optimized out>, dst=0x7fffe3d36580, i_dst=8, 
    width=<optimized out>, height=4, coef_x=0x555555671244 <com_tbl_mc_c_coeff_hp+116> "\377\a<\376", 
    coef_y=0x5555556711d8 <com_tbl_mc_c_coeff_hp+8> "\376>\004", max_val=255)
    at /home/oloke/Downloads/Sources/uavs3e/src/avx2/inter_pred_avx2.c:4753
#1  0x000055555560050c in com_mc_blk_chroma (uv_flag=0, dst=<optimized out>, dst_stride=<optimized out>, x_pos=27, 
    y_pos=70, width=4, height=4, max_posx=<optimized out>, max_posy=<optimized out>, max_val=<optimized out>, hp_flag=1, 
    pic=<optimized out>, widx=<optimized out>) at /home/oloke/Downloads/Sources/uavs3e/src/com_mc.c:136
#2  com_mc_blk_affine (x=<optimized out>, y=<optimized out>, pic_w=<optimized out>, pic_h=<optimized out>, 
    cu_width=<optimized out>, cu_width@entry=16, cu_height=<optimized out>, cu_height@entry=16, ac_mv=<optimized out>, 
    ref_pic=<optimized out>, pred=<optimized out>, cp_num=<optimized out>, sub_w=8, sub_h=8, lidx=<optimized out>, 
    bit_depth=<optimized out>) at /home/oloke/Downloads/Sources/uavs3e/src/com_mc.c:351
#3  0x00005555555ffa00 in com_mc_cu_affine (x=<optimized out>, y=<optimized out>, pic_w=<optimized out>, 
    pic_h=<optimized out>, w=16, h=16, refi=0x7fffe3cfe4b4 "", mv=0x7fffe3cfe4c8, refp=0x5555556b8e08, 
    pred_buf=0x7fffe3d2e580, vertex_num=3, sh=0x5555556b8c70, bit_depth=8)
    at /home/oloke/Downloads/Sources/uavs3e/src/com_mc.c:381
#4  0x000055555562cfb3 in inter_rdcost (core=<optimized out>, core@entry=0x7fffe3800020, lbac_best_ret=<optimized out>, 
    lbac_best_ret@entry=0x7ffff4dfdfc0, bForceAllZero=<optimized out>, bForceAllZero@entry=0, need_mc=<optimized out>, 
    need_mc@entry=1, dist_input=<optimized out>, dist_input@entry=0x7ffff4df4f00, dist_pred_input=<optimized out>, 
    dist_pred_input@entry=0x7ffff4df4e90) at /home/oloke/Downloads/Sources/uavs3e/src/inter.c:264
#5  0x000055555562a60c in analyze_affine_merge (lbac_best=0x7ffff4dfdfc0, core=<optimized out>)
    at /home/oloke/Downloads/Sources/uavs3e/src/inter.c:827
#6  analyze_inter_cu (core=core@entry=0x7fffe3800020, lbac_best=lbac_best@entry=0x7ffff4dfdfc0)
    at /home/oloke/Downloads/Sources/uavs3e/src/inter.c:1983
#7  0x00005555555b9dc1 in mode_coding_unit (core=0x7fffe3800020, lbac_best=0x7ffff4dfdfc0, x=<optimized out>, 
    y=<optimized out>, cu_width_log2=4, cu_height_log2=4, cud=<optimized out>, cu_data=0x7fffe3800f40, texture_dir=0)
    at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:915
#8  0x00005555555b60be in mode_coding_tree (core=<optimized out>, core@entry=0x7fffe3800020, lbac_cur=<optimized out>, 
    x0=<optimized out>, x0@entry=48, y0=<optimized out>, y0@entry=144, cup=<optimized out>, cu_width_log2=<optimized out>, 
    cu_width_log2@entry=4, cu_height_log2=<optimized out>, cud=<optimized out>, parent_split=<optimized out>, 
    qt_depth=<optimized out>, bet_depth=<optimized out>, cons_pred_mode=<optimized out>, tree_status=<optimized out>)
    at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:1490
#9  0x00005555555b7027 in mode_coding_tree (core=<optimized out>, core@entry=0x7fffe3800020, lbac_cur=<optimized out>, 
    x0=<optimized out>, x0@entry=32, y0=<optimized out>, y0@entry=128, cup=<optimized out>, cu_width_log2=<optimized out>, 
    cu_width_log2@entry=5, cu_height_log2=<optimized out>, cud=<optimized out>, parent_split=<optimized out>, 
    qt_depth=<optimized out>, bet_depth=<optimized out>, cons_pred_mode=<optimized out>, tree_status=<optimized out>)
    at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:1648
#10 0x00005555555b7027 in mode_coding_tree (core=<optimized out>, core@entry=0x7fffe3800020, lbac_cur=<optimized out>, 
    x0=<optimized out>, x0@entry=0, y0=<optimized out>, y0@entry=128, cup=<optimized out>, cu_width_log2=<optimized out>, 
    cu_width_log2@entry=6, cu_height_log2=<optimized out>, cud=<optimized out>, parent_split=<optimized out>, 
    qt_depth=<optimized out>, bet_depth=<optimized out>, cons_pred_mode=<optimized out>, tree_status=<optimized out>)
    at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:1648
#11 0x00005555555b7027 in mode_coding_tree (core=<optimized out>, core@entry=0x7fffe3800020, lbac_cur=<optimized out>, 
    lbac_cur@entry=0x7ffff4dff8c8, x0=<optimized out>, y0=<optimized out>, cup=<optimized out>, cup@entry=0, 
    cu_width_log2=<optimized out>, cu_width_log2@entry=7, cu_height_log2=<optimized out>, cud=<optimized out>, 
    parent_split=<optimized out>, qt_depth=<optimized out>, bet_depth=<optimized out>, cons_pred_mode=<optimized out>, 
    tree_status=<optimized out>) at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:1648
#12 0x00005555555b4a06 in enc_mode_analyze_lcu (core=core@entry=0x7fffe3800020, lbac=<optimized out>, 
    lbac@entry=0x7fffeb34bc9c) at /home/oloke/Downloads/Sources/uavs3e/src/analyze.c:1746
#13 0x0000555555561ce8 in enc_lcu_row (core=0x7fffe3800020, row=0x7fffeb34bc98)
    at /home/oloke/Downloads/Sources/uavs3e/src/uavs3e.c:855
#14 0x000055555556356d in enc_pic_thread (ep=0x7fffea800020, p=0x5555556b8ae0)
    at /home/oloke/Downloads/Sources/uavs3e/src/uavs3e.c:1044
#15 0x0000555555610554 in uavs3e_threadpool_thread (pool=0x5555556dd1e0)
    at /home/oloke/Downloads/Sources/uavs3e/src/com_thread.c:193
#16 0x00007ffff7d3639d in ?? () from /usr/lib/libc.so.6
#17 0x00007ffff7dbb49c in ?? () from /usr/lib/libc.so.6
(gdb) 
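
A possible lead, not verified against the uavs3e source: the faulting line is an `_mm_store_si128`, and that intrinsic requires its destination to be 16-byte aligned, while `_mm_storeu_si128` has no such requirement. If `tmp` ends up less strictly aligned in the 10-bit build, that would produce exactly this kind of SIGSEGV. The sketch below only illustrates the difference between the two stores; the helper names `is_aligned16`, `store128_any` and `demo_store_at_offset` are made up for this example and are not part of uavs3e.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_store_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: stores 16 bytes to dst.
 * The aligned store is only used when dst is 16-byte aligned;
 * otherwise the unaligned variant is used, which accepts any address. */
static void store128_any(void *dst, __m128i v)
{
    if (is_aligned16(dst))
        _mm_store_si128((__m128i *)dst, v);
    else
        _mm_storeu_si128((__m128i *)dst, v);
}

/* Hypothetical demo: writes 16 bytes at buf + offset and checks them.
 * Returns 1 when the bytes read back as expected. */
static int demo_store_at_offset(size_t offset)
{
    _Alignas(16) unsigned char buf[64];
    unsigned char expect[16];

    assert(offset + 16 <= sizeof buf);
    memset(buf, 0, sizeof buf);
    memset(expect, 0x5A, sizeof expect);

    store128_any(buf + offset, _mm_set1_epi8(0x5A));
    return memcmp(buf + offset, expect, sizeof expect) == 0;
}
```

If `tmp` is not optimized out at the crash site, its alignment can be checked in the same gdb session with something like `p/x (unsigned long)tmp & 15`; a non-zero result would support this guess.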