
BUG ConnectX-4 Lx NIC may intermittently drop at boot or reboot #3556

Open
gasment opened this issue Sep 30, 2024 · 0 comments
Comments


gasment commented Sep 30, 2024

Please fill in the following information.

Install ENV: (You can find it in the boot interface.)

  • DMI: qemu
  • CPU:
  • NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]

RR version: (You can find it in the update menu.)

  • RR: 24.9.1
  • addons:
  • modules:
  • lkms:

DSM:

  • model: DS920+
  • version: 7.2.2

Issue:
The ConnectX-4 Lx NIC has a chance of failing initialization at power-on or reboot: no eth interface is created, so the machine drops off the network. I have tried other models (e.g. SA6400) with the same result.
There is no problem during the RR stage; the card only drops, with some probability, once boot proceeds into the DSM kernel.
logs:

SynologyNAS> [ 126.411146] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 126.414546] mlx5_core 0000:00:11.0: 0000:00:11.0:page_notify_fail:308:(pid 3851): page notify failed
[ 126.415060] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 4999): ALLOC_UAR(0x802) timeout. Will cause a leak of a command resource
[ 126.415062] mlx5_core 0000:00:11.0: 0000:00:11.0:mlx5_alloc_map_uar:237:(pid 4999): mlx5_cmd_alloc_uar() failed, -110
[ 126.415064] mlx5_core 0000:00:11.0: 0000:00:11.0:mlx5e_create_netdev:2141:(pid 4999): alloc_map uar failed, -110
[ 126.415203] udevd[4999]: failed to send result of seq 1835 to main daemon: Connection refused
[ 126.426965] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): give fail -110
^C
SynologyNAS> [ 186.428088] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 186.431379] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 186.433866] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
[ 246.436094] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 246.439622] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 246.442351] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
[ 306.445143] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 306.447718] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 306.449502] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
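For reference, the -110 in these messages is -ETIMEDOUT: the driver's wait_func gave up waiting for the firmware to complete a command (MANAGE_PAGES, ALLOC_UAR). A minimal sketch for spotting these timeouts in a kernel log; the grep pattern is an assumption based on the lines above, not an official diagnostic, and the here-string stands in for a live `dmesg` feed:

```shell
#!/bin/sh
# Sketch (not part of the report): count mlx5 firmware-command timeouts
# in a kernel log. The sample line mirrors the dmesg output above; on a
# live system you would feed `dmesg` in instead of this variable.
sample='[  126.411146] mlx5_core 0000:00:11.0: wait_func:790: MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource'

# Match lines where mlx5_core reports a command timeout.
timeouts=$(printf '%s\n' "$sample" | grep -c 'mlx5_core.*timeout')
echo "mlx5 command timeouts seen: $timeouts"
```

A nonzero count after boot would indicate the firmware command interface stalled before the netdev could be created.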

00:11.0 Class 0200: Device 15b3:1015
Subsystem: Device 15b3:0069
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at 7030000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at c1600000 [disabled] [size=1M]
Capabilities: [60] Express Endpoint, IntMsgNum 0
Capabilities: [48] Vital Product Data
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Kernel driver in use: mlx5_core
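A hypothetical boot-time workaround sketch (not something the report contains): when a PCI NIC's driver fails to create a network interface, the standard Linux sysfs PCI interface allows detaching the device and rescanning the bus so the driver re-probes it. The PCI address 0000:00:11.0 comes from the lspci output above; the SYSFS variable is parameterized only so the sketch can be dry-run safely:

```shell
#!/bin/sh
# Hypothetical sketch: re-probe a PCI NIC whose driver failed to create a
# netdev at boot. Paths follow the standard Linux sysfs PCI interface;
# SYSFS and DEV are parameterized assumptions, not values from the report.
SYSFS="${SYSFS:-/sys}"
DEV="${DEV:-0000:00:11.0}"   # PCI address from the lspci output above

reprobe_if_missing() {
    # A port that initialized correctly exposes a net/ directory.
    if [ -d "$SYSFS/bus/pci/devices/$DEV/net" ]; then
        echo "netdev present for $DEV, nothing to do"
        return 0
    fi
    echo "no netdev for $DEV, detaching and rescanning"
    echo 1 > "$SYSFS/bus/pci/devices/$DEV/remove"
    sleep 2                   # give the device time to quiesce
    echo 1 > "$SYSFS/bus/pci/rescan"
}

# In a boot-time script you would simply call (as root):
# reprobe_if_missing
```

This only retries the probe; if the firmware command interface is genuinely wedged (as the -110 timeouts suggest), a re-probe may fail the same way and a power cycle may be needed.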

(## Because the log contains some sensitive information such as SN/MAC, please redact them when providing the complete file. Of course, you can also send it to my email. ##)
...

(Please review the content of #173, #175, #226 first)
...

(If you just say XXX doesn't work without providing any details, I can only say thank you for your feedback.)
...

@gasment changed the title BUG BUG ConnectX-4 Lx NIC may intermittently drop at boot or reboot Sep 30, 2024