SR-IOV Network Operator 4.15.0-202410010035 | when setting linkType: IB the NIC gets filtered out #795

Open
bbenshab opened this issue Oct 23, 2024 · 2 comments

@bbenshab

When setting linkType: IB on a SriovNetworkNodePolicy, as in this example:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnx-port-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  ibverbs: true
  isRdma: true
  linkType: IB
  nicSelector:
    pfNames:
    - ibs3f0
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  rdma: true
  resourceName: port1

the NIC gets filtered out: as shown below, the node reports openshift.io/port1 = 0 under allocatable.

oc get node intel-perf-27.perf.eng.bos2.dc.redhat.com -o json | jq .status.allocatable
{
  "cpu": "127500m",
  "ephemeral-storage": "213881594729",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "526694060Ki",
  "nvidia.com/gpu": "2",
  "openshift.io/port1": "0",
  "pods": "250",
  "rdma/rdma_shared_device_a": "63"
}
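
For anyone checking several nodes, the same symptom can be spotted programmatically. This is a minimal sketch, assuming the allocatable map has been saved as JSON exactly as printed above; the helper name `empty_sriov_resources` and the `openshift.io/` prefix filter are illustrative choices, not part of the operator.

```python
import json

# Allocatable map copied from the `oc get node ... | jq .status.allocatable` output above.
ALLOCATABLE_JSON = """
{
  "cpu": "127500m",
  "ephemeral-storage": "213881594729",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "526694060Ki",
  "nvidia.com/gpu": "2",
  "openshift.io/port1": "0",
  "pods": "250",
  "rdma/rdma_shared_device_a": "63"
}
"""


def empty_sriov_resources(allocatable, prefix="openshift.io/"):
    """Return resource names under `prefix` whose allocatable count is "0".

    Hypothetical helper: the prefix filter skips unrelated zero-valued
    entries such as hugepages.
    """
    return sorted(
        name
        for name, quantity in allocatable.items()
        if name.startswith(prefix) and quantity == "0"
    )


if __name__ == "__main__":
    allocatable = json.loads(ALLOCATABLE_JSON)
    print(empty_sriov_resources(allocatable))  # prints ['openshift.io/port1']
```
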

The only workaround I found is to edit the config map:
oc edit configmap -n openshift-sriov-network-operator device-plugin-config

and then remove:
"linkTypes":["infiniband"],

from:

apiVersion: v1
data:
  intel-perf-27.perf.eng.bos2.dc.redhat.com: '{"resourceList":[{"resourceName":"port1","selectors":{"vendors":["15b3"],"pfNames":["ibs3f0"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null},{"resourceName":"port2","selectors":{"vendors":["15b3"],"pfNames":["ibs3f1"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null}]}'
  perf-intel-6.perf.eng.bos2.dc.redhat.com: '{"resourceList":[{"resourceName":"port1","selectors":{"vendors":["15b3"],"pfNames":["ibs3f0"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null},{"resourceName":"port2","selectors":{"vendors":["15b3"],"pfNames":["ibs3f1"],"linkTypes":["infiniband"],"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null}]}'
kind: ConfigMap

However, the edit gets reset every 300 seconds.
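
To make the manual edit concrete, here is a rough sketch of the same transformation applied to one per-node value from the config map. It only illustrates what was removed by hand; it is not a fix, and the edit would still be reset as described. The function name `drop_link_types` is hypothetical, and the sample string is the port1 entry from the config map above, trimmed for brevity.

```python
import json

# One per-node value from the device-plugin-config ConfigMap above,
# trimmed to the port1 entry for brevity.
NODE_CONFIG = (
    '{"resourceList":[{"resourceName":"port1","selectors":'
    '{"vendors":["15b3"],"pfNames":["ibs3f0"],"linkTypes":["infiniband"],'
    '"IsRdma":true,"NeedVhostNet":false},"SelectorObj":null}]}'
)


def drop_link_types(node_config: str) -> str:
    """Return the config with the "linkTypes" selector removed from every resource.

    Hypothetical helper mirroring the manual workaround; all other
    selectors are left untouched.
    """
    config = json.loads(node_config)
    for resource in config.get("resourceList", []):
        resource.get("selectors", {}).pop("linkTypes", None)
    # Compact separators keep the single-line style used in the ConfigMap.
    return json.dumps(config, separators=(",", ":"))


if __name__ == "__main__":
    print(drop_link_types(NODE_CONFIG))
```

The remaining selectors (vendors, pfNames, IsRdma) are unchanged, which matches the observation that the device is only filtered out when the linkTypes selector is present.
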

For reference:
NetworkAttachmentDefinition:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/port1
  name: network-port-1
  namespace: default
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "network-port-1",
      "type": "ib-sriov",
      "logLevel": "info",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.1.2/24",
        "exclude": [
          "192.168.1.1",
          "192.168.1.2",
          "192.168.1.254",
          "192.168.1.255"
        ],
        "routes": [
          {
            "dst": "192.168.1.0/24"
          }
        ]
      }
    }


SriovIBNetwork:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: sriov-ib-network-port-1
  namespace: openshift-sriov-network-operator
spec:
  pfNames:
  - ibs3f0
  rdma: true
  resourceName: port1

@zeeke
Member

zeeke commented Oct 23, 2024

Hi @bbenshab, can you please attach the SR-IOV logs and resources? Since this is an OpenShift cluster, you can collect them with:

oc adm must-gather -- /usr/bin/gather_sriov

Also, you stated this happens on 4.15.0. Any chance you can reproduce this issue with the latest sriov-network-operator version?

@adrianchiris
Collaborator

Let's try to reproduce with the sriov-network-operator from this repo.

Also, it may be related to #797.
