Incorrect Device Allocation Across NUMA Nodes Despite topologyManagerPolicy Set to restricted #29

Open
cocotyty opened this issue Jan 22, 2025 · 0 comments

We are encountering an issue where the Habana Kubernetes Device Plugin does not respect the NUMA topology when allocating devices, even though the topologyManagerPolicy for Kubelet is set to restricted. This results in devices being allocated across different NUMA nodes, which negatively impacts performance and violates the expected behavior of the topology manager.

Steps to Reproduce:

  1. Set the Kubelet topologyManagerPolicy to restricted.
  2. Deploy a Pod requesting four Habana Gaudi devices.
  3. Observe the allocated devices using hl-smi and check their NUMA node affinity.
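For step 3, one way to check NUMA affinity is to read the numa_node attribute that Linux exposes in sysfs for each PCI device. The Go sketch below is only an illustration: the function names numaNodeOf and parseNUMANode are hypothetical and not part of the device plugin, and the bus IDs in main are the ones reported by hl-smi further down in this issue.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseNUMANode converts the content of a sysfs numa_node file to an int.
// The kernel writes -1 when it has no NUMA affinity for the device.
func parseNUMANode(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// numaNodeOf reads <sysfsRoot>/bus/pci/devices/<pciBusID>/numa_node.
// sysfsRoot is normally "/sys"; it is a parameter (an assumption of this
// sketch) so the function can be pointed at a fake tree.
func numaNodeOf(sysfsRoot, pciBusID string) (int, error) {
	p := filepath.Join(sysfsRoot, "bus", "pci", "devices", pciBusID, "numa_node")
	raw, err := os.ReadFile(p)
	if err != nil {
		return 0, fmt.Errorf("read %s: %w", p, err)
	}
	node, err := parseNUMANode(string(raw))
	if err != nil {
		return 0, fmt.Errorf("parse %s: %w", p, err)
	}
	return node, nil
}

func main() {
	// Bus IDs of the four cards the plugin selected, per the hl-smi output.
	busIDs := []string{"0000:18:00.0", "0000:3c:00.0", "0000:5d:00.0", "0000:4b:00.0"}
	for _, id := range busIDs {
		node, err := numaNodeOf("/sys", id)
		if err != nil {
			fmt.Println(id, "error:", err)
			continue
		}
		fmt.Println(id, "numa_node:", node)
	}
}
```

If every card handed to the Pod reports the same numa_node value, the allocation is NUMA-aligned; mixed values indicate the problem described here.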

Expected Behavior:

The Habana Device Plugin should allocate all requested devices from the same NUMA node, as specified by the restricted topology manager policy.

Actual Behavior:

The devices exposed to the container span multiple NUMA nodes, as shown by the HABANA_VISIBLE_DEVICES environment variable and the hl-smi output.

Logs and Evidence:

1. Device Plugin Registration Logs:

The logs confirm that the devices are correctly registered with their NUMA topology:

{"time":"2025-01-22T07:20:18.549489477Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":0}
{"time":"2025-01-22T07:20:18.549508177Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34021114","uuid":"01P0-HL2080A0-15-TK6V51-09-08-03","id":"0","pci_bus_id":"0000:bb:00.0"}
{"time":"2025-01-22T07:20:18.549527415Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":1}
{"time":"2025-01-22T07:20:18.549543432Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34020893","uuid":"01P0-HL2080A0-15-TK6V51-20-01-05","id":"0","pci_bus_id":"0000:9a:00.0"}
{"time":"2025-01-22T07:20:18.549560288Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":1}
{"time":"2025-01-22T07:20:18.549574602Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34020866","uuid":"01P0-HL2080A0-15-TK6V56-15-01-05","id":"0","pci_bus_id":"0000:5d:00.0"}
{"time":"2025-01-22T07:20:18.549601408Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":0}
{"time":"2025-01-22T07:20:18.549616894Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34021116","uuid":"01P0-HL2080A0-15-TK6V53-21-04-02","id":"0","pci_bus_id":"0000:ca:00.0"}
{"time":"2025-01-22T07:20:18.549633699Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":1}
{"time":"2025-01-22T07:20:18.549649198Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34020927","uuid":"01P0-HL2080A0-15-TK6V49-12-04-06","id":"0","pci_bus_id":"0000:4b:00.0"}
{"time":"2025-01-22T07:20:18.549668234Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":0}
{"time":"2025-01-22T07:20:18.549681995Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34021108","uuid":"01P0-HL2080A0-15-TK6V56-10-06-00","id":"0","pci_bus_id":"0000:db:00.0"}
{"time":"2025-01-22T07:20:18.54969921Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":1}
{"time":"2025-01-22T07:20:18.549715112Z","level":"INFO","msg":"Device found","service":"habana-device-plugin","device":"GAUDI","serial":"AN34020915","uuid":"01P0-HL2080A0-15-TK6V52-15-02-02","id":"0","pci_bus_id":"0000:18:00.0"}
{"time":"2025-01-22T07:20:18.549732578Z","level":"INFO","msg":"Device cpu affinity","service":"habana-device-plugin","id":"0","cpu_affinity":0}

2. Allocation Logs:

The logs show that the Device Plugin processes the allocation request:

{"time":"2025-01-22T07:25:51.465964458Z","level":"INFO","msg":"Preparing device for registration","service":"habana-device-plugin","device":{"ID":"AN34020915","health":"Healthy","topology":{"nodes":[{}]}}}
{"time":"2025-01-22T07:25:51.46608777Z","level":"INFO","msg":"Getting device handle from hlml","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466123889Z","level":"INFO","msg":"Getting device minor number","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466129542Z","level":"INFO","msg":"Getting device module id","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466291215Z","level":"INFO","msg":"Preparing device for registration","service":"habana-device-plugin","device":{"ID":"AN34020872","health":"Healthy","topology":{"nodes":[{}]}}}
{"time":"2025-01-22T07:25:51.466305474Z","level":"INFO","msg":"Getting device handle from hlml","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466308218Z","level":"INFO","msg":"Getting device minor number","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466310175Z","level":"INFO","msg":"Getting device module id","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.4663575Z","level":"INFO","msg":"Preparing device for registration","service":"habana-device-plugin","device":{"ID":"AN34020866","health":"Healthy","topology":{"nodes":[{}]}}}
{"time":"2025-01-22T07:25:51.466361044Z","level":"INFO","msg":"Getting device handle from hlml","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.46636571Z","level":"INFO","msg":"Getting device minor number","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466367695Z","level":"INFO","msg":"Getting device module id","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.466406187Z","level":"INFO","msg":"Preparing device for registration","service":"habana-device-plugin","device":{"ID":"AN34020927","health":"Healthy","topology":{"nodes":[{}]}}}
{"time":"2025-01-22T07:25:51.466410268Z","level":"INFO","msg":"Getting device handle from hlml","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.46641678Z","level":"INFO","msg":"Getting device minor number","service":"habana-device-plugin"}
{"time":"2025-01-22T07:25:51.46641866Z","level":"INFO","msg":"Getting device module id","service":"habana-device-plugin"}

Note: At this point, the plugin has prepared AN34020915, AN34020872, AN34020866, and AN34020927. This selection is correct: all four cards are on NUMA node 0.

3. hl-smi Output:

sudo hl-smi -Q index,module_id,name,bus_id,serial -f csv
index, module_id, name, bus_id, serial
0, 6, HL-225D, 0000:18:00.0, AN34020915
1, 2, HL-225D, 0000:9a:00.0, AN34020893
2, 7, HL-225D, 0000:3c:00.0, AN34020872
3, 3, HL-225D, 0000:bb:00.0, AN34021114
4, 5, HL-225D, 0000:5d:00.0, AN34020866
5, 4, HL-225D, 0000:4b:00.0, AN34020927
6, 0, HL-225D, 0000:ca:00.0, AN34021116
7, 1, HL-225D, 0000:db:00.0, AN34021108

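The four selected cards map to the following rows:

AN34020915: index 0, module_id 6, bus_id 0000:18:00.0
AN34020872: index 2, module_id 7, bus_id 0000:3c:00.0
AN34020866: index 4, module_id 5, bus_id 0000:5d:00.0
AN34020927: index 5, module_id 4, bus_id 0000:4b:00.0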
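The serial-to-index lookup can be reproduced mechanically from the CSV output. The following Go sketch parses the exact text shown above; the type hlSmiRow and the function parseHlSmiCSV are hypothetical names introduced for illustration, and the parser assumes the five-column layout requested by the -Q flag in this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sampleCSV is the hl-smi CSV output quoted in this issue.
const sampleCSV = `index, module_id, name, bus_id, serial
0, 6, HL-225D, 0000:18:00.0, AN34020915
1, 2, HL-225D, 0000:9a:00.0, AN34020893
2, 7, HL-225D, 0000:3c:00.0, AN34020872
3, 3, HL-225D, 0000:bb:00.0, AN34021114
4, 5, HL-225D, 0000:5d:00.0, AN34020866
5, 4, HL-225D, 0000:4b:00.0, AN34020927
6, 0, HL-225D, 0000:ca:00.0, AN34021116
7, 1, HL-225D, 0000:db:00.0, AN34021108`

// hlSmiRow holds the columns of interest from one CSV row.
type hlSmiRow struct {
	Index    int
	ModuleID int
	BusID    string
	Serial   string
}

// parseHlSmiCSV returns the rows keyed by serial number. It expects the
// columns index, module_id, name, bus_id, serial, in that order.
func parseHlSmiCSV(out string) (map[string]hlSmiRow, error) {
	rows := make(map[string]hlSmiRow)
	for _, line := range strings.Split(out, "\n") {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines
		}
		fields := strings.Split(line, ",")
		if len(fields) != 5 {
			return nil, fmt.Errorf("unexpected column count in %q", line)
		}
		for i := range fields {
			fields[i] = strings.TrimSpace(fields[i])
		}
		if fields[0] == "index" {
			continue // header row
		}
		idx, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, fmt.Errorf("bad index in %q: %w", line, err)
		}
		mod, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("bad module_id in %q: %w", line, err)
		}
		rows[fields[4]] = hlSmiRow{Index: idx, ModuleID: mod, BusID: fields[3], Serial: fields[4]}
	}
	return rows, nil
}

func main() {
	rows, err := parseHlSmiCSV(sampleCSV)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Serials taken from the allocation logs.
	for _, s := range []string{"AN34020915", "AN34020872", "AN34020866", "AN34020927"} {
		r := rows[s]
		fmt.Printf("%s: index %d, module_id %d, bus_id %s\n", s, r.Index, r.ModuleID, r.BusID)
	}
}
```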

4. Container Environment Variables and hl-smi Output:

The HABANA_VISIBLE_DEVICES environment variable in the container shows incorrect device allocation:

sudo ctr -n k8s.io c info 5dd335cec1fc1436539facc3fd3d6efb1203dc4ad234bc1c05341554c9c9a02e | grep HABANA
WARN[0000] DEPRECATION: The `mirrors` property of `[plugins."io.containerd.grpc.v1.cri".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.0. Use `config_path` instead.
WARN[0000] DEPRECATION: The `configs` property of `[plugins."io.containerd.grpc.v1.cri".registry]` is deprecated since containerd v1.5 and will be removed in containerd v2.0. Use `config_path` instead.
                "HABANA_LOGS=/var/log/habana_logs/",
                "HABANA_SCAL_BIN_PATH=/opt/habanalabs/engines_fw",
                "HABANA_PLUGINS_LIB_PATH=/opt/habanalabs/habana_plugins",
                "HABANA_VISIBLE_DEVICES=7,0,3,5",
                "HABANA_VISIBLE_MODULES=6,7,5,4",
 sudo hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.19.0-fw-57.1.0.0          |
| Driver Version:                                     1.19.0-2427ed8          |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncor-Events|
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
|   0  HL-225D             N/A  | 0000:18:00.0     N/A |                   0  |
| N/A   31C   N/A  90W /  450W  | 98304MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   1  HL-225D             N/A  | 0000:9a:00.0     N/A |                   0  |
| N/A   33C   N/A  93W /  450W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   2  HL-225D             N/A  | 0000:3c:00.0     N/A |                   0  |
| N/A   33C   N/A  84W /  450W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   3  HL-225D             N/A  | 0000:bb:00.0     N/A |                   0  |
| N/A   33C   N/A  90W /  450W  | 98304MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   4  HL-225D             N/A  | 0000:5d:00.0     N/A |                   0  |
| N/A   33C   N/A  77W /  450W  |   768MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   5  HL-225D             N/A  | 0000:4b:00.0     N/A |                   0  |
| N/A   33C   N/A  81W /  450W  | 98304MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   6  HL-225D             N/A  | 0000:ca:00.0     N/A |                   0  |
| N/A   32C   N/A  78W /  450W  | 89662MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   7  HL-225D             N/A  | 0000:db:00.0     N/A |                   0  |
| N/A   32C   N/A  74W /  450W  | 98304MiB /  98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes:                                               AIP Memory |
|  AIP       PID   Type   Process name                             Usage      |
|=============================================================================|
|   0       2051433     C   python3                                 97536MiB
|   1        N/A   N/A    N/A                                      N/A        |
|   2        N/A   N/A    N/A                                      N/A        |
|   3       2051431     C   python3                                 97536MiB
|   4        N/A   N/A    N/A                                      N/A        |
|   5       2051432     C   python3                                 97536MiB
|   6       1870842     C   python3                                 88894MiB
|   7       2051430     C   python3                                 97536MiB
+=============================================================================+

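The card with Bus-Id 0000:5d:00.0 (index 4) should be in use, but it is not. The four python3 processes with PIDs 2051430 to 2051433 run on AIPs 7, 3, 5, and 0, which matches HABANA_VISIBLE_DEVICES=7,0,3,5 rather than the indexes of the selected cards (0, 2, 4, 5).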
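The mismatch can be stated as a set difference between the indexes of the cards the plugin selected and the values the container received. The Go sketch below computes it from the numbers in this report; parseIndexList and diffIndexes are hypothetical helper names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseIndexList turns a comma-separated list such as "7,0,3,5" into ints.
func parseIndexList(s string) ([]int, error) {
	var out []int
	for _, part := range strings.Split(s, ",") {
		n, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return nil, fmt.Errorf("bad entry %q: %w", part, err)
		}
		out = append(out, n)
	}
	return out, nil
}

// diffIndexes returns the expected indexes that are absent from actual
// (missing) and the actual indexes that were not expected (unexpected),
// both sorted in ascending order.
func diffIndexes(expected, actual []int) (missing, unexpected []int) {
	inActual := make(map[int]bool, len(actual))
	for _, a := range actual {
		inActual[a] = true
	}
	inExpected := make(map[int]bool, len(expected))
	for _, e := range expected {
		inExpected[e] = true
		if !inActual[e] {
			missing = append(missing, e)
		}
	}
	for _, a := range actual {
		if !inExpected[a] {
			unexpected = append(unexpected, a)
		}
	}
	sort.Ints(missing)
	sort.Ints(unexpected)
	return missing, unexpected
}

func main() {
	// Value of HABANA_VISIBLE_DEVICES observed in the container.
	actual, err := parseIndexList("7,0,3,5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// hl-smi indexes of the four cards the plugin selected.
	expected := []int{0, 2, 4, 5}
	missing, unexpected := diffIndexes(expected, actual)
	// Prints: missing: [2 4] unexpected: [3 7]
	fmt.Println("missing:", missing, "unexpected:", unexpected)
}
```

Indexes 2 and 4 (bus IDs 0000:3c:00.0 and 0000:5d:00.0) are the idle cards, while 3 and 7 are the cards that were used instead.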

Root Cause Analysis:

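After reviewing the Device Plugin source code, the issue appears to stem from the plugin populating the HABANA_VISIBLE_DEVICES environment variable with device minor numbers. In the output above, those values (7,0,3,5) do not match the hl-smi indexes of the selected cards (0,2,4,5), so the container ends up on different cards than the ones the plugin chose. The index should be used instead so that the NUMA-aware selection is preserved.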

Relevant Code Snippet:

path := fmt.Sprintf("/dev/accel/accel%d", minor)
paths = append(paths, path)
uuids = append(uuids, id)
// netConfig collects the device minor numbers; it is later joined into HABANA_VISIBLE_DEVICES.
netConfig = append(netConfig, fmt.Sprintf("%d", minor))
visibleModule = append(visibleModule, fmt.Sprintf("%d", moduleID))

ds := &pluginapi.DeviceSpec{
    ContainerPath: path,
    HostPath:      path,
    Permissions:   "rw",
}
devicesList = append(devicesList, ds)
path = fmt.Sprintf("/dev/accel/accel_controlD%d", minor)

ds = &pluginapi.DeviceSpec{
    ContainerPath: path,
    HostPath:      path,
    Permissions:   "rw",
}
devicesList = append(devicesList, ds)
} // end of the per-device loop (loop header not shown in this excerpt)

envMap := map[string]string{
    "HABANA_VISIBLE_DEVICES":  strings.Join(netConfig, ","),
    "HL_VISIBLE_DEVICES":      strings.Join(paths, ","),
    "HL_VISIBLE_DEVICES_UUID": strings.Join(uuids, ","),
}

Suggested Fix:

  1. Modify the Device Plugin to use index instead of minor when setting the HABANA_VISIBLE_DEVICES environment variable.
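A minimal sketch of the suggested change, assuming the plugin can obtain the hl-smi index for each allocated serial. How it would obtain that index is not shown in the quoted code, so the lookup is passed in here as a plain map. The function visibleDevicesByIndex and the variable reportedIndexes are hypothetical names, and the map values are copied from the hl-smi output in this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// reportedIndexes maps serial numbers to hl-smi indexes, as listed in the
// hl-smi CSV output in this issue. A real plugin would query this at
// runtime; hard-coding it is an assumption of this sketch.
var reportedIndexes = map[string]int{
	"AN34020915": 0,
	"AN34020893": 1,
	"AN34020872": 2,
	"AN34021114": 3,
	"AN34020866": 4,
	"AN34020927": 5,
	"AN34021116": 6,
	"AN34021108": 7,
}

// visibleDevicesByIndex builds a HABANA_VISIBLE_DEVICES value from device
// indexes instead of minor numbers, keeping the order of the allocation.
func visibleDevicesByIndex(serials []string, indexBySerial map[string]int) (string, error) {
	parts := make([]string, 0, len(serials))
	for _, s := range serials {
		idx, ok := indexBySerial[s]
		if !ok {
			return "", fmt.Errorf("no index known for serial %q", s)
		}
		parts = append(parts, strconv.Itoa(idx))
	}
	return strings.Join(parts, ","), nil
}

func main() {
	// Serials from the allocation logs, in the order they were prepared.
	allocated := []string{"AN34020915", "AN34020872", "AN34020866", "AN34020927"}
	value, err := visibleDevicesByIndex(allocated, reportedIndexes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("HABANA_VISIBLE_DEVICES=" + value) // HABANA_VISIBLE_DEVICES=0,2,4,5
}
```

With this mapping the container would receive 0,2,4,5, which includes the card at 0000:5d:00.0 that is currently left idle.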

Additional Context:

  • Kubernetes Version: v1.29.7
  • Habana Device Plugin Version: vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin@sha256:d833ab42152c7d58f7d56b4f95e67f645b409990bfc5d403b178d19d0d857e3f

Impact:

This issue significantly impacts performance-sensitive workloads that rely on NUMA locality. It also undermines the effectiveness of Kubernetes' topology management features.
