Decode output of nvmlDeviceGetName to avoid JSON serialize issue #240

Merged: 3 commits from KeitaW:patch-1 into huggingface:main on Aug 15, 2024

Conversation

@KeitaW (Contributor) commented Aug 3, 2024

`nvmlDeviceGetName` returns `bytes`, which is not JSON serializable. This causes
```
TypeError: Object of type bytes is not JSON serializable
```
when `hub_utils.py` calls the [save_json method](https://github.com/huggingface/optimum-benchmark/blob/65fa416fd503cfe9a2be7637ee30c70a4a1f96f1/optimum_benchmark/hub_utils.py#L45), since the serialized config contains the output of this function. Example (see the `gpu` key value):
```
bound method PushToHubMixin.to_dict of BenchmarkConfig(name='pytorch_bert', backend=PyTorchConfig(name='pytorch', version='2.4.0a0+3bcc3cddb5.nv24.7', _target_='optimum_benchmark.backends.pytorch.backend.PyTorchBackend', task='fill-mask', library='transformers', model_type='bert', model='bert-base-uncased', processor='bert-base-uncased', device='cuda', device_ids='0', seed=42, inter_op_num_threads=None, intra_op_num_threads=None, model_kwargs={}, processor_kwargs={}, no_weights=True, device_map=None, torch_dtype=None, eval_mode=True, to_bettertransformer=False, low_cpu_mem_usage=None, attn_implementation=None, cache_implementation=None, autocast_enabled=False, autocast_dtype=None, torch_compile=False, torch_compile_target='forward', torch_compile_config={}, quantization_scheme=None, quantization_config={}, deepspeed_inference=False, deepspeed_inference_config={}, peft_type=None, peft_config={}), scenario=InferenceConfig(name='inference', _target_='optimum_benchmark.scenarios.inference.scenario.InferenceScenario', iterations=10, duration=10, warmup_runs=10, input_shapes={'batch_size': 1, 'num_choices': 2, 'sequence_length': 128}, new_tokens=None, memory=True, latency=True, energy=False, forward_kwargs={}, generate_kwargs={}, call_kwargs={}), launcher=ProcessConfig(name='process', _target_='optimum_benchmark.launchers.process.launcher.ProcessLauncher', device_isolation=True, device_isolation_action='warn', numactl=False, numactl_kwargs={}, start_method='spawn'), environment={'cpu': ' AMD EPYC 7R13 Processor', 'cpu_count': 192, 'cpu_ram_mb': 781912.027136, 'system': 'Linux', 'machine': 'x86_64', 'platform': 'Linux-5.15.0-1063-aws-x86_64-with-glibc2.35', 'processor': 'x86_64', 'python_version': '3.10.12', 'gpu': [b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4'], 'gpu_count': 8, 'gpu_vram_mb': 193223196672, 'optimum_benchmark_version': '0.4.0', 'optimum_benchmark_commit': None, 'transformers_version': '4.43.3', 'transformers_commit': None, 'accelerate_version': '0.33.0', 'accelerate_commit': None, 'diffusers_version': None, 'diffusers_commit': None, 'optimum_version': None, 'optimum_commit': None, 'timm_version': None, 'timm_commit': None, 'peft_version': None, 'peft_commit': None})>
```
I believe the issue can be avoided simply by decoding the bytes into a string, as shown in the following example.

```python
from json import dump
import pynvml

# Proposed version
gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle).decode("utf-8"))
pynvml.nvmlShutdown()
data = {'gpu': gpus}

with open("/tmp/tmp.json", "w") as f:
    dump(data, f)
with open("/tmp/tmp.json", "r") as f: 
    print(f.read())
# {"gpu": ["NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4"]}
```

```python
# Current version
# https://github.com/huggingface/optimum-benchmark/blob/65fa416fd503cfe9a2be7637ee30c70a4a1f96f1/optimum_benchmark/system_utils.py#L95
gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle))
pynvml.nvmlShutdown()
data = {'gpu': gpus}

with open("/tmp/tmp.json", "w") as f:
    dump(data, f)

#Traceback (most recent call last):
#  File "/workspace/debug/debug-pynvml.py", line 14, in <module>
#    dump(data, f)
#  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
#    for chunk in iterable:
#  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
#    yield from _iterencode_dict(o, _current_indent_level)
#  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
#    yield from chunks
#  File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
#    yield from chunks
#  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
#    o = _default(o)
#  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
#    raise TypeError(f'Object of type {o.__class__.__name__} '
#TypeError: Object of type bytes is not JSON serializable
#make: *** [Makefile:25: debug-pynvml] Error 1
```
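
For reference, the failure reproduces without NVML at all: the standard `json` encoder rejects any `bytes` value. A minimal, self-contained sketch of the same error:

```python
import json

# json's default encoder has no handler for bytes, so this raises the
# same TypeError regardless of where the bytes value came from
json.dumps({"gpu": [b"NVIDIA L4"]})
# TypeError: Object of type bytes is not JSON serializable
```
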
@IlyasMoutawwakil (Member) commented

Thanks! I've never had serialization problems when benchmarking on GPU. Is it dependent on the GPU, or on the pynvml version/distribution? (We use the official nvidia-ml-py.)
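
A quick way to check which distribution provides the installed `pynvml` module (a minimal sketch; it assumes the module comes from either the official `nvidia-ml-py` package or the third-party `pynvml` package, both of which expose the same importable name):

```python
# Minimal sketch: report which pynvml-providing distributions are installed
from importlib.metadata import PackageNotFoundError, version

for dist in ("nvidia-ml-py", "pynvml"):
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "not installed")
```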

@KeitaW (Contributor, Author) commented Aug 3, 2024

Thank you for your reply, @IlyasMoutawwakil.

It seems that the output format differs across pynvml versions. When I used `pynvml == 11.5.0` (the version I got with `pip install nvidia-ml-py` as of today), the function returns strings, not bytes.

```
pip list | grep pynvml
# pynvml                    11.5.0
```

Here is a sample code snippet demonstrating this:

```python
import pynvml

gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle))
pynvml.nvmlShutdown()
data = {'gpu': gpus}
```

When executed, the output is as follows:

```
In [2]: data
Out[2]:
{'gpu': ['NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4']}
```

The environment I am testing in is `nvcr.io/nvidia/pytorch:24.07-py3` (the latest PyTorch container as of today), which includes a slightly older version of pynvml:

```
pynvml                    11.4.1
```

In this version, the output format is bytes, as mentioned in the previous message. Allow me to close this PR, as the issue does not exist in the latest version. Thanks again!

@KeitaW KeitaW closed this Aug 3, 2024
@KeitaW KeitaW deleted the patch-1 branch August 3, 2024 23:11
@IlyasMoutawwakil (Member) commented

We can add a change to that line so the output is decoded or not depending on its type (`str` or `bytes`).

@KeitaW KeitaW restored the patch-1 branch August 5, 2024 08:24
@KeitaW KeitaW reopened this Aug 5, 2024
@KeitaW (Contributor, Author) commented Aug 5, 2024

Sure thing. The updated version should work in both cases, as shown in the following example:

```python
import pynvml

gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpu = pynvml.nvmlDeviceGetName(handle)
    gpu = gpu.decode("utf-8") if isinstance(gpu, bytes) else gpu  # older pynvml may return bytes
    gpus.append(gpu)
pynvml.nvmlShutdown()
data = {'gpu': gpus}
print(data)
```

```
$ pip list | grep pynvml && python test_pynvml.py
pynvml                    11.5.0
{'gpu': ['NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4']}
$ pip list | grep pynvml && python test_pynvml.py
pynvml                    11.4.1
{'gpu': ['NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4']}
```
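
Worth noting about the guard: calling `.decode()` unconditionally would raise `AttributeError` on newer pynvml releases that already return `str`, so the `isinstance` check keeps both 11.4.1 and 11.5.0 working without pinning the dependency.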

@IlyasMoutawwakil IlyasMoutawwakil merged commit fd1d0f8 into huggingface:main Aug 15, 2024
21 of 26 checks passed