Decode output of nvmlDeviceGetName to avoid JSON serialize issue #240
Conversation
`nvmlDeviceGetName` returns bytes, which is not JSON serializable. This causes a

```
TypeError: Object of type bytes is not JSON serializable
```

error when `hub_utils.py` calls the [save_json method](https://github.com/huggingface/optimum-benchmark/blob/65fa416fd503cfe9a2be7637ee30c70a4a1f96f1/optimum_benchmark/hub_utils.py#L45), since the config it serializes contains the output of this function. For example (note the `b'NVIDIA L4'` entries under the `'gpu'` key in `environment`):

```
<bound method PushToHubMixin.to_dict of BenchmarkConfig(
  name='pytorch_bert',
  backend=PyTorchConfig(name='pytorch', version='2.4.0a0+3bcc3cddb5.nv24.7', _target_='optimum_benchmark.backends.pytorch.backend.PyTorchBackend', task='fill-mask', library='transformers', model_type='bert', model='bert-base-uncased', processor='bert-base-uncased', device='cuda', device_ids='0', seed=42, inter_op_num_threads=None, intra_op_num_threads=None, model_kwargs={}, processor_kwargs={}, no_weights=True, device_map=None, torch_dtype=None, eval_mode=True, to_bettertransformer=False, low_cpu_mem_usage=None, attn_implementation=None, cache_implementation=None, autocast_enabled=False, autocast_dtype=None, torch_compile=False, torch_compile_target='forward', torch_compile_config={}, quantization_scheme=None, quantization_config={}, deepspeed_inference=False, deepspeed_inference_config={}, peft_type=None, peft_config={}),
  scenario=InferenceConfig(name='inference', _target_='optimum_benchmark.scenarios.inference.scenario.InferenceScenario', iterations=10, duration=10, warmup_runs=10, input_shapes={'batch_size': 1, 'num_choices': 2, 'sequence_length': 128}, new_tokens=None, memory=True, latency=True, energy=False, forward_kwargs={}, generate_kwargs={}, call_kwargs={}),
  launcher=ProcessConfig(name='process', _target_='optimum_benchmark.launchers.process.launcher.ProcessLauncher', device_isolation=True, device_isolation_action='warn', numactl=False, numactl_kwargs={}, start_method='spawn'),
  environment={'cpu': ' AMD EPYC 7R13 Processor', 'cpu_count': 192, 'cpu_ram_mb': 781912.027136, 'system': 'Linux', 'machine': 'x86_64', 'platform': 'Linux-5.15.0-1063-aws-x86_64-with-glibc2.35', 'processor': 'x86_64', 'python_version': '3.10.12', 'gpu': [b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4', b'NVIDIA L4'], 'gpu_count': 8, 'gpu_vram_mb': 193223196672, 'optimum_benchmark_version': '0.4.0', 'optimum_benchmark_commit': None, 'transformers_version': '4.43.3', 'transformers_commit': None, 'accelerate_version': '0.33.0', 'accelerate_commit': None, 'diffusers_version': None, 'diffusers_commit': None, 'optimum_version': None, 'optimum_commit': None, 'timm_version': None, 'timm_commit': None, 'peft_version': None, 'peft_commit': None}
)>
```

I believe the issue can be avoided simply by decoding the bytes into a string, as shown in the following example.
```python
from json import dump

import pynvml

# Proposed version
gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle).decode("utf-8"))
pynvml.nvmlShutdown()

data = {'gpu': gpus}
with open("/tmp/tmp.json", "w") as f:
    dump(data, f)
with open("/tmp/tmp.json", "r") as f:
    print(f.read())
# {"gpu": ["NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4", "NVIDIA L4"]}
```

```python
from json import dump

import pynvml

# Current version
# https://github.com/huggingface/optimum-benchmark/blob/65fa416fd503cfe9a2be7637ee30c70a4a1f96f1/optimum_benchmark/system_utils.py#L95
gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle))
pynvml.nvmlShutdown()

data = {'gpu': gpus}
with open("/tmp/tmp.json", "w") as f:
    dump(data, f)
# Traceback (most recent call last):
#   File "/workspace/debug/debug-pynvml.py", line 14, in <module>
#     dump(data, f)
#   File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
#     for chunk in iterable:
#   File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
#     yield from _iterencode_dict(o, _current_indent_level)
#   File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
#     yield from chunks
#   File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
#     yield from chunks
#   File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
#     o = _default(o)
#   File "/usr/lib/python3.10/json/encoder.py", line 179, in default
#     raise TypeError(f'Object of type {o.__class__.__name__} '
# TypeError: Object of type bytes is not JSON serializable
# make: *** [Makefile:25: debug-pynvml] Error 1
```
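As an aside, the failure could also be avoided on the serialization side by passing a `default` hook to `json.dump`, which the encoder invokes only for objects it cannot serialize. This is a minimal sketch of that alternative, not what this PR proposes; `bytes_safe` is a hypothetical helper, not part of optimum-benchmark:

```python
import json

# Sketch: decode bytes at serialization time instead of at collection time.
def bytes_safe(obj):
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

data = {"gpu": [b"NVIDIA L4", b"NVIDIA L4"]}
print(json.dumps(data, default=bytes_safe))
# {"gpu": ["NVIDIA L4", "NVIDIA L4"]}
```

Decoding at the source, as proposed above, still seems preferable, since it keeps the in-memory config JSON-clean for every consumer rather than only for `save_json`.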
Thanks! I've never had serialization problems when benchmarking on GPU. Is it dependent on the GPU, or on the pynvml version/distribution (we use the official …)?
Thank you for your reply, @IlyasMoutawwakil. It seems that the output format differs across pynvml versions. When I used:

```
$ pip list | grep pynvml
pynvml    11.5.0
```

Here is a sample code snippet demonstrating this:

```python
import pynvml

gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpus.append(pynvml.nvmlDeviceGetName(handle))
pynvml.nvmlShutdown()
data = {'gpu': gpus}
```

When executed, the output is as follows:

```
In [2]: data
Out[2]:
{'gpu': ['NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4',
  'NVIDIA L4']}
```

The environment I am testing in is … In this version, the output format is bytes, as mentioned in the previous message. Allow me to close this PR, as the issue does not exist in the latest version. Thanks again!
We can add a change in that line where the output is decoded or not depending on its type (str or bytes).
Sure thing. The updated version should work in both cases, as shown in the following example:

```python
import pynvml

gpus = []
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gpu = pynvml.nvmlDeviceGetName(handle)
    gpu = gpu.decode("utf-8") if isinstance(gpu, bytes) else gpu  # Older pynvml may return bytes
    gpus.append(gpu)
pynvml.nvmlShutdown()
data = {'gpu': gpus}
print(data)
```

```
$ pip list | grep pynvml && python test_pynvml.py
pynvml    11.5.0
{'gpu': ['NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4']}

$ pip list | grep pynvml && python test_pynvml.py
pynvml    11.4.1
{'gpu': ['NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4', 'NVIDIA L4']}
```
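For reference, here is a minimal sketch of how the decode-if-bytes logic could be unit-tested without a GPU by mocking pynvml with `unittest.mock`. It assumes pynvml is importable; `get_gpu_names` is a hypothetical stand-in for the helper in `system_utils.py`, not the actual function:

```python
from unittest.mock import patch

import pynvml  # only needed as a patch target; no GPU required


def get_gpu_names():
    # Hypothetical helper mirroring the proposed change
    gpus = []
    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        gpu = pynvml.nvmlDeviceGetName(handle)
        gpus.append(gpu.decode("utf-8") if isinstance(gpu, bytes) else gpu)
    pynvml.nvmlShutdown()
    return gpus


@patch("pynvml.nvmlShutdown")
@patch("pynvml.nvmlDeviceGetName", side_effect=[b"NVIDIA L4", "NVIDIA L4"])
@patch("pynvml.nvmlDeviceGetHandleByIndex")
@patch("pynvml.nvmlDeviceGetCount", return_value=2)
@patch("pynvml.nvmlInit")
def test_handles_bytes_and_str(mock_init, mock_count, mock_handle, mock_name, mock_shutdown):
    # First device simulates older pynvml (bytes), second simulates newer (str)
    assert get_gpu_names() == ["NVIDIA L4", "NVIDIA L4"]


test_handles_bytes_and_str()
print("ok")
```

Running this directly (or via pytest) exercises both return types without touching NVML.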