Hello everyone,
Following the Diffusion Models Quantization with Model Optimizer guide, I ran this command:
```
python quantize.py --model sdxl-turbo --format int8 --batch-size 2 --calib-size 32 --collect-method min-mean --percentile 1.0 --alpha 0.8 --quant-level 3.0 --n-steps 4 --quantized-torch-ckpt-save-path ./sdxl-turbo_int8.pt --onnx-dir sdxl-turbo_onnx
```
I get:
```
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:00<00:00, 14.98it/s]
Inserted 2942 quantizers
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 3.84it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.23it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.73it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.62it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.90it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.82it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.82it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.92it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.06it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.59it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.74it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.53it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.76it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.38it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.87it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.97it/s]
Warning: time_embedding.linear_1 is not calibrated, skip smoothing
Warning: time_embedding.linear_2 is not calibrated, skip smoothing
Warning: add_embedding.linear_1 is not calibrated, skip smoothing
Warning: add_embedding.linear_2 is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.attentions.0.proj_out is not calibrated, skip smoothing
Warning: mid_block.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.resnets.1.time_emb_proj is not calibrated, skip smoothing
Smoothed 711 modules
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:629: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(inputs) == 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\nn\functional.py:2447: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if size_prods == 1:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:401: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert torch.all(amax >= 0) and not torch.any(
Loading extension modelopt_cuda_ext...
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\utils\cpp_extension.py:381: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s).
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\utils\cpp_extension.py:58: UserWarning: Command '['where', 'cl']' returned non-zero exit status 1. Unable to load extension modelopt_cuda_ext and falling back to CPU version.
  warnings.warn(
```
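The last two warnings in that log say `where cl` returned a non-zero exit status, i.e. the MSVC compiler (`cl.exe`) was not found on PATH, which is why `modelopt_cuda_ext` fell back to the CPU version. As a hedged side note (this check is mine, not part of the Model Optimizer tooling), one minimal way to confirm whether `cl.exe` is visible to the current process:

```python
# Minimal sketch: check whether the MSVC compiler (cl.exe) is on PATH.
# On Windows this is what the failing `where cl` call in the log is
# effectively probing; on other platforms it will simply report False.
import shutil

def msvc_on_path():
    """True if the MSVC compiler (cl.exe) is discoverable on PATH."""
    return shutil.which("cl") is not None

print("cl.exe found:", msvc_on_path())
```

If this prints `False`, running the script from an "x64 Native Tools" developer prompt (or otherwise putting the MSVC Build Tools on PATH) may let the extension compile; whether that is related to the empty ONNX folder is a separate question.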
I see that sdxl-turbo_int8.pt was created, along with an empty sdxl-turbo_onnx folder. I suspect that folder should contain an ONNX model, which would mean the quantization command's export step failed...
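For reference, this is the kind of quick sanity check I'm doing on the two expected artifacts (a sketch of my own, not from the Model Optimizer docs; the paths come from the command-line flags above):

```python
# Sketch: verify the two outputs of quantize.py - the quantized torch
# checkpoint (--quantized-torch-ckpt-save-path) and the ONNX export
# directory (--onnx-dir). Paths are the ones used in the command above.
import os

def check_quantize_outputs(ckpt_path, onnx_dir):
    """Report whether the torch checkpoint and any ONNX files exist."""
    report = {
        "checkpoint_exists": os.path.isfile(ckpt_path),
        "onnx_files": [],
    }
    if os.path.isdir(onnx_dir):
        report["onnx_files"] = [
            f for f in os.listdir(onnx_dir) if f.endswith(".onnx")
        ]
    return report

report = check_quantize_outputs("./sdxl-turbo_int8.pt", "sdxl-turbo_onnx")
print(report)
if report["checkpoint_exists"] and not report["onnx_files"]:
    # Calibration finished (the .pt was written) but the ONNX export
    # produced nothing - which is exactly the situation described above.
    print("ONNX export appears to have failed or been skipped.")
```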
(Windows 11, RTX 4090)
Thanks in advance,
Joan