
Quantization INT8 SDXL-Turbo on Windows 11 fails? #71

Open
joansc opened this issue Sep 13, 2024 · 1 comment
joansc commented Sep 13, 2024

Hello everyone,

Following the Diffusion Models Quantization with Model Optimizer guide, I ran this command:

```
python quantize.py --model sdxl-turbo --format int8 --batch-size 2 --calib-size 32 --collect-method min-mean --percentile 1.0 --alpha 0.8 --quant-level 3.0 --n-steps 4 --quantized-torch-ckpt-save-path ./sdxl-turbo_int8.pt --onnx-dir sdxl-turbo_onnx
```

I get:

```
Loading pipeline components...: 100%|████████| 7/7 [00:00<00:00, 14.98it/s]
Inserted 2942 quantizers
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|████████| 4/4 [00:01<00:00, 3.84it/s]
(15 more identical 4/4 calibration progress bars omitted)
Warning: time_embedding.linear_1 is not calibrated, skip smoothing
Warning: time_embedding.linear_2 is not calibrated, skip smoothing
Warning: add_embedding.linear_1 is not calibrated, skip smoothing
Warning: add_embedding.linear_2 is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.attentions.0.proj_out is not calibrated, skip smoothing
Warning: mid_block.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.resnets.1.time_emb_proj is not calibrated, skip smoothing
Smoothed 711 modules
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
(the warning above is printed three more times)
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:629: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(inputs) == 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\nn\functional.py:2447: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if size_prods == 1:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:401: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert torch.all(amax >= 0) and not torch.any(
Loading extension modelopt_cuda_ext...
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\utils\cpp_extension.py:381: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s).
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\utils\cpp_extension.py:58: UserWarning: Command '['where', 'cl']' returned non-zero exit status 1. Unable to load extension modelopt_cuda_ext and falling back to CPU version.
  warnings.warn(
```

I see that sdxl-turbo_int8.pt was created, but the sdxl-turbo_onnx folder is empty. Should that folder contain an ONNX model? If so, that would mean the quantization command failed...
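Not an answer, but here is a small sketch of how I checked the two symptoms from my log: the `where cl` failure (which makes modelopt_cuda_ext fall back to CPU) and the empty `--onnx-dir`. The `diagnose` helper and the directory name are mine, not part of the Model Optimizer tooling:

```python
import shutil
from pathlib import Path

def diagnose(onnx_dir: str) -> list[str]:
    """Report likely causes for the symptoms seen in the log above (hypothetical helper)."""
    problems = []
    # The "Unable to load extension modelopt_cuda_ext" warning fires when MSVC's
    # cl.exe is not on PATH ('where cl' returned a non-zero exit status).
    if shutil.which("cl") is None:
        problems.append("cl.exe not on PATH - try running from a VS Developer Command Prompt")
    # An empty --onnx-dir means the ONNX export step never produced a model.
    d = Path(onnx_dir)
    if not d.is_dir() or not any(d.rglob("*.onnx")):
        problems.append(f"no .onnx files under {onnx_dir} - the export step did not complete")
    return problems

print(diagnose("sdxl-turbo_onnx"))
```

In my case both checks fire, which makes me suspect the missing MSVC compiler is why the export stopped short.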

(Windows 11, RTX 4090)

Thanks in advance,

Joan


zeng121 commented Oct 29, 2024

I had the same problem.
