
Quantization INT8 SDXL-Turbo on Windows 11 fails? #71

Open
joansc opened this issue Sep 13, 2024 · 1 comment
joansc commented Sep 13, 2024

Hello everyone,

Following the Diffusion Models Quantization with Model Optimizer guide, I ran this command:

```
python quantize.py --model sdxl-turbo --format int8 --batch-size 2 --calib-size 32 --collect-method min-mean --percentile 1.0 --alpha 0.8 --quant-level 3.0 --n-steps 4 --quantized-torch-ckpt-save-path ./sdxl-turbo_int8.pt --onnx-dir sdxl-turbo_onnx
```

I get:

```
Loading pipeline components...: 100%|████████| 7/7 [00:00<00:00, 14.98it/s]
Inserted 2942 quantizers
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|████████| 4/4 [00:01<00:00, 3.84it/s]
(15 more identical 4/4 calibration progress bars omitted)
Warning: time_embedding.linear_1 is not calibrated, skip smoothing
Warning: time_embedding.linear_2 is not calibrated, skip smoothing
Warning: add_embedding.linear_1 is not calibrated, skip smoothing
Warning: add_embedding.linear_2 is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.0.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.attentions.1.proj_out is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: down_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.0.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.0.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.1.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.attentions.2.proj_out is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.1.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.1.time_emb_proj is not calibrated, skip smoothing
Warning: up_blocks.2.resnets.2.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.attentions.0.proj_out is not calibrated, skip smoothing
Warning: mid_block.resnets.0.time_emb_proj is not calibrated, skip smoothing
Warning: mid_block.resnets.1.time_emb_proj is not calibrated, skip smoothing
Smoothed 711 modules
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for `float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
(the warning above is printed three more times)
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:629: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(inputs) == 0:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\nn\functional.py:2447: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if size_prods == 1:
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:401: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert torch.all(amax >= 0) and not torch.any(
Loading extension modelopt_cuda_ext...
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\torch\utils\cpp_extension.py:381: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s).
C:\Users\usuario\Documents\WORK\Manso\nuevo_algo_ritmoGit\Tools\tensorrt_model_optimizer\env\Lib\site-packages\modelopt\torch\utils\cpp_extension.py:58: UserWarning: Command '['where', 'cl']' returned non-zero exit status 1. Unable to load extension modelopt_cuda_ext and falling back to CPU version.
  warnings.warn(
```

I see that sdxl-turbo_int8.pt was created, but the sdxl-turbo_onnx folder is empty. Should that folder contain an ONNX model? If so, that would mean the quantization command failed...
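Not an answer, but here is a small sketch of how I checked the two symptoms from my log: the `where cl` failure (which makes modelopt_cuda_ext fall back to CPU) and the empty `--onnx-dir`. The `diagnose` helper and the directory name are mine, not part of the Model Optimizer tooling:

```python
import shutil
from pathlib import Path

def diagnose(onnx_dir: str) -> list[str]:
    """Report likely causes for the symptoms seen in the log above (hypothetical helper)."""
    problems = []
    # The "Unable to load extension modelopt_cuda_ext" warning fires when MSVC's
    # cl.exe is not on PATH ('where cl' returned a non-zero exit status).
    if shutil.which("cl") is None:
        problems.append("cl.exe not on PATH - try running from a VS Developer Command Prompt")
    # An empty --onnx-dir means the ONNX export step never produced a model.
    d = Path(onnx_dir)
    if not d.is_dir() or not any(d.rglob("*.onnx")):
        problems.append(f"no .onnx files under {onnx_dir} - the export step did not complete")
    return problems

print(diagnose("sdxl-turbo_onnx"))
```

In my case both checks fire, which makes me suspect the missing MSVC compiler is why the export stopped short.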

(Windows 11, RTX 4090)

Thanks in advance,

Joan


zeng121 commented Oct 29, 2024

I had the same problem.
