
Fix for Ascend NPU when using ChatTTS to sample the voice of a real speaker #788

Merged
merged 1 commit into 2noise:dev on Oct 21, 2024

Conversation

shen-shanshan (Contributor)

What does this PR do?

Overview

This PR fixes a bug that occurs on Ascend NPU when using ChatTTS to sample the voice of a real speaker.

Environment

  • OS: Ubuntu 20.04
  • NPU: Atlas 300T A2
  • CANN: 8.0.RC2
  • torch-npu: 2.1.0.post6
  • torch: 2.1.0
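
For reference, a quick sanity check that torch_npu is importable and the NPU is visible in such an environment (a sketch, not part of this PR; the exact device layout depends on your setup):

    import torch
    import torch_npu  # importing torch_npu registers the Ascend "npu" backend with PyTorch

    print(torch_npu.npu.is_available())   # expect True when the Atlas NPU is visible
    print(torch_npu.npu.device_count())   # number of visible NPU devices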

Problem

The complex dtype used while computing the MelSpectrogram is not currently supported by torch_npu, so sampling the voice of a real speaker fails with an error (a minimal reproduction sketch follows the error log below).

[screenshot: bug_1]

The logs are shown below:

[+0000 20241016 12:19:06] [WARN]  WebUI  | funcs | no ffmpeg installed, use wav file output
[+0000 20241016 12:19:06] [INFO]  WebUI  | webui | loading ChatTTS model...
[+0000 20241016 12:19:06] [INFO] ChatTTS | dl | checking assets...
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.44.0, however version 5.0.1 is available, please upgrade. 
--------
  warnings.warn(
[+0000 20241016 12:19:10] [INFO] ChatTTS | dl | all assets are already latest.
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
[+0000 20241016 12:19:16] [INFO] ChatTTS | core | use device npu:0
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
[+0000 20241016 12:19:17] [INFO] ChatTTS | core | vocos loaded.
[+0000 20241016 12:19:17] [INFO] ChatTTS | core | dvae loaded.
[+0000 20241016 12:19:18] [INFO] ChatTTS | core | embed loaded.
[+0000 20241016 12:19:18] [INFO] ChatTTS | core | gpt loaded.
[+0000 20241016 12:19:18] [INFO] ChatTTS | core | speaker loaded.
[+0000 20241016 12:19:18] [INFO] ChatTTS | core | decoder loaded.
[+0000 20241016 12:19:18] [INFO] ChatTTS | core | tokenizer loaded.
[+0000 20241016 12:19:18] [WARN]  WebUI  | funcs | Package nemo_text_processing not found!
[+0000 20241016 12:19:18] [WARN]  WebUI  | funcs | Run: conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing
[+0000 20241016 12:19:18] [WARN]  WebUI  | funcs | Package WeTextProcessing not found!
[+0000 20241016 12:19:18] [WARN]  WebUI  | funcs | Run: conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing
[+0000 20241016 12:19:18] [INFO]  WebUI  | webui | Models loaded successfully.
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/sss/github/ChatTTS/examples/web/funcs.py", line 118, in on_upload_sample_audio
    spk_smp = chat.sample_audio_speaker(sample_audio)
  File "/home/sss/github/ChatTTS/ChatTTS/core.py", line 163, in sample_audio_speaker
    return self.speaker.encode_prompt(self.dvae.sample_audio(wav))
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sss/github/ChatTTS/ChatTTS/model/dvae.py", line 296, in sample_audio
    return self(wav, "encode").squeeze_(0)
  File "/home/sss/github/ChatTTS/ChatTTS/model/dvae.py", line 252, in __call__
    return super().__call__(inp, mode)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sss/github/ChatTTS/ChatTTS/model/dvae.py", line 259, in forward
    mel = self.preprocessor_mel(inp)
  File "/home/sss/github/ChatTTS/ChatTTS/model/dvae.py", line 199, in __call__
    return super().__call__(audio)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sss/github/ChatTTS/ChatTTS/model/dvae.py", line 203, in forward
    mel: torch.Tensor = self.mel_spec(audio)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 619, in forward
    specgram = self.spectrogram(waveform)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 110, in forward
    return F.spectrogram(
  File "/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 146, in spectrogram
    return spec_f.abs()
RuntimeError: call aclnnAbs failed, detail:EZ1001: [PID: 271329] 2024-10-16-12:19:57.030.507 self not implemented for DT_COMPLEX64, should be in dtype support list [DT_DOUBLE,DT_FLOAT,DT_FLOAT16,DT_INT64,DT_INT32,DT_INT16,DT_INT8,DT_UINT8,DT_BOOL,DT_BFLOAT16,].
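
The failure can be reproduced outside ChatTTS with a minimal sketch (assumptions: an Ascend NPU visible as npu:0 and the same torch/torchaudio/torch_npu versions as above). torchaudio's spectrogram computes a complex STFT and then calls .abs() on it (the spec_f.abs() frame in the traceback), and aclnnAbs has no DT_COMPLEX64 kernel:

    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    import torchaudio

    device = torch.device("npu:0")  # assumption: the NPU shows up as npu:0
    wav = torch.randn(1, 24000, device=device)  # one second of dummy audio at 24 kHz
    mel_spec = torchaudio.transforms.MelSpectrogram(sample_rate=24000).to(device)

    mel = mel_spec(wav)  # RuntimeError: call aclnnAbs failed ... DT_COMPLEX64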

Solution

Therefore, we run the audio data and the MelSpectrogram module on the CPU instead of the NPU; the modifications are shown below:

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
+       if "npu" in str(self.device):
+           # Computing the MelSpectrogram on NPU is not supported yet, so fall back to the CPU.
+           audio = audio.to(torch.device("cpu"))
+           self.mel_spec.to(torch.device("cpu"))
+           mel: torch.Tensor = self.mel_spec(audio)
+           mel = mel.to(self.device)
+       else:
            audio = audio.to(self.device)
            mel: torch.Tensor = self.mel_spec(audio)
        features = torch.log(torch.clip(mel, min=1e-5))
        return features
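
The same fallback pattern, shown in isolation (an illustrative sketch, not the PR's actual module; the function and parameter names here are made up): the input and the MelSpectrogram module are moved to the CPU, the mel spectrogram is computed there, and only the result is copied back to the NPU before the log/clip step.

    import torch
    import torchaudio

    def mel_with_cpu_fallback(
        audio: torch.Tensor,
        mel_spec: torchaudio.transforms.MelSpectrogram,
        device: torch.device,
    ) -> torch.Tensor:
        if "npu" in str(device):
            # The complex ops inside the spectrogram are unsupported on NPU,
            # so compute the mel spectrogram on the CPU and copy the result back.
            mel = mel_spec.to("cpu")(audio.to("cpu")).to(device)
        else:
            mel = mel_spec(audio.to(device))
        return torch.log(torch.clip(mel, min=1e-5))

Only this preprocessing step is moved to the CPU; the rest of the DVAE stays on the NPU.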

After this modification, we can successfully sample a real speaker's voice:

[screenshot: bug_2]

[screenshot: bug_3]

The logs are shown below:

[+0000 20241016 12:32:40] [WARN]  WebUI  | funcs | no ffmpeg installed, use wav file output
[+0000 20241016 12:32:40] [INFO]  WebUI  | webui | loading ChatTTS model...
[+0000 20241016 12:32:40] [INFO] ChatTTS | dl | checking assets...
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/gradio/analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.44.0, however version 5.0.1 is available, please upgrade. 
--------
  warnings.warn(
[+0000 20241016 12:32:44] [INFO] ChatTTS | dl | all assets are already latest.
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
[+0000 20241016 12:32:50] [INFO] ChatTTS | core | use device npu:0
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
[+0000 20241016 12:32:50] [INFO] ChatTTS | core | vocos loaded.
[+0000 20241016 12:32:51] [INFO] ChatTTS | core | dvae loaded.
[+0000 20241016 12:32:51] [INFO] ChatTTS | core | embed loaded.
[+0000 20241016 12:32:52] [INFO] ChatTTS | core | gpt loaded.
[+0000 20241016 12:32:52] [INFO] ChatTTS | core | speaker loaded.
[+0000 20241016 12:32:52] [INFO] ChatTTS | core | decoder loaded.
[+0000 20241016 12:32:52] [INFO] ChatTTS | core | tokenizer loaded.
[+0000 20241016 12:32:52] [WARN]  WebUI  | funcs | Package nemo_text_processing not found!
[+0000 20241016 12:32:52] [WARN]  WebUI  | funcs | Run: conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing
[+0000 20241016 12:32:52] [WARN]  WebUI  | funcs | Package WeTextProcessing not found!
[+0000 20241016 12:32:52] [WARN]  WebUI  | funcs | Run: conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing
[+0000 20241016 12:32:52] [INFO]  WebUI  | webui | Models loaded successfully.
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:109: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  offset = torch.where(self._levels % 2 == 0, 0.5, 0.0)
/home/sss/bin/miniconda/miniconda3/envs/chattts_2/lib/python3.10/site-packages/numba/cpython/hashing.py:482: UserWarning: FNV hashing is not implemented in Numba. See PEP 456 https://www.python.org/dev/peps/pep-0456/ for rationale over not using FNV. Numba will continue to work, but hashes for built in types will be computed using siphash24. This will permit e.g. dictionaries to continue to behave as expected, however anything relying on the value of the hash opposed to hash as a derived property is likely to not work as expected.
  warnings.warn(msg)
text:   0%|| 1/384(max) [00:00,  4.30it/s]We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
text:  19%|████████████████████████████▉                                                                                                                           | 73/384(max) [00:03, 22.51it/s]
code:  23%|█████████████████████████████████▊                                                                                                                    | 461/2048(max) [00:20, 22.64it/s]

@fumiama fumiama added bug Something isn't working enhancement New feature or request labels Oct 17, 2024
@shen-shanshan (Contributor, Author)

Updated 😄, @fumiama

@fumiama (Member) left a comment


Thanks!

@fumiama fumiama merged commit 0ec82fe into 2noise:dev Oct 21, 2024
2 of 5 checks passed