
[CANN] Adapt to dynamically loadable backends mechanism #9970

Merged

Conversation

Contributor

@leo-pony leo-pony commented Oct 21, 2024

  1. A dynamically loadable backends framework was added in PR #9707 (ggml-backend: add device and backend reg interfaces). Most backends have already implemented these interfaces; this PR adapts the CANN backend to the same mechanism. The corresponding feature request is #9862 (Feature Request: [CANN] backend adapts to llama.cpp dynamic backend loading mechanism).
  2. Fix a bug: inference output is garbled in debug builds for LM models of the Q4_0 type. See issue #9979 (Bug: [CANN] inference running result is garbled in debug running model for LM models who's type is Q4_0 class).

Function is normal:
[screenshot]
Performance is the same as before:
[screenshot]

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 21, 2024
src/llama.cpp Outdated
Collaborator

@slaren slaren Oct 21, 2024
All the instances of GGML_USE_CANN should be removed from this file; there are still a few remaining.

Contributor Author

Thanks for your review, I have removed all the instances of GGML_USE_CANN from llama.cpp.

Comment on lines 591 to 598

#ifdef GGML_USE_AMX
register_backend(ggml_backend_amx_reg());
#endif

// TODO: kompute, cann
#ifdef GGML_USE_CANN
register_backend(ggml_backend_cann_reg());
#endif
Collaborator

Do not add extra lines, use the same format as the rest of the backends.

Contributor Author

Thanks for your review. I have removed the blank lines, and the formatting is now consistent with the rest of the backends. Function is still normal after the change.

@@ -33,6 +33,9 @@ extern "C" {
* @brief Maximum number of CANN devices supported.
*/
#define GGML_CANN_MAX_DEVICES 16
#define GGML_CANN_NAME "CANN"
Collaborator

I suspect that this macro was taken from the implementation in the CUDA backend, but it exists there because the same code is used to build the ROCm and MUSA backends, so the name of the backend can change depending on the build flags. I don't think it is necessary in the CANN backend.

Contributor Author

Thanks for your review. This macro is no longer exposed to llama.cpp; I have moved it into the CANN implementation file.

@leo-pony leo-pony force-pushed the cann_adapt_backend_dynamically_loadable branch from 6d36f48 to abf5be4 Compare October 22, 2024 02:27
@hipudding hipudding added the Ascend NPU issues specific to Ascend NPUs label Oct 22, 2024
@hipudding
Collaborator

The review comments have been addressed. Looks good to me. Approved.

@hipudding hipudding merged commit 6b84473 into ggerganov:master Oct 22, 2024
53 checks passed
Labels
Ascend NPU issues specific to Ascend NPUs ggml changes relating to the ggml tensor library for machine learning
Development

Successfully merging this pull request may close these issues.

Bug: [CANN] inference running result is garbled in debug running model for LM models who's type is Q4_0 class
3 participants