RISC-V port implementation #368

Open
k-kisielak opened this issue Oct 4, 2024 · 4 comments

Comments

@k-kisielak

I want to announce that we are currently working on a RISC-V implementation using the vector extension (RVV).
By "we" I mean Samsung R&D Poland, with partial involvement from RISE Project engineers.

The code is being developed on a feature branch at: https://github.com/k-kisielak/opus/tree/rvv_impl
The implementation is at an early stage and is not intended for merging in its current form.

Current state:
We started by implementing parts of the silk module:

  • based on the existing SSE implementation, we prepared one using RISC-V vector intrinsics
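
For readers unfamiliar with RVV intrinsics, here is a minimal sketch of the general pattern (a strip-mined loop where the hardware picks the vector length each iteration). It is not taken from the rvv_impl branch: `demo_inner_prod` and `demo_inner_prod_selfcheck` are hypothetical names, the intrinsic spellings assume the `__riscv_`-prefixed intrinsics API, and the per-iteration reduction is chosen for simplicity rather than speed. A scalar fallback is used when the compiler does not target the vector extension.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: sum of x[i] * y[i] over n floats. */
float demo_inner_prod(const float *x, const float *y, size_t n)
{
#if defined(__riscv_vector)
    /* Element 0 of vsum carries the running sum between iterations. */
    vfloat32m1_t vsum = __riscv_vfmv_s_f_f32m1(0.0f, 1);
    while (n > 0) {
        /* Ask the hardware how many elements it will process this round. */
        size_t vl = __riscv_vsetvl_e32m1(n);
        vfloat32m1_t vx = __riscv_vle32_v_f32m1(x, vl);
        vfloat32m1_t vy = __riscv_vle32_v_f32m1(y, vl);
        vfloat32m1_t vp = __riscv_vfmul_vv_f32m1(vx, vy, vl);
        /* Reduce the vl products and add them to the running sum. */
        vsum = __riscv_vfredusum_vs_f32m1_f32m1(vp, vsum, vl);
        x += vl;
        y += vl;
        n -= vl;
    }
    return __riscv_vfmv_f_s_f32m1_f32(vsum);
#else
    /* Portable fallback used when RVV is not enabled at compile time. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}

/* Small self-check with values that are exact in float arithmetic. */
void demo_inner_prod_selfcheck(void)
{
    const float x[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    const float y[5] = {2.0f, 2.0f, 2.0f, 2.0f, 2.0f};
    assert(demo_inner_prod(x, y, 5) == 30.0f);
}
```

Note that the vector path may add the products in a different order than the scalar loop, so results on real audio data can differ in the last bits.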

Items under development:

  • functional and performance tests are still to be performed; to test these changes we have access to RISC-V devices such as the BananaPI F3, and we have experience testing RVV under QEMU
  • we are looking into extending the GitHub Actions configs to allow testing the RISC-V version (for now on Linux only)
  • we are integrating more RVV implementations, namely for the modified Burg algorithm

We would gladly accept any feedback, thank you!

@petterreinholdtsen

petterreinholdtsen commented Oct 4, 2024 via email

@jmvalin
Member

jmvalin commented Oct 4, 2024

This is indeed good news. Here are a few questions/suggestions on how to proceed:

  1. Are you targeting fixed-point, floating-point, or both? On some devices one is faster than the other, and which one wins varies by device. All things being equal, float is generally preferable because it has more features.
  2. When optimizing, it's best to start by defining optimized versions of some of the basic operations. For example, you can look at celt/arm/fixed_armv4.h and silk/arm/macros_armv5e.h. That's an easy way to optimize many functions at once without changing too much code.
  3. If you optimize entire functions, replicating something similar to SSE/ARM, as you're doing, is a good idea. Make sure you implement the CHECK_ASM scheme for debugging with --enable-check-asm
  4. Splitting into small patches that each add functionality without regressions (e.g. optimize one function and use it) is the best way to land these changes.
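
To illustrate points 2 and 3, here is a rough sketch of the "override a basic operation, then cross-check it" pattern. It is not Opus code: `DEMO_SAT16`, `demo_sat16_c`, `demo_sat16_opt`, `demo_add_sat16` and the `DEMO_CHECK_ASM` switch are all hypothetical stand-ins, and the "optimized" body is plain C where an architecture-specific instruction or intrinsic would go.

```c
#include <assert.h>
#include <stdint.h>

/* 1. Portable definition, as a generic header might provide it. */
#define DEMO_SAT16(x) \
    ((int16_t)((x) > 32767 ? 32767 : ((x) < -32768 ? -32768 : (x))))

/* Capture the portable version in a function before the macro is
   overridden, so a checking build can still call it. */
static inline int16_t demo_sat16_c(int32_t x)
{
    return DEMO_SAT16(x);
}

/* 2. What a hypothetical per-architecture header would add. */
static inline int16_t demo_sat16_opt(int32_t x)
{
    int32_t r = x;
    /* Placeholder for an architecture-specific clamp. */
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
#ifdef DEMO_CHECK_ASM
    /* Checking build: the optimized result must match the C reference. */
    assert((int16_t)r == demo_sat16_c(x));
#endif
    return (int16_t)r;
}

#undef DEMO_SAT16
#define DEMO_SAT16(x) (demo_sat16_opt(x))

/* 3. Callers stay unchanged and pick up the override automatically. */
int16_t demo_add_sat16(int16_t a, int16_t b)
{
    return DEMO_SAT16((int32_t)a + (int32_t)b);
}
```

Because every caller goes through the macro, replacing one definition speeds up all of its users at once, and the checking build catches any mismatch at the exact call where it occurs.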

@k-kisielak
Author

k-kisielak commented Oct 7, 2024

Ad. 1: We are targeting 'somewhat capable' devices with vector units that operate on floats, so fixed point is less important.
Ad. 2-4: Thank you for your advice. So far we have focused on bulky functions, but we will look for ones that could provide a speedup with small changes by exploiting the vector extension's capabilities.

@jmvalin
Member

jmvalin commented Oct 11, 2024

The first thing I would check is whether you want to set OPUS_FAST_INT64 to 0 or 1 in celt/arch.h. For float, it will not make a difference for CELT, but it will have an impact on SILK. So you might want to pick whichever value makes SILK run the fastest before you optimize (because that changes the exact behaviour).
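
As a rough illustration of the kind of choice such a flag controls, the sketch below computes a 32x16-bit fixed-point multiply either with one 64-bit product or with two 32-bit products. This is not copied from Opus: `DEMO_FAST_INT64` and the function names are hypothetical, and both forms assume that right-shifting a negative value is an arithmetic shift (implementation-defined in C, but the usual behaviour). For this particular operation the two forms give the same result; other operations may round differently depending on the setting, and which form is faster on a given RISC-V core is something to measure.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEMO_FAST_INT64
#define DEMO_FAST_INT64 1 /* hypothetical switch, set to 0 to compare */
#endif

/* (a32 * b16) >> 16 using a single 64-bit product. */
int32_t demo_mul32x16_q16_wide(int32_t a32, int16_t b16)
{
    return (int32_t)(((int64_t)a32 * b16) >> 16);
}

/* Same value using only 32-bit products: split a32 into a signed high
   half and an unsigned low half, multiply each by b16, and recombine. */
int32_t demo_mul32x16_q16_split(int32_t a32, int16_t b16)
{
    int32_t hi = (a32 >> 16) * (int32_t)b16;
    int32_t lo = ((a32 & 0xFFFF) * (int32_t)b16) >> 16;
    return hi + lo;
}

#if DEMO_FAST_INT64
#define DEMO_MUL32X16_Q16(a, b) demo_mul32x16_q16_wide((a), (b))
#else
#define DEMO_MUL32X16_Q16(a, b) demo_mul32x16_q16_split((a), (b))
#endif

/* Cross-check both forms over a spread of positive and negative inputs. */
void demo_mul32x16_selfcheck(void)
{
    const int32_t as[] = {0, 1, -1, 65535, 65536, -65537,
                          123456789, -123456789, INT32_MAX, INT32_MIN};
    const int16_t bs[] = {0, 1, -1, 321, -321, INT16_MAX, INT16_MIN};
    for (unsigned i = 0; i < sizeof(as) / sizeof(as[0]); i++) {
        for (unsigned j = 0; j < sizeof(bs) / sizeof(bs[0]); j++) {
            assert(demo_mul32x16_q16_wide(as[i], bs[j]) ==
                   demo_mul32x16_q16_split(as[i], bs[j]));
            assert(DEMO_MUL32X16_Q16(as[i], bs[j]) ==
                   demo_mul32x16_q16_wide(as[i], bs[j]));
        }
    }
}
```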
