Is your feature request related to a problem? Please describe.
Some use cases have been difficult to accommodate with our custom classloader because they rely on custom classloading too. Typical examples are multi-tenant servers such as Connect (#11902) and HiveThriftServer, and collaborative notebook environments such as Databricks.
Our standard answer to issues like this is to recommend building a custom single-shim jar for the target environment without relying on parallel worlds: https://github.com/NVIDIA/spark-rapids/blob/branch-25.02/CONTRIBUTING.md#building-a-distribution-for-a-single-spark-release
Describe the solution you'd like
#11665 shows how we can replace the current JarURL-based classloader with explicitly generated package names.
In the most naive form, we consider all classes in the sql-plugin module as requiring shimming for API or ABI compatibility reasons.
A class like com.nvidia.spark.rapids.GpuJsonTuple becomes something like spark351.com.nvidia.spark.rapids.GpuJsonTuple during the Maven generate-sources phase. If a class is loaded via reflection, such call sites need to be adjusted to reference the shimmed package name as well.
Our current approach also presumes that every class needs shimming, but we then rely on binary-dedupe to catch where shimming is not required and avoid jar bloat. This automatic deduplication will no longer be possible because the bytecode will never be bitwise-identical across shims given the difference in package names.
Thus we should work on improving code discipline to reduce the surface of shimmed code. The current content of the spark-shared folder in the rapids-4-spark jar and https://github.com/NVIDIA/spark-rapids/blob/branch-25.02/docs/dev/shims.md#how-to-externalize-an-internal-class-as-a-compile-time-dependency should give a clue on how to reduce the number of unnecessarily shimmed classes.
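To illustrate the reflection call-site adjustment described above, here is a minimal sketch in Java. The class ShimNames and its methods shimmedName and loadShimmed are hypothetical names made up for this illustration; they are not part of the plugin or of #11665. The sketch assumes the shim id (for example spark351) is simply prepended as the leading package segment, as in the GpuJsonTuple example.

```java
public final class ShimNames {
    private ShimNames() {}

    // Hypothetical helper: prepend the shim id as the leading package segment,
    // e.g. "spark351" + "com.nvidia.spark.rapids.GpuJsonTuple".
    public static String shimmedName(String shimId, String className) {
        if (shimId == null || shimId.isEmpty()) {
            throw new IllegalArgumentException("shimId must be non-empty");
        }
        if (className == null || className.isEmpty()) {
            throw new IllegalArgumentException("className must be non-empty");
        }
        return shimId + "." + className;
    }

    // A reflection call site adjusted to resolve the shim-specific class
    // instead of the unprefixed one. Throws ClassNotFoundException if the
    // shimmed class is not on the classpath.
    public static Class<?> loadShimmed(String shimId, String className)
            throws ClassNotFoundException {
        return Class.forName(shimmedName(shimId, className));
    }
}
```

Routing every reflective lookup through one helper like this would keep the number of call sites that need rewriting small, which fits the goal of reducing the shimmed surface.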
Describe alternatives you've considered