apply diff main

Signed-off-by: Suraj Aralihalli <[email protected]>
SurajAralihalli · Dec 18, 2023 · a457a74
1 parent 3db58bd
commit a457a74

Showing 15 changed files with 927 additions and 359 deletions.
diff --git a/docs/additional-functionality/advanced_configs.md b/docs/additional-functionality/advanced_configs.md
@@ -48,7 +48,7 @@ Name | Description | Default Value | Applicable at
 <a name="python.memory.gpu.pooling.enabled"></a>spark.rapids.python.memory.gpu.pooling.enabled|Should RMM in Python workers act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. When not specified, It will honor the value of config 'spark.rapids.memory.gpu.pooling.enabled'|None|Runtime
 <a name="shuffle.enabled"></a>spark.rapids.shuffle.enabled|Enable or disable the RAPIDS Shuffle Manager at runtime. The [RAPIDS Shuffle Manager](https://docs.nvidia.com/spark-rapids/user-guide/latest/additional-functionality/rapids-shuffle.html) must already be configured. When set to `false`, the built-in Spark shuffle will be used. |true|Runtime
 <a name="shuffle.mode"></a>spark.rapids.shuffle.mode|RAPIDS Shuffle Manager mode. "MULTITHREADED": shuffle file writes and reads are parallelized using a thread pool. "UCX": (requires UCX installation) uses accelerated transports for transferring shuffle blocks. "CACHE_ONLY": use when running a single executor, for short-circuit cached shuffle (for testing purposes).|MULTITHREADED|Startup
-<a name="shuffle.multiThreaded.maxBytesInFlight"></a>spark.rapids.shuffle.multiThreaded.maxBytesInFlight|The size limit, in bytes, that the RAPIDS shuffle manager configured in "MULTITHREADED" mode will allow to be deserialized concurrently per task. This is also the maximum amount of memory that will be used per task. This should be set larger than Spark's default maxBytesInFlight (48MB). The larger this setting is, the more compressed shuffle chunks are processed concurrently. In practice, care needs to be taken to not go over the amount of off-heap memory that Netty has available. See https://github.com/NVIDIA/spark-rapids/issues/9153.|134217728|Startup
+<a name="shuffle.multiThreaded.maxBytesInFlight"></a>spark.rapids.shuffle.multiThreaded.maxBytesInFlight|The size limit, in bytes, that the RAPIDS shuffle manager configured in "MULTITHREADED" mode will allow to be serialized or deserialized concurrently per task. This is also the maximum amount of memory that will be used per task. This should be set larger than Spark's default maxBytesInFlight (48MB). The larger this setting is, the more compressed shuffle chunks are processed concurrently. In practice, care needs to be taken to not go over the amount of off-heap memory that Netty has available. See https://github.com/NVIDIA/spark-rapids/issues/9153.|134217728|Startup
 <a name="shuffle.multiThreaded.reader.threads"></a>spark.rapids.shuffle.multiThreaded.reader.threads|The number of threads to use for reading shuffle blocks per executor in the RAPIDS shuffle manager configured in "MULTITHREADED" mode. There are two special values: 0 = feature is disabled, falls back to Spark built-in shuffle reader; 1 = our implementation of Spark's built-in shuffle reader with extra metrics.|20|Startup
 <a name="shuffle.multiThreaded.writer.threads"></a>spark.rapids.shuffle.multiThreaded.writer.threads|The number of threads to use for writing shuffle blocks per executor in the RAPIDS shuffle manager configured in "MULTITHREADED" mode. There are two special values: 0 = feature is disabled, falls back to Spark built-in shuffle writer; 1 = our implementation of Spark's built-in shuffle writer with extra metrics.|20|Startup
 <a name="shuffle.transport.earlyStart"></a>spark.rapids.shuffle.transport.earlyStart|Enable early connection establishment for RAPIDS Shuffle|true|Startup
@@ -337,6 +337,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
 <a name="sql.expression.SparkPartitionID"></a>spark.rapids.sql.expression.SparkPartitionID|`spark_partition_id`|Returns the current partition id|true|None|
 <a name="sql.expression.SpecifiedWindowFrame"></a>spark.rapids.sql.expression.SpecifiedWindowFrame| |Specification of the width of the group (or "frame") of input rows around which a window function is evaluated|true|None|
 <a name="sql.expression.Sqrt"></a>spark.rapids.sql.expression.Sqrt|`sqrt`|Square root|true|None|
+<a name="sql.expression.Stack"></a>spark.rapids.sql.expression.Stack|`stack`|Separates expr1, ..., exprk into n rows.|true|None|
 <a name="sql.expression.StartsWith"></a>spark.rapids.sql.expression.StartsWith| |Starts with|true|None|
 <a name="sql.expression.StringInstr"></a>spark.rapids.sql.expression.StringInstr|`instr`|Instr string operator|true|None|
 <a name="sql.expression.StringLPad"></a>spark.rapids.sql.expression.StringLPad|`lpad`|Pad a string on the left|true|None|
@@ -350,6 +351,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
 <a name="sql.expression.StringTrim"></a>spark.rapids.sql.expression.StringTrim|`trim`|StringTrim operator|true|None|
 <a name="sql.expression.StringTrimLeft"></a>spark.rapids.sql.expression.StringTrimLeft|`ltrim`|StringTrimLeft operator|true|None|
 <a name="sql.expression.StringTrimRight"></a>spark.rapids.sql.expression.StringTrimRight|`rtrim`|StringTrimRight operator|true|None|
+<a name="sql.expression.StructsToJson"></a>spark.rapids.sql.expression.StructsToJson|`to_json`|Converts structs to JSON text format|false|This is disabled by default because to_json support is experimental. See compatibility guide for more information.|
 <a name="sql.expression.Substring"></a>spark.rapids.sql.expression.Substring|`substr`, `substring`|Substring operator|true|None|
 <a name="sql.expression.SubstringIndex"></a>spark.rapids.sql.expression.SubstringIndex|`substring_index`|substring_index operator|true|None|
 <a name="sql.expression.Subtract"></a>spark.rapids.sql.expression.Subtract|`-`|Subtraction|true|None|
@@ -383,6 +385,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
 <a name="sql.expression.Last"></a>spark.rapids.sql.expression.Last|`last`, `last_value`|last aggregate operator|true|None|
 <a name="sql.expression.Max"></a>spark.rapids.sql.expression.Max|`max`|Max aggregate operator|true|None|
 <a name="sql.expression.Min"></a>spark.rapids.sql.expression.Min|`min`|Min aggregate operator|true|None|
+<a name="sql.expression.Percentile"></a>spark.rapids.sql.expression.Percentile|`percentile`|Aggregation computing exact percentile|true|None|
 <a name="sql.expression.PivotFirst"></a>spark.rapids.sql.expression.PivotFirst| |PivotFirst operator|true|None|
 <a name="sql.expression.StddevPop"></a>spark.rapids.sql.expression.StddevPop|`stddev_pop`|Aggregation computing population standard deviation|true|None|
 <a name="sql.expression.StddevSamp"></a>spark.rapids.sql.expression.StddevSamp|`stddev_samp`, `std`, `stddev`|Aggregation computing sample standard deviation|true|None|

diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.rocky_no_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.rocky_no_rdma
@@ -17,23 +17,26 @@
 #
 # The parameters are: 
 #   - CUDA_VER: 11.8.0 by default
-#   - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-#                               CUDA runtime from the UCX github repo.
-#                               See: https://github.com/openucx/ucx/releases/
+#   - UCX_VER, UCX_CUDA_VER, and UCX_ARCH: 
+#       Used to pick a package matching a specific UCX version and
+#       CUDA runtime from the UCX github repo.
+#       See: https://github.com/openucx/ucx/releases/
 #   - ROCKY_VER: Rocky Linux OS version
 
 ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
 ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
 ARG ROCKY_VER=8
 FROM nvidia/cuda:${CUDA_VER}-runtime-rockylinux${ROCKY_VER}
 ARG UCX_VER
 ARG UCX_CUDA_VER
+ARG UCX_ARCH
 
 RUN yum update -y && yum install -y wget bzip2 numactl-libs libgomp
 RUN ls /usr/lib
 RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
   tar -xvf *.bz2 && \
   rpm -i ucx-$UCX_VER*.rpm && \
   rpm -i ucx-cuda-$UCX_VER*.rpm --nodeps && \

diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.rocky_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.rocky_rdma
@@ -17,22 +17,25 @@
 #
 # The parameters are: 
 #   - CUDA_VER: 11.8.0 by default
-#   - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-#                               CUDA runtime from the UCX github repo.
-#                               See: https://github.com/openucx/ucx/releases/
+#   - UCX_VER, UCX_CUDA_VER, and UCX_ARCH: 
+#       Used to pick a package matching a specific UCX version and
+#       CUDA runtime from the UCX github repo.
+#       See: https://github.com/openucx/ucx/releases/
 #   - ROCKY_VER: Rocky Linux OS version
 
 ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
 ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
 ARG ROCKY_VER=8
 FROM nvidia/cuda:${CUDA_VER}-runtime-rockylinux${ROCKY_VER}
 ARG UCX_VER
 ARG UCX_CUDA_VER
+ARG UCX_ARCH
 
 RUN yum update -y && yum install -y wget bzip2 rdma-core numactl-libs libgomp libibverbs librdmacm
 RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
   tar -xvf *.bz2 && \
   rpm -i ucx-$UCX_VER*.rpm && \
   rpm -i ucx-cuda-$UCX_VER*.rpm --nodeps && \

diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma
@@ -17,21 +17,24 @@
 #
 # The parameters are: 
 #   - CUDA_VER: 11.8.0 by default
-#   - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and 
-#                               CUDA runtime from the UCX github repo.
-#                               See: https://github.com/openucx/ucx/releases/
+#   - UCX_VER, UCX_CUDA_VER, and UCX_ARCH: 
+#       Used to pick a package matching a specific UCX version and
+#       CUDA runtime from the UCX github repo.
+#       See: https://github.com/openucx/ucx/releases/
 #   - UBUNTU_VER: 20.04 by default
 #
 
 ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
 ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
 ARG UBUNTU_VER=20.04
 
 FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
 ARG UCX_VER
 ARG UCX_CUDA_VER
 ARG UBUNTU_VER
+ARG UCX_ARCH
 
 RUN apt-get update && apt-get install -y gnupg2
 # https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
@@ -41,7 +44,7 @@ RUN CUDA_UBUNTU_VER=`echo "$UBUNTU_VER"| sed -s 's/\.//'` && \
 RUN apt update
 RUN apt-get install -y wget
 RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
-  tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
+  tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
   apt install -y /tmp/ucx_install/*.deb && \
   rm -rf /tmp/ucx_install
diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma
@@ -20,9 +20,10 @@
 #   - RDMA_CORE_VERSION: Set to 32.1 to match the rdma-core line in the latest 
 #                        released MLNX_OFED 5.x driver
 #   - CUDA_VER: 11.8.0 by default
-#   - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-#                               CUDA runtime from the UCX github repo.
-#                               See: https://github.com/openucx/ucx/releases/
+#   - UCX_VER, UCX_CUDA_VER, and UCX_ARCH: 
+#       Used to pick a package matching a specific UCX version and
+#       CUDA runtime from the UCX github repo.
+#       See: https://github.com/openucx/ucx/releases/
 #   - UBUNTU_VER: 20.04 by default
 #
 # The Dockerfile first fetches and builds `rdma-core` to satisfy requirements for
@@ -34,15 +35,17 @@
 
 ARG RDMA_CORE_VERSION=32.1
 ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
 ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
 ARG UBUNTU_VER=20.04
 
 # Throw away image to build rdma_core
 FROM ubuntu:${UBUNTU_VER} as rdma_core
 ARG RDMA_CORE_VERSION
 ARG UBUNTU_VER
 ARG CUDA_VER
+ARG UCX_ARCH
 
 RUN apt-get update && apt-get install -y gnupg2
 # https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
@@ -61,6 +64,7 @@ RUN tar -xvf *.tar.gz && cd rdma-core*/ && dpkg-buildpackage -b -d
 FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
 ARG UCX_VER
 ARG UCX_CUDA_VER
+ARG UCX_ARCH
 ARG UBUNTU_VER
 
 RUN mkdir /tmp/ucx_install
@@ -70,7 +74,7 @@ COPY --from=rdma_core /*.deb /tmp/ucx_install/
 RUN apt update
 RUN apt-get install -y wget
 RUN cd /tmp/ucx_install && \
-  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
-  tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+  wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
+  tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
   apt install -y /tmp/ucx_install/*.deb && \
   rm -rf /tmp/ucx_install
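
Note on the Dockerfile changes above: with `UCX_ARCH` added, the UCX tarball name now ends in an architecture suffix. The sketch below only reproduces the URL pattern from the Ubuntu Dockerfiles in plain shell so the resulting name can be checked before a build. The `ucx_url` function name is hypothetical and not part of the commit, and the values passed in are the defaults shown in the diff. Whether a package exists for any other architecture or version is not something this sketch verifies.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): prints the UCX release URL
# using the same pattern as the Ubuntu Dockerfiles in this diff.
# Arguments: UCX_VER UBUNTU_VER UCX_CUDA_VER UCX_ARCH
ucx_url() {
  ucx_ver="$1"
  ubuntu_ver="$2"
  ucx_cuda_ver="$3"
  ucx_arch="$4"
  printf '%s\n' "https://github.com/openucx/ucx/releases/download/v${ucx_ver}/ucx-${ucx_ver}-ubuntu${ubuntu_ver}-mofed5-cuda${ucx_cuda_ver}-${ucx_arch}.tar.bz2"
}

# Defaults taken from the ARG lines in the diff.
ucx_url 1.15.0 20.04 11 x86_64
```

The same arguments can be overridden at build time with `docker build --build-arg`, which is the standard way to set a Dockerfile `ARG`.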