From c819bc0b0d3d1c98e6b36fcafcf184f5bb4b2c2c Mon Sep 17 00:00:00 2001
From: Semyon
Date: Thu, 6 Jun 2024 00:13:04 +0200
Subject: [PATCH] Small changes in docs (#512)

## Which issue does this PR close?

Closes #503
Closes #191

## Rationale for this change

1. Provide a way to build Comet from source in isolated environments without access to github.com
2. Update the part of the documentation about the compatibility of Spark AQE and Comet Shuffle

## What changes are included in this PR?

- Update the tuning section about the compatibility of Comet Shuffle and Spark AQE
- Add a `release-nogit` target for building in isolated environments
- Update the section of the docs that describes the installation process

Changes to be committed:
	modified:   Makefile
	modified:   docs/source/user-guide/installation.md
	modified:   docs/source/user-guide/tuning.md

## How are these changes tested?

I ran both `make release` and `make release-nogit`. The first one created the properties file in `common/target/classes`, but the second did not. The flag `-Dmaven.gitcommitid.skip=true` is described in [this comment](https://github.com/git-commit-id/git-commit-id-maven-plugin/issues/392#issuecomment-432309487).
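To illustrate the new tuning note, both settings could be passed when launching the Spark shell. This is only a sketch: it assumes Comet, including its shuffle manager, is already installed and configured as described in the installation and tuning guides, and it shows just the two configuration keys that the tuning guide mentions.

```shell
# Sketch only: assumes the Comet jar, Spark extensions and Comet shuffle manager
# are already configured as described in the installation and tuning guides.
# Enable Comet shuffle and turn off AQE partition coalescing, which Comet
# shuffle does not currently support.
"$SPARK_HOME"/bin/spark-shell \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=false
```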
---
 Makefile                               | 3 +++
 docs/source/user-guide/installation.md | 6 ++++++
 docs/source/user-guide/tuning.md       | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/Makefile b/Makefile
index b9b9707ba..573a7f955 100644
--- a/Makefile
+++ b/Makefile
@@ -77,6 +77,9 @@ release-linux: clean
 release:
 	cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --release
 	./mvnw install -Prelease -DskipTests $(PROFILES)
+release-nogit:
+	cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --features nightly --release
+	./mvnw install -Prelease -DskipTests $(PROFILES) -Dmaven.gitcommitid.skip=true
 benchmark-%: clean release
 	cd spark && COMET_CONF_DIR=$(shell pwd)/conf MAVEN_OPTS='-Xmx20g' ../mvnw exec:java -Dexec.mainClass="$*" -Dexec.classpathScope="test" -Dexec.cleanupDaemonThreads="false" -Dexec.args="$(filter-out $@,$(MAKECMDGOALS))" $(PROFILES)
 .DEFAULT:
diff --git a/docs/source/user-guide/installation.md b/docs/source/user-guide/installation.md
index 03ecc53ed..7335a488c 100644
--- a/docs/source/user-guide/installation.md
+++ b/docs/source/user-guide/installation.md
@@ -57,6 +57,12 @@ Note that the project builds for Scala 2.12 by default but can be built for Scal
 make release PROFILES="-Pspark-3.4 -Pscala-2.13"
 ```
 
+To build Comet from the source distribution in an isolated environment without access to `github.com`, you must disable `git-commit-id-maven-plugin`; otherwise the build fails with errors reporting that git is not accessible. In that case, use:
+
+```console
+make release-nogit PROFILES="-Pspark-3.4"
+```
+
 ## Run Spark Shell with Comet enabled
 
 Make sure `SPARK_HOME` points to the same Spark version as Comet was built for.
diff --git a/docs/source/user-guide/tuning.md b/docs/source/user-guide/tuning.md
index 5a3100bd0..f46ab9e0e 100644
--- a/docs/source/user-guide/tuning.md
+++ b/docs/source/user-guide/tuning.md
@@ -39,6 +39,8 @@ It must be set before the Spark context is created.
 You can enable or disable Comet shuffle at runtime by setting `spark.comet.exec.shuffle.enabled` to `true` or `false`. Once it is disabled, Comet will fallback to the default Spark shuffle manager.
 
+> **_NOTE:_** At the moment, Comet Shuffle is not compatible with Spark AQE partition coalescing. To disable coalescing, set `spark.sql.adaptive.coalescePartitions.enabled` to `false`.
+
 ### Shuffle Mode
 
 Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and Auto Mode.