[GLUTEN-7641][VL] Add Gluten benchmark scripts #7642

marin-ma · 2024-10-22T12:33:17Z

The notebooks demonstrate how to set up, build, and benchmark Spark/Gluten with Jupyter Notebook.

github-actions · 2024-10-22T12:33:37Z

FelixYBW · 2024-10-22T17:35:42Z

tools/notebook/README.md

+- Install system dependencies and set up jupyter notebook
+- Configure Hadoop and Spark
+- Configure kernel parameters
+- Install monitoring tools (e.g., sar, emon)


Let's remove emon

zhztheplayer · 2024-10-23T02:48:38Z

Thank you!

BTW there were a couple of related efforts in our code base (not all of them):

#432
#5278

Should we review them then remove the unnecessary / unmaintained ones? If they are still needed, I think we can create a new directory like examples to centralize them.

FelixYBW · 2024-10-23T04:54:16Z

Why are there 3 TPC-DS query sets? Can we consolidate them into one?

./tools/gluten-it/common/src/main/resources/tpcds-queries
./gluten-core/src/test/resources/tpcds-queries
./gluten-core/target/scala-2.12/test-classes/tpcds-queries

FelixYBW · 2024-10-23T04:55:06Z

> Thank you!
>
> BTW there were a couple of related efforts in our code base (not all of them):
>
> #432 #5278
>
> Should we review them then remove the unnecessary / unmaintained ones? If they are still needed, I think we can create a new directory like examples to centralize them.

We could put it under tools/workload and name it benchmark_velox, since the scripts only support Velox.

marin-ma · 2024-10-23T05:08:45Z

> Why are there 3 TPC-DS query sets? Can we consolidate them into one?
>
> ./tools/gluten-it/common/src/main/resources/tpcds-queries ./gluten-core/src/test/resources/tpcds-queries ./gluten-core/target/scala-2.12/test-classes/tpcds-queries

@FelixYBW ./gluten-core/target/scala-2.12/test-classes/tpcds-queries is generated by Maven at compile time. It's not in the code base.

./tools/gluten-it/common/src/main/resources/tpcds-queries is the one used by GHA and the notebook scripts.
./gluten-core/src/test/resources/tpcds-queries: I'm not sure whether this one is used by any Gluten UT. I will double-check. If not, we can remove it.
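Before consolidating, it may help to confirm that two query directories really hold the same files. The following is a minimal sketch, not part of the PR: `dirs_identical` is a hypothetical helper, and it assumes both directories are flat (query files only, no subdirectories).

```python
import filecmp
import os


def dirs_identical(left, right):
    """Return True if both flat directories hold the same file names with the same bytes.

    Hypothetical helper for illustration; subdirectories are not handled.
    """
    left_files = sorted(os.listdir(left))
    right_files = sorted(os.listdir(right))
    if left_files != right_files:
        return False
    # shallow=False compares file contents instead of os.stat() signatures.
    match, mismatch, errors = filecmp.cmpfiles(left, right, left_files, shallow=False)
    return not mismatch and not errors
```

For example, calling `dirs_identical` on `./tools/gluten-it/common/src/main/resources/tpcds-queries` and `./gluten-core/src/test/resources/tpcds-queries` from the repository root would indicate whether one copy can be dropped without losing any query text.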

marin-ma · 2024-10-24T12:59:29Z

@FelixYBW Opened #7666 for some removals.

backends-velox/src/test/resources/tpch-queries-velox should also be removed. I will open another PR to remove it.

zhztheplayer · 2024-10-30T05:00:27Z

tools/notebook/README.md

@@ -0,0 +1,38 @@
+# Setup, Build and Benchmark Spark/Gluten with Jupyter Notebook


Is the PR a work in progress or ready to merge? As far as I can see, the contents of tools/notebook and tools/workload are identical.

@zhztheplayer Moved contents in tools/notebook to tools/workload/benchmark_velox

jinchengchenghh · 2024-11-08T06:05:51Z

tools/workload/benchmark_velox/init_disks.py

@@ -0,0 +1,96 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more


Do we require a specific Python version? Python 3 or Python 2?
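One way to make the requirement explicit is a version guard at the top of the script. This is only a sketch: `check_python_version` and `MIN_PYTHON` are hypothetical names, and the 3.7 floor is an assumption based on the script's `subprocess.run(..., capture_output=True, text=True)` calls, since both keyword arguments were added in Python 3.7.

```python
import sys

# Assumed minimum: subprocess.run gained capture_output and text in Python 3.7.
MIN_PYTHON = (3, 7)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Exit with a non-zero status if the interpreter is older than `minimum`."""
    if tuple(version_info[:2]) < minimum:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Python %d.%d or newer is required" % minimum)


check_python_version()
```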

jinchengchenghh · 2024-11-08T06:07:19Z

tools/workload/benchmark_velox/init_disks.py

+def run_and_log(cmd):
+    print('\033[92m' + '>>> Running command: ' + repr(cmd) + '\033[0m')
+    result = subprocess.run(cmd, check=True, shell=True, capture_output=True, text=True)
+    print(result.stdout)


Can we add print("=======stdout============") to mark the stdout log, and do the same for stderr?
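A minimal sketch of what this suggestion could look like, not the code that was merged. One deliberate difference from the quoted hunk, labeled here as an assumption: `check=True` is dropped from the `subprocess.run` call and `check_returncode()` is called after printing, so that stderr is still shown when the command fails.

```python
import subprocess


def run_and_log(cmd):
    """Run a shell command and print its stdout and stderr under labelled banners."""
    print('\033[92m' + '>>> Running command: ' + repr(cmd) + '\033[0m')
    # No check=True here (assumption): the output is logged first, then the
    # return code is checked, so a failing command still shows its stderr.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print("=======stdout============")
    print(result.stdout)
    print("=======stderr============")
    print(result.stderr)
    # Raises subprocess.CalledProcessError on a non-zero exit status.
    result.check_returncode()
    return result
```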

jinchengchenghh · 2024-11-08T06:08:03Z

tools/workload/benchmark_velox/init_disks.py

+    all_disks = filter_empty_str(subprocess.run("lsblk -I 7,8,259 -npd --output NAME".split(' '), capture_output=True, text=True).stdout.split('\n'))
+    if not all_disks:
+        print("No disks found on system. Exit.")
+        sys.exit(0)


This should be sys.exit(1); I assume this is not a normal state.
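A sketch of the non-zero exit, with the parsing pulled into a function so it can be exercised without running `lsblk`. `parse_disks_or_exit` is a hypothetical name, and the body of `filter_empty_str` is assumed, since the hunk only shows its call site.

```python
import sys


def filter_empty_str(items):
    """Drop empty strings (assumed behaviour; the real helper is not shown in the hunk)."""
    return [item for item in items if item]


def parse_disks_or_exit(lsblk_stdout):
    """Return device names from lsblk output, exiting with status 1 if none are found."""
    all_disks = filter_empty_str(lsblk_stdout.split('\n'))
    if not all_disks:
        # Report on stderr and exit non-zero so callers can detect the failure.
        print("No disks found on system. Exit.", file=sys.stderr)
        sys.exit(1)
    return all_disks
```

With a non-zero status, a calling shell script or notebook cell can stop instead of continuing with no disks initialized.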

FelixYBW · 2024-11-14T01:42:31Z

tools/workload/benchmark_velox/README.md

+```
+After execution, the output notebook will be saved as `gluten_tpch.ipynb`.
+
+If you want to use different parameters, you can specify them via the `-f` option. It will overwrite the previously defined parameters in `params.yaml`. e.g. To switch to the TPC-DS workload, run:


specify them via the -p

FelixYBW · 2024-11-14T01:51:11Z

initialize.ipynb. Let's remove the BKM section

FelixYBW · 2024-11-14T02:03:18Z

Looks good. Let's test on cloud once we have a chance.

gluten benchmark scripts

2a7e3da

github-actions bot added TOOLS DOCS labels Oct 22, 2024

FelixYBW reviewed Oct 22, 2024


fix

7542404

marin-ma added 5 commits October 29, 2024 10:20

update

a021ff4

update

4a66ee6

fix

9388c3c

fix

dfab905

update

bc628da

zhztheplayer reviewed Oct 30, 2024


marin-ma added 4 commits October 30, 2024 05:53

remove duplicate

420235e

update

bc58f7f

add install traceviewer

50d7f12

update

792f4b3

jinchengchenghh reviewed Nov 8, 2024


FelixYBW reviewed Nov 14, 2024


Merge branch 'main' into benchmark-scripts

fed5c1f

FelixYBW approved these changes Nov 14, 2024


address comments

4a610a7

marin-ma merged commit 0b899c0 into apache:main Nov 14, 2024
4 checks passed
