address comments
marin-ma committed Nov 14, 2024
1 parent fed5c1f commit 4a610a7
Showing 3 changed files with 26 additions and 25 deletions.
2 changes: 1 addition & 1 deletion tools/workload/benchmark_velox/README.md
@@ -23,7 +23,7 @@ papermill tpc_workload.ipynb --inject-output-path -f params.yaml gluten_tpch.ipy
 ```
 After execution, the output notebook will be saved as `gluten_tpch.ipynb`.

-If you want to use different parameters, you can specify them via the `-f` option. It will overwrite the previously defined parameters in `params.yaml`. e.g. To switch to the TPC-DS workload, run:
+If you want to use different parameters, you can specify them via the `-p` option. It will overwrite the previously defined parameters in `params.yaml`. e.g. To switch to the TPC-DS workload, run:

 ```bash
 papermill tpc_workload.ipynb --inject-output-path -f params.yaml -p workload tpcds gluten_tpcds.ipynb
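The precedence described in the README change (values passed with `-p` win over values loaded from `params.yaml`) can be pictured as a dictionary merge. This is only a sketch of that rule, not papermill's own code: the helper name `merge_parameters`, the parameter name `workload`, and the file contents shown are assumptions for illustration.

```python
def merge_parameters(file_params, cli_params):
    """Return file parameters with command-line overrides applied on top."""
    merged = dict(file_params)   # start from the values loaded via -f
    merged.update(cli_params)    # -p values replace entries with the same name
    return merged


# Hypothetical contents of params.yaml (the real file is not shown in this commit).
file_params = {"workload": "tpch"}
# What `-p workload tpcds` would contribute, assuming the parameter is named `workload`.
cli_params = {"workload": "tpcds"}

merged = merge_parameters(file_params, cli_params)
print(merged)
```

Parameters that appear only in the file are kept unchanged, so a single `-p` flag is enough to switch one setting.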
11 changes: 9 additions & 2 deletions tools/workload/benchmark_velox/init_disks.py
@@ -13,6 +13,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+# To set up the virtual environment required to run this script,
+# refer to the `Format and mount disks` subsection under `System Setup` in initialize.ipynb.
 import sys
 import subprocess
 import questionary
@@ -26,17 +28,22 @@ def yes_or_no(question):
         elif user_input.lower() == 'no':
             return False
         elif user_input.lower() == 'quit':
-            sys.exit(0)
+            sys.exit(1)
         else:
             continue

 def filter_empty_str(l):
     return [x for x in l if x]

 def run_and_log(cmd):
-    print('\033[92m' + '>>> Running command: ' + repr(cmd) + '\033[0m')
+    # Print command in yellow
+    print('\033[93m' + '>>> Running command: ' + repr(cmd) + '\033[0m')
     result = subprocess.run(cmd, check=True, shell=True, capture_output=True, text=True)
+    # Print stdout in green
+    print('\033[92m' + '==========stdout==========' + '\033[0m')
     print(result.stdout)
+    # Print stderr in red
+    print('\033[91m' + '==========stderr==========' + '\033[0m')
     print(result.stderr)

 def init_disks():
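One thing to note about `run_and_log` as changed here: with `check=True`, `subprocess.run` raises `CalledProcessError` before either `print(result.stdout)` or `print(result.stderr)` is reached, so the captured output of a failing command is not shown. A possible variant that prints the captured stderr before re-raising might look like the sketch below. The name `run_and_log_verbose` and the color constants are hypothetical and not part of this commit.

```python
import subprocess

# ANSI color codes matching the ones used in the diff above.
YELLOW, GREEN, RED, RESET = '\033[93m', '\033[92m', '\033[91m', '\033[0m'


def run_and_log_verbose(cmd):
    """Run a shell command, print its captured output, and re-raise on failure."""
    print(YELLOW + '>>> Running command: ' + repr(cmd) + RESET)
    try:
        result = subprocess.run(cmd, check=True, shell=True,
                                capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # check=True raised before the normal prints below could run,
        # so show the captured stderr here and let the error propagate.
        print(RED + '==========stderr==========' + RESET)
        print(err.stderr)
        raise
    print(GREEN + '==========stdout==========' + RESET)
    print(result.stdout)
    print(RED + '==========stderr==========' + RESET)
    print(result.stderr)
    return result
```

For example, `run_and_log_verbose('echo hello')` prints the command and its output and returns the completed process, while a failing command still raises, so callers stop at the first failed step.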
38 changes: 16 additions & 22 deletions tools/workload/benchmark_velox/initialize.ipynb
@@ -1,21 +1,5 @@
 {
 "cells": [
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"# BKM:\n",
-"*****\n",
-"1. Config slaves as \"cluster palcement group\", to get best and stable throughput among nodes.\n",
-"2. Open port accesses internally, don't expose port access to public\n",
-"3. Configure security.authorization and ip control through hadoop\n",
-"4. Network throughput depends on instance type. If master node is small instance, avoid to copy large file from master to slaves or HDFS, using one of slavers\n",
-"5. If you want to cache the files to memory, set replication to 1. Otherwise you can't make sure Yarn schedule the same task to the same node always.\n",
-"6. If you copy large file from one slave to whole HDFS, the distribution is biased. The slave holds most of the data in its HDFS. Solution is to copy the file from HDFS to HDFS again\n",
-"7. vcpu isn't map to physical CPU with the same index, so you can't make sure two vcpu doesn't share the same physical core. So pin executors to cores through the native OS policy leads to poor performance. We need to make clear how vcpu share the same core firstly\n",
-"*****"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -394,7 +378,7 @@
 },
 "source": [
 "# Initialize\n",
-"<font color=red size=3> Run this section after note book restart! </font>"
+"<font color=red size=3> Run this section after notebook restart! </font>"
 ]
 },
 {
@@ -2773,14 +2757,18 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"heading_collapsed": true
+},
 "source": [
 "# Install Trace-Viewer (optional)"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"hidden": true
+},
 "source": [
 "Clone the master branch\n",
 "```\n",
@@ -2791,7 +2779,9 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"hidden": true
+},
 "source": [
 "Trace-Viewer requires python version 2.7. Create a virtualenv for python2.7\n",
 "```\n",
@@ -2803,7 +2793,9 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"hidden": true
+},
 "source": [
 "Apply patch\n",
 "\n",
@@ -2836,7 +2828,9 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"hidden": true
+},
 "source": [
 "Start the service\n",
 "\n",