Sampling Performance Testing #3584

alexbarghi-nv · 2023-05-19T03:01:00Z

Adds performance benchmarking scripts for testing MNMG cuGraph GNN workflows.
This branch is the head branch for the cuGraph benchmarking effort. All work supporting the benchmarks should be merged into this branch. It will be merged into branch-24.02 once all features are ready.

Includes patches to cuGraph-PyG required for the latest DLFW container.

To-Do:

- Refactor for branch-24.02
- ~~Add WholeGraph training portion~~ Deferred to future PR (see Add WholeGraph Support alexbarghi-nv/cugraph#6)
- ~~Add WholeGraph generators~~ Included in above
- ~~Support DGL~~ Deferred to future PR
- ~~Use appropriate docker containers~~ Deferred, waiting on DLFW release

Closes #3839

benchmarks/cugraph/standalone/cugraph_bulk_sampling.py

benchmarks/cugraph-pyg/cugraph_pyg_graph_sage.py

alexbarghi-nv · 2024-01-05T19:18:31Z

/ok to test

alexbarghi-nv · 2024-01-05T19:18:44Z

/ok to test

alexbarghi-nv · 2024-01-08T18:50:51Z

/ok to test

alexbarghi-nv · 2024-01-08T18:57:18Z

/ok to test

VibhuJawa

Requested minor changes, mostly looks good.

VibhuJawa · 2024-01-09T21:17:20Z

benchmarks/cugraph/standalone/bulk_sampling/README.md

+the number of training epochs here.  These are followed by the `REPLICATION_FACTOR` argument, which
+can be used to create replications of the dataset for scale testing purposes.
+
+The final two arguments are `FRAMEWORK` which can be either "cuGraphPyG" or "PyG", and `GPUS_PER_NODE`


I assume we should also include "cuGraphDGL" here.

in the next PR
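As a rough illustration of the two trailing arguments described in the README excerpt above, the following sketch validates `FRAMEWORK` and `GPUS_PER_NODE`. The function name `parse_tail_args` and the error messages are hypothetical and not part of the PR; the accepted framework names are the two the README lists, and "cuGraphDGL" is deliberately left out because support is deferred.

```python
# Framework names taken from the README excerpt; "cuGraphDGL" is not yet supported.
SUPPORTED_FRAMEWORKS = ("cuGraphPyG", "PyG")


def parse_tail_args(argv):
    """Parse the final two positional arguments: FRAMEWORK and GPUS_PER_NODE.

    Hypothetical helper for illustration only; the real scripts read these
    values as shell positional parameters.
    """
    if len(argv) < 2:
        raise ValueError("expected at least FRAMEWORK and GPUS_PER_NODE")
    framework = argv[-2]
    gpus_per_node = int(argv[-1])
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(
            f"FRAMEWORK must be one of {SUPPORTED_FRAMEWORKS}, got {framework!r}"
        )
    if gpus_per_node < 1:
        raise ValueError("GPUS_PER_NODE must be a positive integer")
    return framework, gpus_per_node
```

Extending the check to a third framework would only require adding its name to `SUPPORTED_FRAMEWORKS`.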

benchmarks/cugraph/standalone/bulk_sampling/bench_cugraph_training.py

VibhuJawa · 2024-01-09T21:21:34Z

benchmarks/cugraph/standalone/bulk_sampling/run_sampling.sh

+SCRIPTS_DIR=$4
+NUM_EPOCHS=$5
+
+SAMPLES_DIR=/samples


Assuming we are mounting this to the most performant path.

Yes, it is up to us to set LOGS_DIR, SAMPLES_DIR, and DATASETS_DIR in run_train_job.sh correctly. In the srun command, those are mounted to /logs, /samples, and /datasets in the container that this script runs in.
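The mapping described above can be sketched as follows. This is a minimal illustration, not the contents of run_train_job.sh: the helper name `build_mounts`, the comma-separated `host:container` format, and the fallback to a folder under the working directory are assumptions, and the exact srun option that consumes the result is not shown in this thread.

```python
import posixpath

# Container-side mount points named in the comment above.
CONTAINER_PATHS = {
    "LOGS_DIR": "/logs",
    "SAMPLES_DIR": "/samples",
    "DATASETS_DIR": "/datasets",
}


def build_mounts(env, cwd):
    """Return comma-separated host:container pairs for the three directories."""
    pairs = []
    for var, container_path in CONTAINER_PATHS.items():
        # Assumed default: a folder under the working directory named after
        # the container path (for example <cwd>/logs).
        default = posixpath.join(cwd, container_path.lstrip("/"))
        host_path = env.get(var) or default
        pairs.append(f"{host_path}:{container_path}")
    return ",".join(pairs)


# Example: only SAMPLES_DIR is set, pointing at a (hypothetical) fast local disk.
mounts = build_mounts({"SAMPLES_DIR": "/raid/samples"}, cwd="/home/user/run")
```

Pointing `SAMPLES_DIR` at the fastest available storage is the caller's responsibility, since the script inside the container only ever sees `/samples`.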

mg_utils/wait_for_workers.py

VibhuJawa

LGTM

alexbarghi-nv · 2024-01-10T15:38:49Z

/ok to test

alexbarghi-nv · 2024-01-11T20:27:18Z

/ok to test

rlratzel

LGTM overall. I don't feel too strongly, but I noticed several places in the shell scripts that assume things about the file system (/datasets, etc.). We usually put those scripts in another repo (the repo containing our machine-specific nightly scripts, etc.) and not the open cugraph repo.

alexbarghi-nv · 2024-01-11T21:15:18Z

> LGTM overall. I don't feel too strongly, but I noticed several places in the shell scripts that assume things about the file system (/datasets, etc.). We usually put those scripts in another repo (the repo containing our machine-specific nightly scripts, etc.) and not the open cugraph repo.

The /datasets and /scripts directories are mounted to directories the user has to provide. They default to the current working directory (it creates folders there and mounts them). So we're not exposing any NVIDIA internal info.

ChuckHastings

Not sure why we need to update the copyright on a file that wasn't otherwise updated... but OK.

alexbarghi-nv · 2024-01-11T22:58:01Z

/merge

…formance Improvements (#4081)

Large-scale cuGraph-DGL performance testing scripts. Also changes the DGL and PyG scripts to evaluate on all ranks and reuse the test samples, and adds support for benchmarking cuGraph-DGL/cuGraph-PyG with WholeGraph.

Updates `cuGraph.gnn.FeatureStore` and `cuGraph-PyG` for increased performance:
* Supporting passing in a WG embedding directly to cugraph.gnn.FeatureStore
* Simplifying how cuGraph-PyG handles filtering and using a cache to prevent repeatedly copying data between the device and host
* Fix bug in cugraph.gnn.FeatureStore where indexing with a gpu tensor would raise an exception, especially with WG
* Add a function to cugraph.gnn.FeatureStore to check where data is stored, which is used by cuGraph-PyG to prevent unnecessary d2h and h2d copies

Merge after #3584

Authors:
- Alex Barghi (https://github.com/alexbarghi-nv)
- Seunghwa Kang (https://github.com/seunghwak)
- Vibhu Jawa (https://github.com/VibhuJawa)
- Brad Rees (https://github.com/BradReesWork)

Approvers:
- Vibhu Jawa (https://github.com/VibhuJawa)
- Don Acosta (https://github.com/acostadon)
- Brad Rees (https://github.com/BradReesWork)
- Naim (https://github.com/naimnv)
- Joseph Nke (https://github.com/jnke2016)

URL: #4081

alexbarghi-nv added the feature request, Blocked, and non-breaking labels May 19, 2023

alexbarghi-nv self-assigned this May 19, 2023

alexbarghi-nv added this to the 23.08 milestone May 19, 2023

BradReesWork changed the base branch from branch-23.06 to branch-23.08 May 30, 2023 12:55

VibhuJawa reviewed May 30, 2023

benchmarks/cugraph/standalone/cugraph_bulk_sampling.py (outdated, resolved)

VibhuJawa reviewed Jun 1, 2023

benchmarks/cugraph/standalone/cugraph_bulk_sampling.py (outdated, resolved)

benchmarks/cugraph/standalone/cugraph_bulk_sampling.py (outdated, resolved)

VibhuJawa reviewed Jun 14, 2023

benchmarks/cugraph-pyg/cugraph_pyg_graph_sage.py (outdated, resolved)

VibhuJawa reviewed Jun 14, 2023

benchmarks/cugraph-pyg/cugraph_pyg_graph_sage.py (outdated, resolved)

BradReesWork modified the milestones: 23.08, 23.10 Jul 25, 2023

seunghwak and others added 17 commits August 2, 2023 23:22

bug fix

c09bb25

Merge branch 'branch-23.08' of github.com:rapidsai/cugraph into bug_mfg

4edb9ae

Merge branch 'bug_mfg' of https://github.com/seunghwak/cugraph into perf-testing-v2

57fb8e5

add latest updates

3b95106

Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cugraph into perf-testing-v2

3269a4f

bug fix (when edge list is empty)

3e009cd

Merge branch 'branch-23.08' of https://github.com/rapidsai/cugraph into perf-testing-v2

622a17a

add latest updates

e4d7796

revert cpp changes

a226a4e

revert plc changes

5d3843f

revert notebook changes

36464a9

Revert logging change

c5a81c2

correction for dataset name

95a72ab

fix for empty batch issue

aebe742

do merge

449984d

bring in changes

bdaa22f

remove redundant filter function

223dee3

alexbarghi-nv requested a review from a team as a code owner January 5, 2024 19:16

github-actions bot added the cuGraph label Jan 5, 2024

style

ea46748

Merge branch 'branch-24.02' into perf-testing-v2

61f30a2

alexbarghi-nv marked this pull request as draft January 5, 2024 19:26

alexbarghi-nv added 2 commits January 8, 2024 06:15

fixes to scripts

89ac530

compatibility issues

77b0788

alexbarghi-nv marked this pull request as ready for review January 8, 2024 18:09

alexbarghi-nv mentioned this pull request Jan 8, 2024

cuGraph-DGL and WholeGraph Performance Testing with Feature Store Performance Improvements #4081

Merged

alexbarghi-nv added 4 commits January 8, 2024 10:51

reset file

4e2a706

c

18e43de

copyright

c4c45db

whitespace

8ea5c92

VibhuJawa suggested changes Jan 9, 2024

alexbarghi-nv and others added 2 commits January 9, 2024 13:48

set nthreads to 8

441810c

Merge branch 'branch-24.02' into perf-testing-v2

c053ed0

VibhuJawa approved these changes Jan 10, 2024

Merge branch 'branch-24.02' into perf-testing-v2

3039843

rlratzel approved these changes Jan 11, 2024

ChuckHastings approved these changes Jan 11, 2024

rapids-bot bot merged commit c09db10 into rapidsai:branch-24.02 Jan 12, 2024
97 checks passed
