I don't know if there's a pattern here or if the data was invalid for this call, but for all "medium-sized" datasets (>20k edges) I tested, pylibcugraph.betweenness_centrality() appears to hang. A reproducer script is attached, which had not completed after ~6 hours when run.
(rapids) root@b0479c536462:/demo# python -i plc_bc_demo.py
reading cit-Patents (directed=True)...done in 3.7644715309143066
calling SGGraph()...done in 0.028629302978515625
calling plc.bc...
The GPU appears to be busy the entire time the call is running.
See the attached script for more details.
Smaller datasets such as karate, netscience, and email_Eu_core all run to completion after a few seconds end-to-end.
The cugraph version of the same script (i.e., one using the corresponding cugraph APIs instead of PLC) shows the same behavior. The cugraph version of the script can be provided if helpful.
The script uses datasets provided by cugraph.datasets as well as externally available ones. The instructions for using the external datasets are in the script and below:
# These can be downloaded and used here by running
# cd <cugraph repo>/datasets
# ./get_test_data.sh --benchmark
# then setting the env var
# RAPIDS_DATASET_ROOT_DIR to <cugraph repo>/datasets
The problem here is that k is left unset, so BC is computed exactly from all source vertices instead of being limited to k sampled sources. Exact BC requires one BFS (plus dependency accumulation) per vertex, i.e. work on the order of all vertices times all edges, which for large graphs can be prohibitively expensive, as seen here. The computation is slow, not hung.
Using the above script, adding k=1000 for cit-Patents results in BC returning in ~12 seconds.
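To illustrate the trade-off k controls, here is a minimal pure-Python sketch of pivot-sampled betweenness centrality in the style of Brandes' algorithm. This is not the cugraph/PLC implementation (which runs on the GPU); it only shows why limiting accumulation to k sampled sources turns |V| BFS traversals into k of them:

```python
import random
from collections import deque

def approx_betweenness(adj, k=None, seed=42):
    """Brandes-style betweenness on an unweighted graph given as an
    adjacency dict. If k is set, accumulate dependencies from only k
    randomly sampled source vertices instead of all |V| of them,
    which is the same cost reduction the k parameter provides in
    betweenness_centrality()."""
    nodes = list(adj)
    rng = random.Random(seed)
    sources = nodes if k is None else rng.sample(nodes, min(k, len(nodes)))
    bc = {v: 0.0 for v in nodes}
    for s in sources:
        # Single-source shortest paths via BFS.
        sigma = {v: 0 for v in nodes}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in nodes}; dist[s] = 0
        preds = {v: [] for v in nodes}                # shortest-path DAG
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

With k unset this loop runs a BFS from every vertex; with k=1000 it runs exactly 1000, independent of graph size, which is why cit-Patents drops from hours to seconds.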
I will close this and possibly re-open if further testing exposes an actual hang.
cc @ChuckHastings , @eriknw