PLC betweenness_centrality appears to hang on several datasets #3824

Closed
rlratzel opened this issue Aug 25, 2023 · 1 comment
Labels: bug (Something isn't working), CRITICAL BUG! (BUG that needs to be FIX NOW !!!!)

Comments

@rlratzel
Contributor

I don't know if there's a pattern here or if the data was invalid for this call, but for every "medium sized" dataset (>20k edges) I tested, pylibcugraph.betweenness_centrality() appears to hang. A reproducer script is attached; when run, it had not completed after ~6 hours.

(rapids) root@b0479c536462:/demo# python -i plc_bc_demo.py
reading cit-Patents (directed=True)...done in 3.7644715309143066
calling SGGraph()...done in 0.028629302978515625
calling plc.bc...

The GPU appears to be busy the entire time the script is running:
[image: screenshot of GPU utilization]

See the attached script for more details.
Smaller datasets such as karate, netscience, and email_Eu_core all run to completion after a few seconds end-to-end.
The cugraph version of the same script (i.e. one that uses the corresponding cugraph APIs instead of PLC) shows the same behavior. The cugraph version of the script can be provided if helpful.

The script uses datasets provided by cugraph.datasets and also externally available ones. The directions to use the external datasets are in the script and below:

# These can be downloaded and used here by running
#   cd <cugraph repo>/datasets
#   ./get_test_data.sh --benchmark
# then setting the env var
#   RAPIDS_DATASET_ROOT_DIR to <cugraph repo>/datasets

plc_bc_demo.py

cc @ChuckHastings , @eriknw

@rlratzel rlratzel added the bug and CRITICAL BUG! labels Aug 25, 2023
@rlratzel rlratzel self-assigned this Aug 25, 2023
@rlratzel
Contributor Author

rlratzel commented Aug 25, 2023

This is very likely a false alarm.

The problem here is that k is left unset, so BC is computed from all paths for all nodes in the graph instead of being limited to k source nodes. That means one BFS per node in the graph rather than k BFS computations (i.e. BFS over all nodes * all nodes), which for large graphs can be prohibitively expensive, as seen here.
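
To make the cost difference concrete, here is a rough back-of-the-envelope sketch. It assumes each BFS touches up to every edge once, so the work is about (number of sources) * (number of edges). The helper name bfs_edge_visits and the graph sizes are hypothetical, chosen only for illustration, and are not measurements of any dataset in this issue.

```python
def bfs_edge_visits(num_nodes, num_edges, k=None):
    """Rough upper bound on edge visits: one BFS per source node.

    Assumption: each BFS may touch every edge once. With k unset,
    every node in the graph is used as a source.
    """
    num_sources = num_nodes if k is None else min(k, num_nodes)
    return num_sources * num_edges


# Hypothetical graph sizes, for illustration only.
num_nodes, num_edges = 3_000_000, 16_000_000

full = bfs_edge_visits(num_nodes, num_edges)             # k unset
sampled = bfs_edge_visits(num_nodes, num_edges, k=1000)  # k=1000

print(f"k unset: ~{full:.1e} edge visits")
print(f"k=1000:  ~{sampled:.1e} edge visits")
print(f"ratio:   {full // sampled}x")
```

Under these assumptions the ratio is simply num_nodes / k, so the gap grows linearly with the number of nodes in the graph.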

Using the above script, adding k=1000 for cit-Patents results in BC returning in ~12 seconds.
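
For readers unfamiliar with how k limits the work, the following is a minimal pure-Python sketch of Brandes' algorithm with optional source sampling. It is not the pylibcugraph implementation and does not reproduce its normalization or rescaling of sampled results. The function name brandes_bc, the adjacency-dict input, and the seed parameter are assumptions made for this example.

```python
import random
from collections import deque


def brandes_bc(adj, k=None, seed=None):
    """Unnormalized betweenness centrality for a directed, unweighted graph.

    adj  : dict mapping every node to a list of its out-neighbors.
    k    : number of source nodes to sample; None uses every node.
    seed : seed for the source sampling (illustrative parameter).
    """
    nodes = list(adj)
    if k is None or k >= len(nodes):
        sources = nodes                      # one BFS per node in the graph
    else:
        sources = random.Random(seed).sample(nodes, k)  # only k BFS runs

    bc = dict.fromkeys(nodes, 0.0)
    for s in sources:
        sigma = dict.fromkeys(nodes, 0)      # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)      # BFS depth, -1 means unvisited
        preds = {v: [] for v in nodes}       # shortest-path predecessors
        sigma[s], dist[s] = 1, 0
        order = []                           # nodes in BFS visit order
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Tiny example: a -> b -> c, where only b sits between two other nodes.
adj = {"a": ["b"], "b": ["c"], "c": []}
full = brandes_bc(adj)                  # exact: BFS from all 3 nodes
sampled = brandes_bc(adj, k=1, seed=0)  # estimate: BFS from 1 sampled node
print(full)
print(sampled)
```

The outer loop over sources is where the cost comes from: with k unset it runs once per node, and with k set it runs only k times, at the price of returning an estimate rather than the exact values.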

I will close this and possibly re-open if further testing exposes an actual hang.

cc @BradReesWork @ogreen

@rlratzel rlratzel closed this as not planned Aug 25, 2023