I don't know if there's a pattern here or if the data was invalid for this call, but for all "medium-sized" datasets (>20k edges) I tested, pylibcugraph.betweenness_centrality() appears to hang. A reproducer script is attached, which had not completed after ~6 hours when run.
(rapids) root@b0479c536462:/demo# python -i plc_bc_demo.py
reading cit-Patents (directed=True)...done in 3.7644715309143066
calling SGGraph()...done in 0.028629302978515625
calling plc.bc...
The GPU appears to be busy the entire time the call is running.
See the attached script for more details.
Smaller datasets such as karate, netscience, and email_Eu_core all run to completion after a few seconds end-to-end.
The cugraph version of the same script (i.e., one using the corresponding cugraph APIs instead of PLC) shows the same behavior. The cugraph version of the script can be provided if helpful.
The script uses datasets provided by cugraph.datasets as well as externally available ones. The instructions for using the external datasets are in the script and below:
# These can be downloaded and used here by running
# cd <cugraph repo>/datasets
# ./get_test_data.sh --benchmark
# then setting the env var
# RAPIDS_DATASET_ROOT_DIR to <cugraph repo>/datasets
The problem here is that k is left unset, so BC is computed exactly from all source vertices instead of being limited to k sampled sources. Exact BC requires one BFS (plus dependency accumulation) per vertex, i.e. work on the order of all vertices times all edges, which for large graphs can be prohibitively expensive, as seen here. The computation is slow, not hung.
Using the above script, adding k=1000 for cit-Patents results in BC returning in ~12 seconds.
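To illustrate the trade-off k controls, here is a minimal pure-Python sketch of pivot-sampled betweenness centrality in the style of Brandes' algorithm. This is not the cugraph/PLC implementation (which runs on the GPU); it only shows why limiting accumulation to k sampled sources turns |V| BFS traversals into k of them:

```python
import random
from collections import deque

def approx_betweenness(adj, k=None, seed=42):
    """Brandes-style betweenness on an unweighted graph given as an
    adjacency dict. If k is set, accumulate dependencies from only k
    randomly sampled source vertices instead of all |V| of them,
    which is the same cost reduction the k parameter provides in
    betweenness_centrality()."""
    nodes = list(adj)
    rng = random.Random(seed)
    sources = nodes if k is None else rng.sample(nodes, min(k, len(nodes)))
    bc = {v: 0.0 for v in nodes}
    for s in sources:
        # Single-source shortest paths via BFS.
        sigma = {v: 0 for v in nodes}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in nodes}; dist[s] = 0
        preds = {v: [] for v in nodes}                # shortest-path DAG
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

With k unset this loop runs a BFS from every vertex; with k=1000 it runs exactly 1000, independent of graph size, which is why cit-Patents drops from hours to seconds.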
I will close this and possibly re-open if further testing exposes an actual hang.
cc @ChuckHastings , @eriknw