Commit

Merge branch 'branch-23.08' into branch-23.08-resultsset
betochimas authored Aug 14, 2023
2 parents 2a92f2f + 15f8bba commit 495964b
Showing 24 changed files with 570 additions and 134 deletions.
72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,75 @@
# cuGraph 23.08.00 (9 Aug 2023)

## 🚨 Breaking Changes

- Change the renumber_sampled_edgelist function behavior. ([#3762](https://github.com/rapidsai/cugraph/pull/3762)) [@seunghwak](https://github.com/seunghwak)
- PLC and Python Support for Sample-Side MFG Creation ([#3734](https://github.com/rapidsai/cugraph/pull/3734)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Stop using setup.py in build.sh ([#3704](https://github.com/rapidsai/cugraph/pull/3704)) [@vyasr](https://github.com/vyasr)
- Refactor edge betweenness centrality ([#3672](https://github.com/rapidsai/cugraph/pull/3672)) [@jnke2016](https://github.com/jnke2016)
- [FIX] Fix the hang in cuGraph Python Uniform Neighbor Sample, Add Logging to Bulk Sampler ([#3669](https://github.com/rapidsai/cugraph/pull/3669)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)

## 🐛 Bug Fixes

- Change the renumber_sampled_edgelist function behavior. ([#3762](https://github.com/rapidsai/cugraph/pull/3762)) [@seunghwak](https://github.com/seunghwak)
- Fix bug discovered in Jaccard testing ([#3758](https://github.com/rapidsai/cugraph/pull/3758)) [@ChuckHastings](https://github.com/ChuckHastings)
- fix inconsistent graph properties between the SG and the MG API ([#3757](https://github.com/rapidsai/cugraph/pull/3757)) [@jnke2016](https://github.com/jnke2016)
- Fixes options for `--pydevelop` to remove unneeded CWD path ("."), restores use of `setup.py` temporarily for develop builds ([#3747](https://github.com/rapidsai/cugraph/pull/3747)) [@rlratzel](https://github.com/rlratzel)
- Fix sampling call parameters if compiled with -DNO_CUGRAPH_OPS ([#3729](https://github.com/rapidsai/cugraph/pull/3729)) [@ChuckHastings](https://github.com/ChuckHastings)
- Fix primitive bug discovered in MG edge betweenness centrality testing ([#3723](https://github.com/rapidsai/cugraph/pull/3723)) [@ChuckHastings](https://github.com/ChuckHastings)
- Reorder dependencies.yaml channels ([#3721](https://github.com/rapidsai/cugraph/pull/3721)) [@raydouglass](https://github.com/raydouglass)
- [BUG] Fix namespace to default_hash and hash_functions ([#3711](https://github.com/rapidsai/cugraph/pull/3711)) [@naimnv](https://github.com/naimnv)
- [BUG] Fix Bulk Sampling Test Issue ([#3701](https://github.com/rapidsai/cugraph/pull/3701)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Make `pylibcugraphops` optional imports in `cugraph-dgl` and `-pyg` ([#3693](https://github.com/rapidsai/cugraph/pull/3693)) [@tingyu66](https://github.com/tingyu66)
- [FIX] Rename `cugraph-ops` symbols (refactoring) and update GHA workflows to call pytest via `python -m pytest` ([#3688](https://github.com/rapidsai/cugraph/pull/3688)) [@naimnv](https://github.com/naimnv)
- [FIX] Fix the hang in cuGraph Python Uniform Neighbor Sample, Add Logging to Bulk Sampler ([#3669](https://github.com/rapidsai/cugraph/pull/3669)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- force atlas notebook changes to run in cugraph 23.08 container. ([#3656](https://github.com/rapidsai/cugraph/pull/3656)) [@acostadon](https://github.com/acostadon)

## 📖 Documentation

- this fixes github links in cugraph, cugraph-dgl and cugraph-pyg ([#3650](https://github.com/rapidsai/cugraph/pull/3650)) [@acostadon](https://github.com/acostadon)
- Fix minor typo in README.md ([#3636](https://github.com/rapidsai/cugraph/pull/3636)) [@akasper](https://github.com/akasper)
- Created landing spot for centrality and similarity algorithms ([#3620](https://github.com/rapidsai/cugraph/pull/3620)) [@acostadon](https://github.com/acostadon)

## 🚀 New Features

- Compute shortest distances between given sets of origins and destinations for large diameter graphs ([#3741](https://github.com/rapidsai/cugraph/pull/3741)) [@seunghwak](https://github.com/seunghwak)
- Update primitive to compute weighted Jaccard, Sorensen and Overlap similarity ([#3728](https://github.com/rapidsai/cugraph/pull/3728)) [@naimnv](https://github.com/naimnv)
- Add CUDA 12.0 conda environment. ([#3725](https://github.com/rapidsai/cugraph/pull/3725)) [@bdice](https://github.com/bdice)
- Renumber utility function for sampling output ([#3707](https://github.com/rapidsai/cugraph/pull/3707)) [@seunghwak](https://github.com/seunghwak)
- Integrate C++ Sampling Source Behavior Updates ([#3699](https://github.com/rapidsai/cugraph/pull/3699)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Adds `fail_on_nonconvergence` option to `pagerank` to provide pagerank results even on non-convergence ([#3639](https://github.com/rapidsai/cugraph/pull/3639)) [@rlratzel](https://github.com/rlratzel)
- Add Benchmark for Bulk Sampling ([#3628](https://github.com/rapidsai/cugraph/pull/3628)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- cugraph: Build CUDA 12 packages ([#3456](https://github.com/rapidsai/cugraph/pull/3456)) [@vyasr](https://github.com/vyasr)
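
The `fail_on_nonconvergence` entry above describes returning PageRank results even when the solver has not converged. The following is a minimal pure-Python sketch of that idea, not cuGraph's implementation: the function name, parameters, and the choice to return a `(ranks, converged)` tuple when the flag is `False` are assumptions made for illustration, so check the cuGraph `pagerank` documentation for the actual return shape.

```python
def pagerank(edges, alpha=0.85, max_iter=100, tol=1e-6,
             fail_on_nonconvergence=True):
    """Toy power-iteration PageRank over a list of (src, dst) pairs."""
    nodes = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    converged = False
    for _ in range(max_iter):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_degree[v] == 0)
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for src, dst in edges:
            new_rank[dst] += alpha * rank[src] / out_degree[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            converged = True
            break

    if fail_on_nonconvergence:
        # Hypothetical behavior: refuse to hand back unconverged results.
        if not converged:
            raise RuntimeError("PageRank did not converge")
        return rank
    # Hypothetical behavior: return the best estimate plus a status flag.
    return rank, converged
```

With the flag left at its default, a caller that caps `max_iter` too low gets an exception; with the flag set to `False`, the same call returns the last iterate together with `converged` so the caller can decide whether the estimate is usable.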

## 🛠️ Improvements

- Pin `dask` and `distributed` for `23.08` release ([#3761](https://github.com/rapidsai/cugraph/pull/3761)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `build.yaml` workflow ([#3756](https://github.com/rapidsai/cugraph/pull/3756)) [@ajschmidt8](https://github.com/ajschmidt8)
- Support MFG creation on sampling gpus for cugraph dgl ([#3742](https://github.com/rapidsai/cugraph/pull/3742)) [@VibhuJawa](https://github.com/VibhuJawa)
- PLC and Python Support for Sample-Side MFG Creation ([#3734](https://github.com/rapidsai/cugraph/pull/3734)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Switch to new wheel building pipeline ([#3731](https://github.com/rapidsai/cugraph/pull/3731)) [@vyasr](https://github.com/vyasr)
- Remove RAFT specialization. ([#3727](https://github.com/rapidsai/cugraph/pull/3727)) [@bdice](https://github.com/bdice)
- C API for renumbering the samples ([#3724](https://github.com/rapidsai/cugraph/pull/3724)) [@ChuckHastings](https://github.com/ChuckHastings)
- Only run cugraph conda CI for CUDA 11. ([#3713](https://github.com/rapidsai/cugraph/pull/3713)) [@bdice](https://github.com/bdice)
- Promote `Datasets` to stable and clean-up unit tests ([#3712](https://github.com/rapidsai/cugraph/pull/3712)) [@nv-rliu](https://github.com/nv-rliu)
- [BUG] Unsupported graph for similarity algos ([#3710](https://github.com/rapidsai/cugraph/pull/3710)) [@jnke2016](https://github.com/jnke2016)
- Stop using setup.py in build.sh ([#3704](https://github.com/rapidsai/cugraph/pull/3704)) [@vyasr](https://github.com/vyasr)
- [WIP] Make edge ids optional ([#3702](https://github.com/rapidsai/cugraph/pull/3702)) [@VibhuJawa](https://github.com/VibhuJawa)
- Use rapids-cmake testing to run tests in parallel ([#3697](https://github.com/rapidsai/cugraph/pull/3697)) [@robertmaynard](https://github.com/robertmaynard)
- Sampling modifications to support PyG and DGL options ([#3696](https://github.com/rapidsai/cugraph/pull/3696)) [@ChuckHastings](https://github.com/ChuckHastings)
- Include cuCollection public header for hash functions ([#3694](https://github.com/rapidsai/cugraph/pull/3694)) [@seunghwak](https://github.com/seunghwak)
- Refactor edge betweenness centrality ([#3672](https://github.com/rapidsai/cugraph/pull/3672)) [@jnke2016](https://github.com/jnke2016)
- Refactor RMAT ([#3662](https://github.com/rapidsai/cugraph/pull/3662)) [@jnke2016](https://github.com/jnke2016)
- [REVIEW] Optimize bulk sampling ([#3661](https://github.com/rapidsai/cugraph/pull/3661)) [@VibhuJawa](https://github.com/VibhuJawa)
- Update to CMake 3.26.4 ([#3648](https://github.com/rapidsai/cugraph/pull/3648)) [@vyasr](https://github.com/vyasr)
- Optimize cugraph-dgl MFG creation ([#3646](https://github.com/rapidsai/cugraph/pull/3646)) [@VibhuJawa](https://github.com/VibhuJawa)
- use rapids-upload-docs script ([#3640](https://github.com/rapidsai/cugraph/pull/3640)) [@AyodeAwe](https://github.com/AyodeAwe)
- Fix dependency versions for `23.08` ([#3638](https://github.com/rapidsai/cugraph/pull/3638)) [@ajschmidt8](https://github.com/ajschmidt8)
- Unpin `dask` and `distributed` for development ([#3634](https://github.com/rapidsai/cugraph/pull/3634)) [@galipremsagar](https://github.com/galipremsagar)
- Remove documentation build scripts for Jenkins ([#3627](https://github.com/rapidsai/cugraph/pull/3627)) [@ajschmidt8](https://github.com/ajschmidt8)
- Unpin scikit-build upper bound ([#3609](https://github.com/rapidsai/cugraph/pull/3609)) [@vyasr](https://github.com/vyasr)
- Implement C++ Edge Betweenness Centrality ([#3602](https://github.com/rapidsai/cugraph/pull/3602)) [@ChuckHastings](https://github.com/ChuckHastings)

# cuGraph 23.06.00 (7 Jun 2023)

## 🚨 Breaking Changes
7 changes: 5 additions & 2 deletions python/cugraph/cugraph/structure/graph_classes.py
@@ -68,11 +68,14 @@ def __init__(self, m_graph=None, directed=False):
if isinstance(m_graph, MultiGraph):
elist = m_graph.view_edge_list()
if m_graph.is_weighted():
weights = "weights"
weights = m_graph.weight_column
else:
weights = None
self.from_cudf_edgelist(
elist, source="src", destination="dst", edge_attr=weights
elist,
source=m_graph.source_columns,
destination=m_graph.destination_columns,
edge_attr=weights,
)
else:
raise TypeError(
@@ -79,6 +79,8 @@ def __init__(self, properties):
self.properties = simpleDistributedGraphImpl.Properties(properties)
self.source_columns = None
self.destination_columns = None
self.weight_column = None
self.vertex_columns = None

def _make_plc_graph(
sID,
@@ -175,6 +177,7 @@ def __from_edgelist(
"and destination parameters"
)
ddf_columns = s_col + d_col
self.vertex_columns = ddf_columns.copy()
_client = default_client()
workers = _client.scheduler_info()["workers"]
# Repartition to 2 partitions per GPU for memory efficient process
@@ -214,10 +217,11 @@ def __from_edgelist(
# The symmetrize step may add additional edges with unknown
# ids and types for an undirected graph. Therefore, only
# directed graphs may be used with ids and types.
# FIXME: Drop the check in symmetrize.py as it is redundant
if len(edge_attr) == 3:
if not self.properties.directed:
raise ValueError(
"User-provided edge ids and edge "
"User-provided edge ids and/or edge "
"types are not permitted for an "
"undirected graph."
)
@@ -285,6 +289,7 @@ def __from_edgelist(
self.properties.renumber = renumber
self.source_columns = source
self.destination_columns = destination
self.weight_column = weight

# If renumbering is not enabled, this function will only create
# the edgelist_df and not do any renumbering.
@@ -316,7 +321,6 @@ def __from_edgelist(
ddf = ddf.map_partitions(lambda df: df.copy())
ddf = persist_dask_df_equal_parts_per_worker(ddf, _client)
num_edges = len(ddf)
self._number_of_edges = num_edges
ddf = get_persisted_df_worker_map(ddf, _client)
delayed_tasks_d = {
w: delayed(simpleDistributedGraphImpl._make_plc_graph)(
@@ -356,6 +360,8 @@ def renumbered(self):

def view_edge_list(self):
"""
FIXME: Should this also return the edge ids and types?
Display the edge list. Compute it if needed.
NOTE: If the graph is of type Graph() then the displayed undirected
edges are the same as displayed by networkx Graph(), but the direction
@@ -386,7 +392,59 @@ def view_edge_list(self):
"""
if self.edgelist is None:
raise RuntimeError("Graph has no Edgelist.")
return self.edgelist.edgelist_df

edgelist_df = self.input_df
is_string_dtype = False
is_multi_column = False
wgtCol = simpleDistributedGraphImpl.edgeWeightCol
if not self.properties.directed:
srcCol = self.source_columns
dstCol = self.destination_columns
if self.renumber_map.unrenumbered_id_type == "object":
# FIXME: Use the renumbered vertices instead and then un-renumber.
# This operation can be expensive.
is_string_dtype = True
edgelist_df = self.edgelist.edgelist_df
srcCol = self.renumber_map.renumbered_src_col_name
dstCol = self.renumber_map.renumbered_dst_col_name

if isinstance(srcCol, list):
srcCol = self.renumber_map.renumbered_src_col_name
dstCol = self.renumber_map.renumbered_dst_col_name
edgelist_df = self.edgelist.edgelist_df
# unrenumber before extracting the upper triangular part
if len(self.source_columns) == 1:
edgelist_df = self.renumber_map.unrenumber(edgelist_df, srcCol)
edgelist_df = self.renumber_map.unrenumber(edgelist_df, dstCol)
else:
is_multi_column = True

edgelist_df[srcCol], edgelist_df[dstCol] = edgelist_df[
[srcCol, dstCol]
].min(axis=1), edgelist_df[[srcCol, dstCol]].max(axis=1)

edgelist_df = edgelist_df.groupby(by=[srcCol, dstCol]).sum().reset_index()
if wgtCol in edgelist_df.columns:
# FIXME: This breaks if there are multi edges as those will
# be dropped during the symmetrization step and the original 'weight'
# will be halved.
edgelist_df[wgtCol] /= 2

if is_string_dtype or is_multi_column:
# unrenumber the vertices
edgelist_df = self.renumber_map.unrenumber(edgelist_df, srcCol)
edgelist_df = self.renumber_map.unrenumber(edgelist_df, dstCol)

if self.properties.renumbered:
edgelist_df = edgelist_df.rename(
columns=self.renumber_map.internal_to_external_col_names
)

# If there is no 'wgt' column, nothing will happen
edgelist_df = edgelist_df.rename(columns={wgtCol: self.weight_column})

self.properties.edge_count = len(edgelist_df)
return edgelist_df
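
For undirected graphs, the new `view_edge_list` body above folds the symmetrized edge list back to one row per edge: it orders each endpoint pair with min/max, groups and sums, then halves the weight. A minimal pure-Python sketch of that arithmetic follows. It uses plain tuples instead of dask_cudf DataFrames, and the function name `collapse_undirected` is hypothetical.

```python
from collections import defaultdict


def collapse_undirected(edges):
    """Collapse a symmetrized edge list to one row per undirected edge.

    edges: iterable of (src, dst, weight), where every undirected edge
    is assumed to appear once in each direction.
    """
    totals = defaultdict(float)
    for src, dst, wgt in edges:
        # Keep only the "upper triangular" orientation: smaller id first.
        key = (min(src, dst), max(src, dst))
        totals[key] += wgt
    # Both directions were summed, so halve to recover the input weight.
    return sorted((s, d, w / 2) for (s, d), w in totals.items())


symmetrized = [(0, 1, 2.0), (1, 0, 2.0), (2, 1, 3.0), (1, 2, 3.0)]
print(collapse_undirected(symmetrized))
```

The halving is only correct when each edge appears exactly twice, which is the limitation the FIXME in the hunk points out for multi-edges.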

def delete_edge_list(self):
"""
@@ -405,23 +463,7 @@ def number_of_vertices(self):
Get the number of nodes in the graph.
"""
if self.properties.node_count is None:
if self.edgelist is not None:
if self.renumbered is True:
src_col_name = self.renumber_map.renumbered_src_col_name
dst_col_name = self.renumber_map.renumbered_dst_col_name
# FIXME: from_dask_cudf_edgelist() currently requires
# renumber=True for MG, so this else block will not be
# used. Should this else block be removed and added back when
# the restriction is removed?
else:
src_col_name = "src"
dst_col_name = "dst"

ddf = self.edgelist.edgelist_df[[src_col_name, dst_col_name]]
# ddf = self.edgelist.edgelist_df[["src", "dst"]]
self.properties.node_count = ddf.max().max().compute() + 1
else:
raise RuntimeError("Graph is Empty")
self.properties.node_count = len(self.nodes())
return self.properties.node_count
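
The hunk above replaces a "largest id plus one" computation with `len(self.nodes())`. The sketch below compares the two counting strategies on plain Python tuples. Both helper names are hypothetical, and the example only illustrates the general difference; it does not claim the removed code produced wrong results in practice.

```python
def node_count_by_max_id(edges):
    """Count vertices as (largest vertex id + 1)."""
    return max(max(src, dst) for src, dst in edges) + 1


def node_count_by_unique(edges):
    """Count vertices as the number of distinct endpoints."""
    return len({v for edge in edges for v in edge})


contiguous = [(0, 1), (1, 2)]
sparse = [(0, 5), (5, 9)]
print(node_count_by_max_id(contiguous), node_count_by_unique(contiguous))
print(node_count_by_max_id(sparse), node_count_by_unique(sparse))
```

The two strategies agree only when vertex ids are zero-based and contiguous; counting distinct endpoints does not depend on that property.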

def number_of_nodes(self):
@@ -434,10 +476,16 @@ def number_of_edges(self, directed_edges=False):
"""
Get the number of edges in the graph.
"""
if self.edgelist is not None:
return self._number_of_edges
else:
raise RuntimeError("Graph is Empty")

if directed_edges and self.edgelist is not None:
return len(self.edgelist.edgelist_df)

if self.properties.edge_count is None:
if self.edgelist is not None:
self.view_edge_list()
else:
raise RuntimeError("Graph is Empty")
return self.properties.edge_count
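
The rewritten `number_of_edges` above has two paths: with `directed_edges=True` it counts the stored rows directly, otherwise it relies on a cached count that `view_edge_list` fills in. A minimal sketch of that control flow for an undirected graph follows. The class `EdgeCounter` and its attributes are hypothetical stand-ins, not cuGraph types.

```python
class EdgeCounter:
    """Toy model of lazy edge counting for an undirected graph."""

    def __init__(self, rows):
        self.rows = rows        # stored symmetrized (src, dst) rows, or None
        self.edge_count = None  # cache, filled by view_edge_list()
        self.view_calls = 0     # instrumentation for this example only

    def view_edge_list(self):
        self.view_calls += 1
        unique = {(min(s, d), max(s, d)) for s, d in self.rows}
        self.edge_count = len(unique)
        return sorted(unique)

    def number_of_edges(self, directed_edges=False):
        # Directed view: every stored row counts as one edge.
        if directed_edges and self.rows is not None:
            return len(self.rows)
        # Undirected view: compute once, then reuse the cached value.
        if self.edge_count is None:
            if self.rows is None:
                raise RuntimeError("Graph is Empty")
            self.view_edge_list()
        return self.edge_count


g = EdgeCounter([(0, 1), (1, 0), (1, 2), (2, 1)])
print(g.number_of_edges(directed_edges=True), g.number_of_edges())
```

Caching the count this way means repeated calls do not recompute the edge list, at the cost of the first undirected call paying for a full `view_edge_list` pass.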

def in_degree(self, vertex_subset=None):
"""
@@ -1021,19 +1069,8 @@ def edges(self):
sources and destinations. It does not return the edge weights.
For viewing edges with weights use view_edge_list()
"""
if self.renumbered is True:
src_col_name = self.renumber_map.renumbered_src_col_name
dst_col_name = self.renumber_map.renumbered_dst_col_name
# FIXME: from_dask_cudf_edgelist() currently requires
# renumber=True for MG, so this else block will not be
# used. Should this else block be removed and added back when
# the restriction is removed?
else:
src_col_name = "src"
dst_col_name = "dst"

# return self.view_edge_list()[["src", "dst"]]
return self.view_edge_list()[[src_col_name, dst_col_name]]
return self.view_edge_list()[self.vertex_columns]

def nodes(self):
"""
@@ -1045,23 +1082,26 @@ def nodes(self):
a dataframe and do 'renumber_map.unrenumber' or 'G.unrenumber'
"""

if self.renumbered:
# FIXME: This relies on current implementation
# of NumberMap, should not really expose
# this, perhaps add a method to NumberMap
if self.edgelist is not None:
if self.renumbered:
# FIXME: This relies on current implementation
# of NumberMap, should not really expose
# this, perhaps add a method to NumberMap

df = self.renumber_map.implementation.ddf.drop(columns="global_id")
df = self.renumber_map.implementation.ddf.drop(columns="global_id")

if len(df.columns) > 1:
return df
else:
return df[df.columns[0]]
if len(df.columns) > 1:
return df
else:
return df[df.columns[0]]

else:
df = self.input_df
return dask_cudf.concat(
[df[self.source_columns], df[self.destination_columns]]
).drop_duplicates()
else:
df = self.input_df
return dask_cudf.concat(
[df[self.source_columns], df[self.destination_columns]]
).drop_duplicates()
raise RuntimeError("Graph is Empty")

def neighbors(self, n):
if self.edgelist is None: