Makes copy of input ddf to work around dropped column names (#3776)
When creating multiple graphs from the same dask_cudf dataframe, a metadata mismatch occurs when one or more partitions are empty. During the second graph creation, which reuses the dask_cudf dataframe that was used and modified earlier, the metadata is not preserved for partitions holding empty dataframes. This happens because a _reference_ to the input dataframe, which was partly altered during the first graph creation, is reused in the second graph creation.

This PR makes a copy of the input dataframe right after the repartition call to avoid that alteration.
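
The aliasing pattern described above can be sketched with a toy example. This is only an illustration under stated assumptions: `ToyFrame`, `build_graph`, and `build_graph_safely` are hypothetical names, plain Python stands in for dask_cudf, and the in-place rename is a stand-in for whatever alteration the real graph creation performs. It does not reproduce the actual empty-partition metadata issue, only the difference between reusing a reference and working on a copy.

```python
import copy


class ToyFrame:
    """Hypothetical stand-in for one dataframe partition: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]


def build_graph(frame):
    """Toy 'graph creation' that alters the frame it receives as a side effect."""
    frame.columns[0] = "renamed_src"  # in-place change, visible to the caller
    return len(frame.rows)


def build_graph_safely(frame):
    """Same toy step, but run on a copy so the caller's frame is left untouched."""
    working = copy.deepcopy(frame)
    return build_graph(working)


# Unsafe: the caller's (empty) partition is altered by the first call,
# so a second call would start from already-modified column names.
shared = ToyFrame(["src", "dst"], [])
build_graph(shared)
columns_after_unsafe = list(shared.columns)

# Safe: each call works on its own copy, so repeated use sees the same input.
fresh = ToyFrame(["src", "dst"], [])
build_graph_safely(fresh)
build_graph_safely(fresh)
columns_after_safe = list(fresh.columns)
```

In the commit itself the copy is taken per partition with `input_ddf.map_partitions(lambda df: df.copy())`, so the toy `deepcopy` here plays the role of `df.copy()`.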

Authors:
   - jnke2016 ([email protected])

Approvers:
   - Vibhu Jawa (https://github.com/VibhuJawa)
   - Alex Barghi (https://github.com/alexbarghi-nv)
   - Rick Ratzel (https://github.com/rlratzel)
jnke2016 authored Aug 14, 2023
1 parent 15f8bba commit 20dca85
Showing 1 changed file with 2 additions and 1 deletion.
@@ -182,6 +182,8 @@ def __from_edgelist(
 workers = _client.scheduler_info()["workers"]
 # Repartition to 2 partitions per GPU for memory efficient process
 input_ddf = input_ddf.repartition(npartitions=len(workers) * 2)
+# FIXME: Make a copy of the input ddf before implicitly altering it.
+input_ddf = input_ddf.map_partitions(lambda df: df.copy())
 # The dataframe will be symmetrized iff the graph is undirected
 # otherwise, the inital dataframe will be returned
 if edge_attr is not None:
@@ -318,7 +320,6 @@ def __from_edgelist(
 is_symmetric=not self.properties.directed,
 )
 ddf = ddf.repartition(npartitions=len(workers) * 2)
-ddf = ddf.map_partitions(lambda df: df.copy())
 ddf = persist_dask_df_equal_parts_per_worker(ddf, _client)
 num_edges = len(ddf)
 ddf = get_persisted_df_worker_map(ddf, _client)
