
Allow models to use a lightweight sparse structure #3782

Merged · 7 commits · Aug 17, 2023
87 changes: 85 additions & 2 deletions python/cugraph-dgl/cugraph_dgl/nn/conv/base.py
@@ -11,14 +11,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import Optional, Tuple

from cugraph.utilities.utils import import_optional

torch = import_optional("torch")
nn = import_optional("torch.nn")
ops_torch = import_optional("pylibcugraphops.pytorch")


-class BaseConv(nn.Module):
+class BaseConv(torch.nn.Module):
Member Author:
This is not a user-facing class. It is only used to handle the case where we fall back to the full-graph variant. In addition, with the recent cugraph-ops refactoring disabling the MFG variant, we might remove this class entirely.

Member:
Oh, I am sorry, I left the review at the wrong line. I meant the SparseGraph class.

Member:
Resolved

r"""An abstract base class for cugraph-ops nn module."""

def __init__(self):
@@ -48,3 +49,85 @@ def pad_offsets(self, offsets: torch.Tensor, size: int) -> torch.Tensor:
self._cached_offsets_fg[offsets.numel() : size] = offsets[-1]

return self._cached_offsets_fg[:size]


class SparseGraph(object):
r"""A god-class to store different sparse formats needed by cugraph-ops
and facilitate sparse format conversions.

Parameters
----------
size: tuple of int
Size of the adjacency matrix: (num_src_nodes, num_dst_nodes).

src_ids: torch.Tensor
Source indices of the edges.

dst_ids: torch.Tensor, optional
Destination indices of the edges.

csrc_ids: torch.Tensor, optional
        Compressed source indices. It is a monotonically increasing array of
        size (num_src_nodes + 1,). For the k-th source node, its neighborhood
        consists of the destinations between `dst_ids[csrc_ids[k]]` and
        `dst_ids[csrc_ids[k+1]]`.
Comment on lines +80 to +84
Member:
I have a question regarding the case where num_dst_nodes > len(cdst_ids) - 1.

Let's look at the case below:

cdst_ids (compressed destination indices): 0, 2, 5, 7
src_ids: 1, 2, 2, 3, 4, 4, 5

I believe the following will work (please correct me):

num_src_nodes = 6
num_dst_nodes = 3

And I guess the following will fail (please correct me):

num_src_nodes = 6
num_dst_nodes = 5  # raised to a higher value to account for output nodes that are missing

Question: this will have to be handled by ensuring correct construction, because we want to handle the alignment problem between blocks.

Member Author:
It should be illegal when num_dst_nodes != len(cdst_ids) - 1. I will improve the error handling in this case; for example, PyG does lots of assertions to check sizes, and we should throw proper exceptions.

Given:

cdst_ids (compressed destination indices): 0, 2, 5, 7
src_ids: 1, 2, 2, 3, 4, 4, 5

your example with num_src_nodes = 6 and num_dst_nodes = 3 translates to a COO of

src_ids: (1, 2, 2, 3, 4, 4, 5)
dst_ids: (0, 0, 1, 1, 1, 2, 2)

With num_src_nodes = 6 and num_dst_nodes = 5, the constructor should fail, unless cdst_ids is augmented (cdst_ids = 0, 2, 5, 7, 7, 7).
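To make the invariant concrete, here is a minimal sketch of the kind of size check being discussed (illustrative only; the constructor in this diff does not yet enforce it):

```python
import torch

def check_csc(cdst_ids: torch.Tensor, src_ids: torch.Tensor, num_dst_nodes: int) -> None:
    # A CSC offsets array needs exactly one entry per destination node plus
    # one, and its last entry must equal the number of edges.
    if cdst_ids.numel() != num_dst_nodes + 1:
        raise ValueError(
            f"len(cdst_ids) must be num_dst_nodes + 1 = {num_dst_nodes + 1}, "
            f"got {cdst_ids.numel()}."
        )
    if cdst_ids[-1].item() != src_ids.numel():
        raise ValueError("cdst_ids[-1] must equal the number of edges.")

cdst_ids = torch.tensor([0, 2, 5, 7])
src_ids = torch.tensor([1, 2, 2, 3, 4, 4, 5])
check_csc(cdst_ids, src_ids, num_dst_nodes=3)    # passes
# check_csc(cdst_ids, src_ids, num_dst_nodes=5)  # raises, unless cdst_ids
#                                                # is augmented to [0, 2, 5, 7, 7, 7]
```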

Member:
Thanks, yup, this is what I was expecting. We will just make sure that @seunghwak's changes to cugraph sampling ensure that all the MFGs line up.


cdst_ids: torch.Tensor, optional
        Compressed destination indices. It is a monotonically increasing array
        of size (num_dst_nodes + 1,). For the k-th destination node, its
        neighborhood consists of the sources between `src_ids[cdst_ids[k]]` and
        `src_ids[cdst_ids[k+1]]`.

dst_ids_is_sorted: bool
        Whether `dst_ids` has been sorted in ascending order.

Notes
-----
COO-format requires `src_ids` and `dst_ids`.
CSC-format requires `cdst_ids` and `src_ids`.
CSR-format requires `csrc_ids` and `dst_ids`.
Member:
I think we should force the user to state the format explicitly to prevent confusion, e.g., add a format variable, something like input_format, which takes the values coo, csc, and csr.

Then we can raise errors according to the input the user provided.

Also, I don't like the input_format variable name, but you get the idea.


For MFGs (sampled graphs), the node ids must have been renumbered.
"""

def __init__(
self,
size: Tuple[int, int],
src_ids: torch.Tensor,
dst_ids: Optional[torch.Tensor] = None,
csrc_ids: Optional[torch.Tensor] = None,
cdst_ids: Optional[torch.Tensor] = None,
dst_ids_is_sorted: bool = False,
):
if dst_ids is None and cdst_ids is None:
raise ValueError("One of 'dst_ids' and 'cdst_ids' must be given.")

if src_ids is not None:
src_ids = src_ids.contiguous()
if dst_ids is not None:
dst_ids = dst_ids.contiguous()
if csrc_ids is not None:
csrc_ids = csrc_ids.contiguous()
if cdst_ids is not None:
cdst_ids = cdst_ids.contiguous()

self._src_ids = src_ids
self._dst_ids = dst_ids
self._csrc_ids = csrc_ids
self._cdst_ids = cdst_ids
self.num_src_nodes, self.num_dst_nodes = size

# Force create CSC format.
if self._cdst_ids is None:
if not dst_ids_is_sorted:
self._dst_ids, self._perm = torch.sort(self._dst_ids)
self._src_ids = self._src_ids[self._perm]
self._cdst_ids = torch._convert_indices_from_coo_to_csr(
self._dst_ids,
self.num_dst_nodes,
out_int32=self._dst_ids.dtype == torch.int32,
)

Member:
Do you think we should remove dst_ids if we are forcing the CSC conversion? Keeping it around always would mean a memory overhead.

Member:
We are forcing CSC conversion for now, but in the future we may want to expand to other formats, so I think we might want to make this configurable via a class variable.

We can probably borrow the convention from DGL's formats API. We won't follow their default of 'coo' -> 'csr' -> 'csc', but have our own version. See the formats docs.

Member Author:
As we discussed via Slack, we will provide input_format and output_format to help specify which tensors are needed.
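A sketch of what such validation could look like (hypothetical helper, not part of this change; the names mirror the discussion above):

```python
def validate_input_format(input_format, src_ids, dst_ids, csrc_ids, cdst_ids):
    """Raise if the tensors required by `input_format` are missing."""
    required = {
        "coo": ("src_ids", "dst_ids"),
        "csc": ("cdst_ids", "src_ids"),
        "csr": ("csrc_ids", "dst_ids"),
    }
    if input_format not in required:
        raise ValueError(f"Invalid input_format: '{input_format}'")
    provided = {
        "src_ids": src_ids,
        "dst_ids": dst_ids,
        "csrc_ids": csrc_ids,
        "cdst_ids": cdst_ids,
    }
    missing = [name for name in required[input_format] if provided[name] is None]
    if missing:
        raise ValueError(f"Input format '{input_format}' requires {missing}.")
```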

def csc(self) -> Tuple[torch.Tensor, torch.Tensor]:
r"""Return CSC format."""
return (self._cdst_ids, self._src_ids)
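A minimal usage sketch of the class as it stands in this revision (assumes a CUDA device), built from the example in the review thread above:

```python
import torch
from cugraph_dgl.nn.conv.base import SparseGraph

src_ids = torch.tensor([1, 2, 2, 3, 4, 4, 5], device="cuda")
dst_ids = torch.tensor([0, 0, 1, 1, 1, 2, 2], device="cuda")

# COO input: the constructor sorts by destination and eagerly builds CSC.
sg = SparseGraph(size=(6, 3), src_ids=src_ids, dst_ids=dst_ids)

offsets, indices = sg.csc()
# offsets == tensor([0, 2, 5, 7]); indices holds the source ids grouped
# by destination node.
```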
60 changes: 34 additions & 26 deletions python/cugraph-dgl/cugraph_dgl/nn/conv/sageconv.py
@@ -14,9 +14,9 @@
cugraph-ops"""
# pylint: disable=no-member, arguments-differ, invalid-name, too-many-arguments
from __future__ import annotations
-from typing import Optional
+from typing import Optional, Union

-from cugraph_dgl.nn.conv.base import BaseConv
+from cugraph_dgl.nn.conv.base import BaseConv, SparseGraph
from cugraph.utilities.utils import import_optional

dgl = import_optional("dgl")
@@ -98,50 +98,58 @@ def reset_parameters(self):

def forward(
self,
-        g: dgl.DGLHeteroGraph,
+        g: Union[SparseGraph, dgl.DGLHeteroGraph],
feat: torch.Tensor,
max_in_degree: Optional[int] = None,
) -> torch.Tensor:
r"""Forward computation.

Parameters
----------
-        g : DGLGraph
+        g : DGLGraph or SparseGraph
The graph.
feat : torch.Tensor
Node features. Shape: :math:`(|V|, D_{in})`.
         max_in_degree : int
-            Maximum in-degree of destination nodes. It is only effective when
-            :attr:`g` is a :class:`DGLBlock`, i.e., bipartite graph. When
-            :attr:`g` is generated from a neighbor sampler, the value should be
-            set to the corresponding :attr:`fanout`. If not given,
-            :attr:`max_in_degree` will be calculated on-the-fly.
+            Maximum in-degree of destination nodes. When :attr:`g` is generated
+            from a neighbor sampler, the value should be set to the corresponding
+            :attr:`fanout`. This option is used to invoke the MFG variant of the
+            cugraph-ops kernel.

Returns
-------
torch.Tensor
Output node features. Shape: :math:`(|V|, D_{out})`.
"""
-        offsets, indices, _ = g.adj_tensors("csc")
-
-        if g.is_block:
-            if max_in_degree is None:
-                max_in_degree = g.in_degrees().max().item()
-
-            if max_in_degree < self.MAX_IN_DEGREE_MFG:
-                _graph = ops_torch.SampledCSC(
-                    offsets, indices, max_in_degree, g.num_src_nodes()
-                )
-            else:
-                offsets_fg = self.pad_offsets(offsets, g.num_src_nodes() + 1)
-                _graph = ops_torch.StaticCSC(offsets_fg, indices)
+        if max_in_degree is None:
+            max_in_degree = -1
+
+        if isinstance(g, SparseGraph):
+            offsets, indices = g.csc()
+            _graph = ops_torch.CSC(
+                offsets=offsets,
+                indices=indices,
+                num_src_nodes=g.num_src_nodes,
+                dst_max_in_degree=max_in_degree,
+            )
+            num_dst_nodes = g.num_dst_nodes
+        elif isinstance(g, dgl.DGLHeteroGraph):
+            offsets, indices, _ = g.adj_tensors("csc")
+            _graph = ops_torch.CSC(
+                offsets=offsets,
+                indices=indices,
+                num_src_nodes=g.num_src_nodes(),
+                dst_max_in_degree=max_in_degree,
+            )
+            num_dst_nodes = g.num_dst_nodes()
         else:
-            _graph = ops_torch.StaticCSC(offsets, indices)
+            raise TypeError(
+                f"The graph has to be either a 'SparseGraph' or "
+                f"'dgl.DGLHeteroGraph', but got '{type(g)}'."
+            )

         feat = self.feat_drop(feat)
-        h = ops_torch.operators.agg_concat_n2n(feat, _graph, self.aggr)[
-            : g.num_dst_nodes()
-        ]
+        h = ops_torch.operators.agg_concat_n2n(feat, _graph, self.aggr)[:num_dst_nodes]
         h = self.linear(h)

         return h
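For reference, a sketch of how `max_in_degree` is typically supplied when blocks come from a DGL neighbor sampler; `g`, `feat`, and `conv` are assumed to be a GPU graph, its node features, and the CuGraphSAGEConv module above:

```python
import dgl
import torch

fanout = 10
sampler = dgl.dataloading.NeighborSampler([fanout])
loader = dgl.dataloading.DataLoader(
    g,                                           # assumed: DGLGraph on "cuda"
    torch.arange(g.num_nodes(), device="cuda"),  # seed nodes
    sampler,
    batch_size=128,
)
for input_nodes, output_nodes, blocks in loader:
    x = feat[input_nodes]
    # The fanout bounds every destination node's in-degree, so it is the
    # natural value to pass as max_in_degree for the MFG-variant kernel.
    h = conv(blocks[0], x, max_in_degree=fanout)
```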
25 changes: 17 additions & 8 deletions python/cugraph-dgl/tests/nn/test_sageconv.py
@@ -14,12 +14,9 @@

import pytest

-try:
-    import cugraph_dgl
-except ModuleNotFoundError:
-    pytest.skip("cugraph_dgl not available", allow_module_level=True)
-
 from cugraph.utilities.utils import import_optional
+from cugraph_dgl.nn.conv.base import SparseGraph
+from cugraph_dgl.nn import SAGEConv as CuGraphSAGEConv
from .common import create_graph1

torch = import_optional("torch")
@@ -30,20 +27,29 @@
@pytest.mark.parametrize("idtype_int", [False, True])
@pytest.mark.parametrize("max_in_degree", [None, 8])
@pytest.mark.parametrize("to_block", [False, True])
def test_SAGEConv_equality(bias, idtype_int, max_in_degree, to_block):
@pytest.mark.parametrize("sparse_graph", ["coo", "csc", None])
def test_SAGEConv_equality(bias, idtype_int, max_in_degree, to_block, sparse_graph):
SAGEConv = dgl.nn.SAGEConv
CuGraphSAGEConv = cugraph_dgl.nn.SAGEConv
device = "cuda"

in_feat, out_feat = 5, 2
kwargs = {"aggregator_type": "mean", "bias": bias}
g = create_graph1().to(device)

if idtype_int:
g = g.int()
if to_block:
g = dgl.to_block(g)

size = (g.num_src_nodes(), g.num_dst_nodes())
feat = torch.rand(g.num_src_nodes(), in_feat).to(device)

if sparse_graph == "coo":
sg = SparseGraph(size=size, src_ids=g.edges()[0], dst_ids=g.edges()[1])
elif sparse_graph == "csc":
offsets, indices, _ = g.adj_tensors("csc")
sg = SparseGraph(size=size, src_ids=indices, cdst_ids=offsets)

torch.manual_seed(0)
conv1 = SAGEConv(in_feat, out_feat, **kwargs).to(device)

Expand All @@ -57,7 +63,10 @@ def test_SAGEConv_equality(bias, idtype_int, max_in_degree, to_block):
conv2.linear.bias.data[:] = conv1.fc_self.bias.data

out1 = conv1(g, feat)
-    out2 = conv2(g, feat, max_in_degree=max_in_degree)
+    if sparse_graph is not None:
+        out2 = conv2(sg, feat, max_in_degree=max_in_degree)
+    else:
+        out2 = conv2(g, feat, max_in_degree=max_in_degree)
assert torch.allclose(out1, out2, atol=1e-06)

grad_out = torch.rand_like(out1)