Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Equijoins #38

Open
felix-boeseler opened this issue Feb 29, 2024 · 5 comments
Open

Support for Equijoins #38

felix-boeseler opened this issue Feb 29, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@felix-boeseler
Copy link

Hello, thank you for the great project :)

In how far are equijoins exactly supported?

Given that I have the following NetworkX graph:

G = nx.DiGraph()
G.add_node("x")
G.add_node("y")
G.add_node("z")
G.add_edge("x", "y")
G.add_edge("y", "x")
G.add_edge("x", "x")
G.add_edge("z", "x")

When I execute the following query:

MATCH (n)-->(n)
RETURN n

I get the result:

{Token('CNAME', 'n'): ['x', 'y']}

However, if I execute the same query on the equivalent graph in neo4j, I only get the node x as result - which to my understanding of Cypher would be the correct result.

Therefore, to my understanding, equijoins are currently only supported in the project when they are distributed over multiple match clauses and do not recognizes self cycles on nodes. Is that correct?

For instances, the following query correctly recognizes two loops in the graph:

MATCH (n)-->(m)
MATCH (m)-->(n)
RETURN n, m

While neo4j additionally returns n=x and m=x as a result.

Many regards,
Felix

@j6k4m8
Copy link
Member

j6k4m8 commented Feb 29, 2024

Uh oh! Looks like a bug to me. Your intuition is totally right, we should only be returning x in your first example. Thank you for reporting!!

@j6k4m8 j6k4m8 added the bug Something isn't working label Feb 29, 2024
@j6k4m8
Copy link
Member

j6k4m8 commented Mar 5, 2024

@khoale88 sorry to bug you, wondering if you have thoughts on this one — I think the issue is here:

for match in self._matches_iter(my_motif):
# matches can contains zero hop edges from A to B
# there are 2 cases to take care
# (1) there are both A and B in the match. This case is the result of query A -[*0]-> B --> C.
# If A != B break else continue to (2)
# (2) there is only A in the match. This case is the result of query A -[*0]-> B.
# If A is qualified to be B (node attr match), set B = A else break
for a, b in zero_hop_edges:
if b in match and match[b] != match[a]:
break
if not _is_node_attr_match(
b, match[a], self._motif, self._target_graph
):
break
match[b] = match[a]
else: # For/else loop
# Check if match matches where condition and add
if not self._where_condition or self._where_condition(
match, self._target_graph, self._return_edges
):
self_matches.append(match)
self_matche_paths.append(edge_hop_map)
# Check if limit reached
if self._is_limit(len(self_matches)):
complete = True
break
self._matches = self_matches
self._matche_paths = self_matche_paths

On line 470, self._matches is [{'n': 'x'}, {'n': 'y'}] — so the issue is happening I guess at the pathwise where filtering... wondering if you encountered anything like this while testing. I'm still poking at it and I'll report back here as I learn more!

@khoale88
Copy link
Contributor

Right, let me spend some time on this!

@j6k4m8
Copy link
Member

j6k4m8 commented Apr 17, 2024

No pressure, but it would be super appreciated if you have time!! :)

@khoale88
Copy link
Contributor

khoale88 commented May 5, 2024

I think there is an unexpected behavior in grandiso. After debugging, it drills down to grandiso.find_motif_iter yielding both x and y, Here is the test snippet.

def test_equijoins(self):
    host = nx.DiGraph()
    host.add_node("x")
    host.add_node("y")
    host.add_node("z")
    host.add_edge("x", "y")
    host.add_edge("y", "x")
    host.add_edge("x", "x")
    host.add_edge("z", "x")
    
    motif = nx.DiGraph()
    motif.add_edge("n", "n")

    res = find_motifs(motif, host)
    assert res == [{"n": "x"}]

The test fails with result

>       assert res == [{"n": "x"}]
E       AssertionError: assert [{'n': 'x'}, {'n': 'y'}] == [{'n': 'x'}]
E         Left contains one more item: {'n': 'y'}
E         Full diff:
E         - [{'n': 'x'}]
E         + [{'n': 'x'}, {'n': 'y'}]

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants