-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: Improve Dijkstra performance #1
Comments
Thanks for thinking along with this :) Do you have an example of how and where this can be configured? Note that the weight is not constant for each ACL, it depends on the start and end node as well, see the code here: https://github.com/fox-it/aclpwn.py/blob/master/aclpwn/database.py#L78 |
Because Neo4j builtin Dijsktra capabilities are disappointing, you would probably have to write a user procedure that implements Dijkstra with relationship name as cost. This issue made others create custom plugins who modified the implementation to their liking. |
Hi, import java.util.Map;
import java.util.stream.Stream;
import org.neo4j.graphdb.*;
import org.neo4j.logging.Log;
import org.neo4j.procedure.*;
import org.neo4j.graphalgo.WeightedPath;
import org.neo4j.graphalgo.GraphAlgoFactory;
import org.neo4j.graphalgo.PathFinder;
import org.neo4j.helpers.collection.Pair;
import static apoc.util.Util.toDouble;
import apoc.path.RelationshipTypeAndDirections;
import apoc.result.WeightedPathResult;
public class Procedure {
@Context
public Log log;
@Procedure(value = "util.findWeightedPath")
@Description("CALL util.findPathWithConfig(startNode, endNode, 'KNOWS|<WORKS_WITH|IS_MANAGER_OF>', {'KNOWS': 5}, defaultValue, numberOfWantedResults) YIELD path," +
" weight")
public Stream<WeightedPathResult> findWeightedPath(
@Name("startNode") Node startNode,
@Name("endNode") Node endNode,
@Name("relationshipTypesAndDirections") String relTypesAndDirs,
@Name("relationshipsWeights") Map<String, Double> relsWeights,
@Name(value = "defaultWeight", defaultValue = "NaN") double defaultWeight,
@Name(value = "numberOfWantedPaths", defaultValue = "1") long numberOfWantedPaths) {
PathFinder<WeightedPath> algo = GraphAlgoFactory.dijkstra(
buildPathExpander(relTypesAndDirs),
(relationship, direction) -> toDouble(relsWeights.getOrDefault(relationship, defaultWeight)),
(int)numberOfWantedPaths
);
return WeightedPathResult.streamWeightedPathResult(startNode, endNode, algo);
}
public static Map<String, Double> relsWeights = new HashMap<String, Double>() {
{
put("KNOWS", 1.0);
put("WORKS_WITH", 10.0);
put("IS_MANAGER_OF", 100.0);
}
};
@Procedure(value = "util.findPath")
@Description("CALL util.findPath(startNode, endNode, numberOfWantedResults) YIELD path, weight")
public Stream<WeightedPathResult> searchPath(@Name("startNode") Node startNode,
@Name("endNode") Node endNode,
@Name(value = "numberOfWantedPaths", defaultValue = "1") long numberOfWantedPaths) {
return this.findPathWithConfig(startNode, endNode, '>', relsWeights, 1, numberOfWantedPaths);
}
private static PathExpander<Object> buildPathExpander(String relationshipsAndDirections) {
PathExpanderBuilder builder = PathExpanderBuilder.empty();
for (Pair<RelationshipType, Direction> pair : RelationshipTypeAndDirections
.parse(relationshipsAndDirections)) {
if (pair.first() == null) {
if (pair.other() == null) {
builder = PathExpanderBuilder.allTypesAndDirections();
} else {
builder = PathExpanderBuilder.allTypes(pair.other());
}
} else {
if (pair.other() == null) {
builder = builder.add(pair.first());
} else {
builder = builder.add(pair.first(), pair.other());
}
}
}
return builder.build();
}
} |
Hi @dirkjanm,
The current implementation of the Dijkstra algorithm is not optimal, which is noticeable when dealing with larger graphs. As you mentioned in the wiki, the ShortestPath algorithm is faster than Dijkstra but the performance differences can be greatly reduced with few optimizations.
Neo4j stores relationship's properties values on disk which mean a lot of unnecessary IO overhead which dramatically slowing the process of graph traversal. Because the weight is constant for each ACL, it's better to calculate the cost based on the relationship type (name) which is in-memory (assuming the entire graph can be loaded to RAM and cache is warmed up). You can achieve this by tweaking the
neo4j-graph-algorithms
plugin to use constant weight values for each ACL instead of looking at the relationship's property. This will also spare the creation of additional properties.The text was updated successfully, but these errors were encountered: