On this page you'll find high-level details of how we have implemented the engine we use in Kedro-Viz. We intend this to be useful as a reference for our project maintainers.
You can see and interact with the engine's graph drawings on the Kedro-Viz demo or explore our code to see how it works in practice, which we describe below in more detail.
We break down graph drawings, like those shown above, into what we consider essential and unique principles:
- The nodes tend to be close together
- The nodes do not overlap
- The nodes align on each axis
- The edges tend to the vertical
- The edges do not cross
- The edges point in the same direction
- The trees have symmetry
We form our constraints between nodes and edges using these rules, which allows us to frame the graph drawing problem as a constrained optimisation problem:
Given the graph G contains set of nodes N = { A, B, C… } with position Ap, Bp, Cp …, a set of edges E = { A → B, A → C… } and a set of constraints C
- For every node pair { A, B } in all nodes N
- Minimise position distance |Ap - Bp|
- Subject to all constraints in C
- For every edge A → B
- Draw path from Ap to Bp avoiding all nodes N
In practice we've made two key simplifcations:
- We consider node position variables in X and Y axes as independent, such that a node may change position freely in X without affecting its position in Y and vice versa
- We consider the routing problem independent from the layout problem
From each rule we form geometric constraints as relations between our node positions:
"Edges point in the same direction" and "Nodes do not overlap"
- For every edge A → B in all edges E
- Ensure minimum distance and direction A to B in Y-axis
By ≥ Ay + d- Where A is the source node and B is the target node
- Where d is the minimum vertical row distance required
Notes
- We implemented this as our row constraint (code)
and our layer constraint by adding intermediate layer nodes (code) - The expected performance is O(E) linear complexity in number of edges
- This equation forms a linear constraint with one minimal solution
"Nodes do not overlap"
- For every pair of nodes { A, B } in all nodes N
- Ensure minimum distance between A and B in X-axis
|Ax - Bx| ≥ d- Where d is the minimum horizontal separation distance required
Notes
- We implemented this as our separation constraint (code)
- Its performance is O(N2) quadratic complexity in number of nodes
- This equation forms a nonlinear constraint with two minimal solutions
- This simplifies given a direction, to linear constraint Bx > Ax + sx (code)
- This simplifies once we have rows with direction, to O(N) linear complexity (code)
"Edges tend to the vertical"
- For every edge A → B in all edges E
- Ensure zero distance between A and B in X-axis
Ax - Bx = 0
Notes
- We implemented this as our parallel constraint (code)
- This equation forms a linear constraint with one minimal solution
- Its performance is O(E) linear complexity in number of edges
"Edges do not cross"
- For every pair of edges { A → B, C → D } in all edges E
- Ensure direction A to C is same direction B to D in X-axis
(Ax < Cx and Bx < Dx) or (Ax > Cx and Bx > Dx)
or as vectors (Ax - Cx) ・ (Bx - Dx) < 0
Notes
- We implemented this as our crossing constraint (code)
- This equation forms a nonlinear constraint with two minimal solutions
- Its performance is O(E2) quadratic complexity in number of edges
- This simplifies once we have rows, to O(max|Erow|2) complexity (code)
Our engine combines the properties of two distinct solving approaches: a loose solver and a strict solver.
For certain constraints (code) this algorithm finds a quick but approximate solution using a soft constraint solver:
- It has a simple implementation (code)
- Its performance is suited to hundreds of thousands of constraints
- It supports many kinds of constraint using objective / loss functions (code)
- It supports soft constraints with partial solutions
- It has limited output similarity given similar inputs
- The constraint strength and iterations must be well tuned (code)
We form a reduced set of constraints (code) based on the loose solution and give that to a slower but exact hard constraint solver:
- It has a complex implementation (handled by kiwi.js / Cassowary library) (code)
- Its performance is suited to a few thousand constraints
- It is limited to hard and linear constraints (code)
- It finds an exact complete solution or no solution
- It has strong output similarity given similar inputs
- The constraint priority can be optionally tuned (code)
The combination results in a layout that assigns all nodes positions that strongly respect our constraints, and does so in a short processing time.
Given our engine has solved our constraints and it has found a layout, it then handles routing edges with a separate algorithm (code).
We found that in many cases a single straight line for each edge would cut through many nodes along the way. Our routing moves around these nodes instead so the drawing is clear for us to read.
Our implemented routing algorithm is roughly:
- Given positions decided for all nodes in N
- For every edge A → B in all edges E
- Start at edge source node position (Ax Ay)
- Draw a line to the closest empty point on each row after
- Stop at the row of the edge target node
- Draw a line to the edge target node position (Bx By)
This results in a relatively short path for each edge, where most path segments tend to be vertical, with its processing completed quickly in practice (code).
When we have multiple edges starting from a single node, we separate them visually for better legibility (code) and order by direction to avoid introducing edge crossings (code).
In practice our layout constraints also include additional space for edges to pass through, especially around a graph's most highly connected nodes (code).
Finally, we render the finished routes to the user as smooth bezier curves (code), as we find we can read them easier than straight lines.