-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graph evaluator caching #99
base: master
Are you sure you want to change the base?
Conversation
See JuliaLang/julia#55076 for details
Can test locally by typing |
root::GraphNode{T}, | ||
cX::AbstractMatrix{T}, | ||
operators::OperatorEnum, | ||
loopVectorization::Val{true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loopVectorization::Val{true} | |
::EvalOptions{true} |
op::UInt8 # (Possibly undefined) If operator, this is the index of the operator in the degree-specific operator enum | ||
children::NTuple{D,Base.RefValue{GraphNode{T,D}}} # Children nodes | ||
visited::Bool # search accounting, initialised to false | ||
cache::AbstractArray{T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this might not be backwards compatible, I wonder if an alternative would be to convert to a val type (T
) where instead of val::T
, you have val::Tuple{T,Bool,Array{T,1}}
? Then you wouldn’t need to modify the GraphNode
itself.
Also see https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container regarding the |
Graph evaluator caching component from former PR
Graph evaluator caching
The GraphNode struct now has an additional property, the AbstractArray cache, not initialised during object creation.
This enables the result of each node evaluation to be stored and retrieved without the need for a dictionary lookup.
A GraphNode can now also have the constant property set to true during evaluation if it is an operator node with constant operand nodes. In such a case the val property will also be set to the result of the single operation required. This behaviour won't occur in sufficiently optimised expressions, as a subexpression consisting of operators with constant operand nodes could be replaced with a single constant node. This doesn't meaningfully mutate the expression during the evaluation, as the constant property is otherwise meaningless for nodes with degree >0.
In terms of the evaluator itself, there are two implemented versions: one using Base.map, and one using LoopVectorization.vmapnt which is used when the LoopVectorization package is loaded. Some rough tests I did found that, on a graph with no shared subexpressions, the graph evaluator takes 1.4x as long as the corresponding (with or without LoopVectorization) tree evaluator. I might do some more work on reducing that, one method would be to quickly identify whether shared subexpressions exist in the expression and choose a corresponding evaluator. The relative performance of the graph evaluator operating on an expression with shared subexpressions is entirely dependent on the number and size of the shared subexpressions but has a theoretically unbounded performance improvement.