Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph evaluator caching #99

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

robdancer
Copy link

Graph evaluator caching component from former PR

Graph evaluator caching

The GraphNode struct now has an additional property, the AbstractArray cache, not initialised during object creation.
This enables the result of each node evaluation to be stored and retrieved without the need for a dictionary lookup.

A GraphNode can now also have the constant property set to true during evaluation if it is an operator node with constant operand nodes. In such a case the val property will also be set to the result of the single operation required. This behaviour won't occur in sufficiently optimised expressions, as a subexpression consisting of operators with constant operand nodes could be replaced with a single constant node. This doesn't meaningfully mutate the expression during the evaluation, as the constant property is otherwise meaningless for nodes with degree >0.

In terms of the evaluator itself, there are two implemented versions: one using Base.map, and one using LoopVectorization.vmapnt which is used when the LoopVectorization package is loaded. Some rough tests I did found that, on a graph with no shared subexpressions, the graph evaluator takes 1.4x as long as the corresponding (with or without LoopVectorization) tree evaluator. I might do some more work on reducing that, one method would be to quickly identify whether shared subexpressions exist in the expression and choose a corresponding evaluator. The relative performance of the graph evaluator operating on an expression with shared subexpressions is entirely dependent on the number and size of the shared subexpressions but has a theoretically unbounded performance improvement.

@MilesCranmer
Copy link
Member

Can test locally by typing ]test into the REPL. Also for standardising the formatting you can use https://domluna.github.io/JuliaFormatter.jl/stable/#Quick-Start

root::GraphNode{T},
cX::AbstractMatrix{T},
operators::OperatorEnum,
loopVectorization::Val{true}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
loopVectorization::Val{true}
::EvalOptions{true}

op::UInt8 # (Possibly undefined) If operator, this is the index of the operator in the degree-specific operator enum
children::NTuple{D,Base.RefValue{GraphNode{T,D}}} # Children nodes
visited::Bool # search accounting, initialised to false
cache::AbstractArray{T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this might not be backwards compatible, I wonder if an alternative would be to convert to a val type (T) where instead of val::T, you have val::Tuple{T,Bool,Array{T,1}}? Then you wouldn’t need to modify the GraphNode itself.

@MilesCranmer
Copy link
Member

Also see https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container regarding the ::AbstractArray{T} for cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants