Overflow turns the whole batch to `NaN`s #87

nmheim · 2024-06-28T12:49:51Z

Hey! Thanks a lot for this, I really like the package!:)

It seems like an overflow in one of the samples causes the whole batch to be turned into NaNs:

using DynamicExpressions

T = Float64
x = Node{T}(feature=1)
ops = OperatorEnum(binary_operators=[*])
expr = x*2

julia> X = ones(1,2)
julia> expr(X, ops)
2-element Vector{Float64}:
 2.0
 2.0

julia> X[2] = floatmax(T)
julia> expr(X, ops)
2-element Vector{Float64}:
 NaN
 NaN


MilesCranmer · 2024-06-28T13:21:26Z

Thanks, glad you are enjoying it!

Yes, this is expected. The evaluation returns early whenever any element of any step in the evaluation is NaN or Inf:

DynamicExpressions.jl/src/Evaluate.jl

Line 178 in 9e95f05

@return_on_nonfinite_array result_r.x

DynamicExpressions.jl/src/Evaluate.jl

Lines 22 to 28 in 9e95f05

    
    macro return_on_nonfinite_array(array)
        :(
            if is_bad_array($(esc(array)))
                return $(ResultOk)($(esc(array)), false)
            end
        )
    end

DynamicExpressions.jl/src/Utils.jl

Lines 15 to 17 in 9e95f05

    
    # Fastest way to check for NaN in an array.
    # (due to optimizations in sum())
    is_bad_array(array) = !(isempty(array) || isfinite(sum(array)))

Thus, because the evaluation returned early, the buffer does not actually hold the result of the expression, so it is overwritten with NaNs to make this obvious.
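To illustrate why a single overflowing sample flags the whole array, here is a small self-contained sketch that copies the one-line `is_bad_array` definition quoted above and runs it on a few inputs. It does not need the package. The last case is an observation about the `sum`-based check in general, not something stated in the source: the sum can overflow even when every element is finite.

```julia
# Same one-line check as quoted above from Utils.jl.
is_bad_array(array) = !(isempty(array) || isfinite(sum(array)))

T = Float64
@show is_bad_array(T[1.0, 2.0])   # all elements finite
@show is_bad_array(T[1.0, NaN])   # NaN propagates through the sum
@show is_bad_array(T[1.0, Inf])   # so does Inf
@show is_bad_array(T[])           # empty arrays are never flagged

# The overflow from the issue: 2 * floatmax(T) is Inf, so one sample
# is enough to flag the whole array.
batch = T[1.0, floatmax(T)] .* 2
@show is_bad_array(batch)

# Caveat: the sum itself can overflow although each element is finite.
@show is_bad_array(T[floatmax(T), floatmax(T)])
```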

This behavior is designed with symbolic regression in mind, so that expressions with singularities don't waste more cycles than they have to. Within SymbolicRegression, eval_tree_array is called directly, so it can skip the step where NaNs are written to the buffer:

result, completed = eval_tree_array(tree, X, operators)
if !completed
    # evaluation quit early, so return infinite loss value
end

nmheim · 2024-06-28T13:29:35Z

Thanks for the quick reply! That makes sense. Is there a way to get NaNs only for the samples that actually fail, other than looping over the batch? In any case, as this is expected behaviour, I think this can be closed:)

MilesCranmer · 2024-06-28T13:59:56Z

At the moment, no, but it might be nice to have that behavior. If you are interested and have some time I can help point out what needs to be edited for this to work?
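For reference, the "looping over the batch" workaround mentioned in the question could look roughly like the sketch below. `eval_per_sample` is a hypothetical helper, not part of DynamicExpressions; it only uses the calls already shown in this thread (`Node{T}(feature=1)`, `OperatorEnum`, and `expr(X, ops)`). It evaluates one column at a time, so it gives up the batched evaluation speed, and a failing sample should come back as NaN (not Inf), since the early exit still applies within each one-sample call.

```julia
using DynamicExpressions

# Hypothetical helper: evaluate one column (sample) at a time so that a
# non-finite result only affects that sample.
function eval_per_sample(expr, X::AbstractMatrix{T}, ops) where {T}
    out = Vector{T}(undef, size(X, 2))
    for i in axes(X, 2)
        # X[:, i:i] keeps the (features, 1) matrix shape used in the issue.
        out[i] = only(expr(X[:, i:i], ops))
    end
    return out
end

T = Float64
x = Node{T}(feature=1)
ops = OperatorEnum(binary_operators=[*])
expr = x * 2

X = ones(T, 1, 2)
X[2] = floatmax(T)
out = eval_per_sample(expr, X, ops)
```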

nmheim · 2024-06-28T14:29:57Z

That sounds great, I'd be happy to take a look at it if you give me some pointers!:)

MilesCranmer · 2024-06-28T15:32:04Z

I think eval_tree_array:

DynamicExpressions.jl/src/Evaluate.jl

Lines 66 to 72 in 9e95f05

    
    function eval_tree_array(
        tree::AbstractExpressionNode{T},
        cX::AbstractMatrix{T},
        operators::OperatorEnum;
        turbo::Union{Bool,Val}=Val(false),
        bumper::Union{Bool,Val}=Val(false),
    ) where {T<:Number}

needs to have a new parameter early_exit::Union{Bool,Val}=Val(true) similar to turbo. Then, @return_on_nonfinite_array

DynamicExpressions.jl/src/Evaluate.jl

Lines 22 to 28 in 9e95f05

    
    macro return_on_nonfinite_array(array)
        :(
            if is_bad_array($(esc(array)))
                return $(ResultOk)($(esc(array)), false)
            end
        )
    end

needs to have another argument early_exit which basically toggles the behavior on/off. The same goes for @return_on_check. Then, other checks like

DynamicExpressions.jl/src/Evaluate.jl

Line 87 in 9e95f05

return (result.x, result.ok && !is_bad_array(result.x))

need to be handled manually.

In other words, if early_exit is true, do the isfinite check. Otherwise, skip it. The early_exit should come first in any if statement so that compute is not wasted.
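A minimal, self-contained sketch of this toggle is shown below. It is not the package's code or the eventual patch: `toy_deg1_eval` and `toy_eval` are hypothetical stand-ins that only illustrate passing `early_exit` as a `Val` (similar to `turbo`) and putting it first in the condition so the `sum` is skipped when the check is disabled.

```julia
# Same check as in Utils.jl, copied so the sketch runs on its own.
is_bad_array(array) = !(isempty(array) || isfinite(sum(array)))

# Hypothetical stand-in for one evaluation step: apply `op` in place,
# then optionally run the finiteness check.
function toy_deg1_eval(
    x::AbstractVector{T}, op::F, ::Val{early_exit}
) where {T<:Number,F,early_exit}
    for j in eachindex(x)
        x[j] = op(x[j])
    end
    # `early_exit` comes first, so is_bad_array is never called when it is false.
    if early_exit && is_bad_array(x)
        return (x, false)
    end
    return (x, true)
end

# Hypothetical stand-in for the top-level call: a chain of unary steps.
function toy_eval(
    X::AbstractVector{T}, ops; early_exit::Union{Bool,Val}=Val(true)
) where {T<:Number}
    v = early_exit isa Val ? early_exit : Val(early_exit)
    x = copy(X)
    for op in ops
        x, ok = toy_deg1_eval(x, op, v)
        ok || return (x, false)  # stop at the first flagged step
    end
    return (x, true)
end

double(v) = 2v
X = [1.0, floatmax(Float64)]

res_on, ok_on = toy_eval(X, (double,))                          # check enabled
res_off, ok_off = toy_eval(X, (double,); early_exit=Val(false)) # check skipped
```

With the check skipped, the overflowing sample keeps its own non-finite value and the other sample is left untouched, which is the per-sample behavior asked for above.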

Note that most of the ::Val{turbo} in this file is just given as ::Val{false}. That is because Val{true} is implemented in a separate file:

DynamicExpressions.jl/ext/DynamicExpressionsLoopVectorizationExt.jl

Lines 20 to 28 in 9e95f05

    
    function deg2_eval(
        cumulator_l::AbstractVector{T}, cumulator_r::AbstractVector{T}, op::F, ::Val{true}
    )::ResultOk where {T<:Number,F}
        @turbo for j in eachindex(cumulator_l)
            x = op(cumulator_l[j], cumulator_r[j])
            cumulator_l[j] = x
        end
        return ResultOk(cumulator_l, true)
    end

(These functions are all the same except they use LoopVectorization.@turbo on the loops.)

For early_exit, everything can be implemented in the same file, since it will just turn on/off the isfinite checks.

With this implemented, you would be able to call expr(X, ops; early_exit=Val(false)), which would let you avoid having the array filled with NaNs.

Sorry this was a bit convoluted – ask any questions needed!

nmheim mentioned this issue Jul 1, 2024

Add parameter to disable early exit of expression evaluation #91

Merged
