
Problem with residuals / cost function when running PEtab / Catalyst on actual data #200

Open
yakdahl opened this issue Aug 20, 2024 · 3 comments



yakdahl commented Aug 20, 2024

Hi! I'm having an issue with using PEtab with Catalyst, and I think something is going wrong in the residual calculation. I can set up the problem as in the docs, including with data taken from actual measurement sources. No errors are thrown, and if I ask the PEtab problem to compute the simulated values for some reasonable parameter vector, I get a reasonable result. However, when I ask for the residuals or the cost function, I get non-finite values (-Inf and NaN, respectively). Not sure what is going wrong there - any ideas?

This ties in to another question I have - would it be possible to allow definition of custom loss functions that could be supplied by the user? Or would that break other parts of the PEtab code?

Here's the code, as well as an example file for the experimental measurements.

using Catalyst, PEtab
using OrdinaryDiffEq, CSV, DataFrames

rs = @reaction_network begin
10.0^k1, PdCat + ArBr --> PdCatArBr
10.0^k2, PdCatArBr + NaOTMS --> PdCatArOTMS + NaBr
10.0^k3, PdCatArOTMS + Amine --> PdCatArAmine + HOTMS
10.0^k4, PdCatArAmine --> PdCat + ArAmine
10.0^kd2, PdCatArBr --> PdCatArylated
end;

simulation_conditions = Dict("cond_2" => Dict(:NaOTMS=>0.373, :ArBr=>0.355, :PdCat=>0.003, :Amine=>0.426),
                             "cond_3" => Dict(:NaOTMS=>0.361, :ArBr=>0.688, :PdCat=>0.003, :Amine=>0.413),
                             "cond_1" => Dict(:NaOTMS=>0.373, :ArBr=>0.355, :PdCat=>0.003, :Amine=>0.426))

params = [PEtabParameter(p, lb=1e-6, ub=1e8, scale=:log10) for p in rs.ps]
@parameters sigma
_sigma = PEtabParameter(:sigma)

push!(params, _sigma)

@unpack ArAmine = rs

obs_ArAmine = PEtabObservable(ArAmine, sigma * ArAmine, transformation=:lin)

observables = Dict("obs_ArAmine" => obs_ArAmine)

measurements = CSV.read("example.csv", DataFrame)

measurements[!,"measurement"] = [maximum([x,1.0e-10]) for x in measurements[!,"measurement"]] # tried to see if forcing the measurements to be at least small positive numbers would fix things - it didn't
measurements[!,"time"] = [x for x in measurements[!,"time"]] # random workaround to make sure the columns are formatted correctly - not sure why they don't read right.
measurements[!,"simulation_id"] = [x for x in measurements[!,"simulation_id"]]
measurements[!,"obs_id"] = [x for x in measurements[!,"obs_id"]]

petab_model = PEtabModel(rs, simulation_conditions, observables, measurements, params, verbose=false)

petab_problem = PEtabODEProblem(petab_model, verbose=false)

sim_vals = petab_problem.compute_simulated_values([1.0 for i in eachindex(params)]) # returns a reasonable vector of floats
petab_problem.compute_residuals([1.0 for i in eachindex(params)]) # returns -Inf

petab_problem.compute_cost([1.0 for i in eachindex(params)]) # returns NaN

Attachment: example.csv

sebapersson (Owner) commented:

Thanks for reporting! You are correct that something is wrong with the residuals, but technically the code is not incorrect :) What causes the problem is your noise formula:

obs_ArAmine = PEtabObservable(ArAmine, sigma * ArAmine, transformation=:lin)

If either sigma or ArAmine is zero, the noise formula evaluates to 0, and since we divide by the noise formula when computing the likelihood:

residuals^2 / noise_formula

you are dividing by 0 and therefore get the NaN in compute_cost. How to solve this? Often a combined noise formula of the form:

noise_formula = sigma1 + sigma2*ArAmine

is used. With this you can never divide by zero, and there is only one more parameter to estimate (and typically the dataset holds enough information to estimate noise parameters, so you should not have any identifiability issues).
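
A minimal sketch of what that could look like with the code from the issue (the names sigma1 and sigma2 are just illustrative):

@parameters sigma1 sigma2
_sigma1 = PEtabParameter(:sigma1)
_sigma2 = PEtabParameter(:sigma2)
push!(params, _sigma1, _sigma2)   # instead of the single sigma above

# combined additive + proportional noise model; never exactly zero as long as sigma1 > 0
obs_ArAmine = PEtabObservable(ArAmine, sigma1 + sigma2 * ArAmine, transformation=:lin)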

A second thing to be aware of: the parameter vector x = [1.0 for i in eachindex(params)] is interpreted on the scale you defined for each parameter (log10 here), in particular

# sigma = x[end]
x[end] == 1.0 # interpreted as log10(sigma) = 1.0, i.e. sigma = 10, not 1

so the values you pass are not the linear-scale values you might expect. To generate a start guess within the bounds (and on the correct scale), you can do:

x0 = PEtab.generate_startguesses(petab_problem, 1)
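
and then, for example, evaluate the cost at that start guess; with the noise formula fix above it should now be finite:

petab_problem.compute_cost(x0)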

As for a custom objective function, what kind of function do you have in mind? Currently it is often not possible to use a custom objective, because some formulas for the gradient computations are hard-coded (though there might be workarounds). I am also currently updating PEtab.jl, and one of my aims is to make custom objectives possible.


yakdahl commented Aug 23, 2024

Thanks! That makes sense - hadn't quite realized how the variable transformation was working in the rest of the code. This fixes the problems in the minimal example, but there's still something else wrong in my actual problem. Let me see if I can figure out what that is and I'll get back to you on that.

In terms of loss functions: I've previously added time derivatives or similar terms (such as the slope of a line fit to the first n data points - experimental chemists use this as a measure of initial rates pretty frequently) into the loss function for kinetic models. I've found this somewhat useful for making sure that the solutions converge rapidly towards a minimum in which not only the general outline of the trace is reproduced, but also the slopes of the traces are well defined. In principle it should all be the same minimum, but the loss landscape seems to become a bit more convex, and doing this might eliminate some local minima that I've sometimes gotten stuck in.

sebapersson (Owner) commented:

> Thanks! That makes sense - hadn't quite realized how the variable transformation was working in the rest of the code. This fixes the problems in the minimal example, but there's still something else wrong in my actual problem. Let me see if I can figure out what that is and I'll get back to you on that.

Happy this solved the problem - hope you manage to fix the bigger problem as well (otherwise feel free to reach out!)

> In terms of loss functions: I've previously added time derivatives or similar terms (such as the slope of a line fit to the first n data points - experimental chemists use this as a measure of initial rates pretty frequently) into the loss function for kinetic models. I've found this somewhat useful for making sure that the solutions converge rapidly towards a minimum in which not only the general outline of the trace is reproduced, but also the slopes of the traces are well defined. In principle it should all be the same minimum, but the loss landscape seems to become a bit more convex, and doing this might eliminate some local minima that I've sometimes gotten stuck in.

You cannot add things like this directly, but there are workarounds. For example, let's say you want to compare the time derivative of the ODE solution against a linear slope; you could then split your cost function into two parts. In the first you compute the standard cost with PEtab.jl, and in the second you compute your custom cost (the comparison against the slope) with help of the utility functions. As an example:

petab_model = PEtabModel(...)
petab_problem = PEtabODEProblem(petab_model)
function cost_p1(x)
    return petab_problem.compute_cost(x)
end 
function cost_p2(x)
    # id is the simulation condition 
    odesol = get_odesol(x, petab_problem; condition_id = id)
    # do things to compute custom slope
end
cost_total(x) = cost_p1(x) + cost_p2(x)
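
As a sketch, the body of cost_p2 could for example look like this (the value of slope_data, the choice of condition "cond_1", and using the first two saved time points for the model slope are all illustrative assumptions):

function cost_p2(x; slope_data = 0.1)   # slope_data: experimentally estimated initial rate (hypothetical value)
    odesol = get_odesol(x, petab_problem; condition_id = "cond_1")
    arAmine = odesol[ArAmine]           # ArAmine is the species unpacked from rs above
    slope_model = (arAmine[2] - arAmine[1]) / (odesol.t[2] - odesol.t[1])
    return (slope_model - slope_data)^2 # penalize deviation from the measured initial rate
end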

Similarly, for cost_p2 you can build a gradient function using, for example, ForwardDiff and combine it with the PEtab.jl gradient function (differentiation is linear, so the two gradients simply add). There are some pitfalls when adding custom terms like these (e.g. it is easy to mess up the gradient computations), but feel free to reach out if you want to test something like this later and run into problems (and hopefully, once I am done updating PEtab.jl, I can figure out an easier way to add custom terms to the objective :)
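
As a rough sketch of that combination (compute_gradient! is assumed here to be available on the problem object alongside compute_cost; check the fields of your PEtabODEProblem):

using ForwardDiff

function grad_total!(g, x)
    petab_problem.compute_gradient!(g, x)    # PEtab.jl gradient of the standard cost (assumed field name)
    g .+= ForwardDiff.gradient(cost_p2, x)   # add the gradient of the custom slope term
    return g
end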
