Problems encountered in trajectory matching #171
Replies: 13 comments
-
@stevenhuyi: thanks for the detailed report. The error message arises from a problem in your skeleton Csnippet. Looking at it, I notice that it is probably not going to behave the way you intend. You seem to want to model it as a map, but many of the equations read like the specification of a vectorfield; in particular, in several lines the right-hand side does seem like it should be proportional to the timestep.
Perhaps you can explain your rationale and I can offer advice?
-
Thanks for your instant reply, @kingaa. Let me reply to your second question first. As my equations show, I should specify a vectorfield in the skeleton part. Actually, at the beginning I didn't know exactly when to specify a vectorfield and when to use a map, although I was sure that my equations are differential ones. Now I think the skeleton should be specified as a vectorfield. Would you please also explain a bit more about when to use a map, beyond the documentation's statement that it is for a discrete-time process? And if the vectorfield is chosen, how should I define it?
-
This is one thing that is not complicated! The generator of a continuous-time dynamical system is a vectorfield; a map generates a discrete-time dynamical system. When you specify the skeleton as a vectorfield and ask for a trajectory, pomp calls an ODE integrator from the deSolve package. This routine decides internally how to translate your specification of the rate of various events into the trajectory they imply. I suggest you read up on the documentation of the deSolve package.
You misunderstood me: I did not say that. An rprocess snippet is a different matter.
-
I am wondering where the step size comes from. Is it chosen automatically? If so, what's the point of specifying it? Finally, I changed my code as follows to estimate the parameters; however, it ran into some problems. I wondered whether the subjective starting values for the parameters were causing the problem. Would you help me out? Thanks very much.
The minimized value of the function is NA.
-
Yes, the step size is chosen by the pomp function that is evaluating your rprocess snippet, as the documentation states.
-
Thanks for your explanations, @kingaa; they are very helpful. However, the results of my trajectory fitting, shown below, are very strange.
I did not estimate some of the parameters.
-
Did you notice any warning messages? I did when I ran your code. You should be aware that, in some parameter regimes, your model may be stiff, which makes integration hard. I don't have the time to teach you about all of these things; you can read about the numerical solution of ODEs (the deSolve manual may be a good place to start). Also, you should be aware that nonlinear optimization is a hard problem, suffering from pathologies such as multiple local minima. It is typically necessary to put more effort into the search for the MLE than simply doing one optimization run. Finally, it's a good idea to examine your model at the parameters that are causing problems and figure out why there are problems. For example, plug those parameters into your model and examine the resulting behavior.
-
Hi, @kingaa. Thanks very much for your advice on estimating parameters in nonlinear dynamical systems. I've read a lot of the literature on this topic and on pomp. I'm still working on my data; this time I am using a probe-matching approach.
Although the overall fit is relatively good, the diagnostic test indicated that the 5% quantile, acf[1], and acf[3] of the original data were far from those of the simulations (as follows). I'm wondering whether the fitting can be further improved. Additionally, I have two further concerns. First, it seems one should obtain proper guesses for the model's parameters (by doing simulations), so as to achieve a reasonably good fit before conducting formal estimation. I found it difficult to get proper guesses, especially with a large number of parameters to estimate. For some, e.g., alpha and rho in my example, we have some prior knowledge, but for others, like a, b, c, d, and r, we have absolutely no clue (note that the value of r is much larger than the others in my example). Is there any advice for dealing with this? Second, I'm wondering how to calculate confidence intervals for the parameters. I have looked through all the tutorials but haven't found any clue.
-
@stevenhuyi: you ask a number of interesting questions.
From the most general perspective, one can always seek to improve the match between model and data. Whether it is worthwhile to do so depends on your purposes, which we have not discussed. From a narrower perspective, what does the discrepancy between model prediction and data, as revealed in these three summary statistics, tell us? The low 5th-percentile statistic says that the data go to levels considerably lower than the model predicts. Why does the model not reach such low levels? The discrepancies in the ACFs suggest different frequencies of fluctuations in the model and the data. By thinking about the relation between these aspects and the model, you might come to an understanding of how constraints on the model are preventing it from doing a better job explaining the data. Whether it is worthwhile relaxing these constraints is another question: again, the answer depends on your purposes.
Is your first sentence correct as written? I don't understand it. Is it meant to be a question? It's important to note that it's rarely the case that you have absolutely no clue about a parameter's value. What are the relevant scales in the problem? Are there edge cases? Since you've included the parameter, it must have a meaning to you. Given that meaning, what is a plausible range? More generally, you put your finger on the fundamental problem of parameter estimation: parameter spaces in interesting models are essentially always of inconveniently high dimension. There is nothing one can do about this but search carefully and document your search. Two reflections on this may be useful to you.
For computing confidence intervals, we typically recommend profile likelihood. For a discussion of this and examples, see the SBIED short-course materials. E.g., here and here.
-
@kingaa thanks for your patience and explanations.
The primary goal of my study is to investigate whether the intensity of hand, foot, and mouth disease (HFMD) incidence is driven by climatic factors.
The probe tests for the two infection rates are as follows. Some diagnostic statistics are still not good, however. Honestly, I am not quite sure about "the constraints that prevent my model from explaining the data well"; would you please give me some clues? Another concern is that the global search for the optimal parameter set is really time-consuming (almost 12 hours in my code) for one city. To investigate my hypothesis of climatic-factor-driven HFMD incidence, I have to conduct the analysis in other cities as well; I have HFMD incidence data for 92 cities in East China. Is there any way to reduce this computational burden? Thanks very much again.
-
@stevenhuyi: Unfortunately, I have limited time available to help you reason about your model and your data. You are much more intimately familiar with them than I am. My advice to you is twofold.
First, keep in mind that no model will ever be perfect. If you set yourself the goal of finding a model for which every diagnostic statistic looks good, you are likely to be disappointed.
Second, your goal in interpreting a diagnostic statistic, which quantifies one feature of the data, is to understand what aspects of the model structure are preventing the model from agreeing with that feature of the data. This may or may not be obvious to you: if it is not, it will drive you to understand your model better. Once you do understand this, you have a choice as to whether it is necessary or desirable to modify the model to alleviate this constraint. Here again, keep in mind that no model will ever be perfect, nor is perfection necessary for scientific progress.
How do you go about learning which features are within the model's repertoire, which are not, and what the trade-offs between features are? The theories of nonlinear dynamical systems and of stochastic processes have much to say in this regard, and it is worthwhile to study them. Simulation, too, is an indispensable tool. You can simulate to explore empirically which changes of parameters lead to changes in one probe, and how these changes affect other probes and the likelihood itself. You can plot trajectories of the state variables to understand the story the model is telling about how the data arose. You can also plot individual terms of your model equations to interrogate the model's story more closely.
Finally, you mention your concerns about computational expense. Simulation-based methods are powerful, but they are not cheap. If it makes you feel any better, we do not consider it unusual to spend several days fitting a complex model to data, using hundreds of processors on a high-performance computing cluster.
As you think about scaling your computations up to the 92 cities, you will certainly want to parallelize your calculations. See the SBIED course materials (starting with Lesson 3) for many examples. Depending on your questions, you may want to think about using panelPomp or spatPomp.
-
@kingaa Thanks, as always, for your instant and detailed replies.
Got it. I used to focus too much on those probe statistics, always hoping they would all be good. This time, I've compared the model simulations with the data as follows:
Thanks, @kingaa. I will surely give it a try.
-
Very good observation. As you say, perhaps your seasonality predictors are not as good as you had hoped. One suggestion: supplement the envelope plot you made above with a plot showing the data and 10 or so random simulations. This can reveal things missed by the envelope plot.
-
I am trying to use an SEIR model in pomp to fit my observed series of HFMD data. I encounter an error when constructing the skeleton argument; my code is as follows:
Here is the error:
Since births in the covariate dataframe is the annual accumulated number of births, births * dt should be the number of individuals entering the susceptible class in each step of size dt. What type is the dt variable? It seems that dt cannot be used in arithmetic here. When I changed the model to a stochastic process using the following code, it works. Would you please tell me why? Thanks in advance.
ind_case.csv
ind_covar.csv