Suggestion: Remove AIC, BIC, and DIC #9
Comments
Thank you Carlos. I tend to agree with you that today WAIC, LOO-IC (and WBIC, I don't know this one) are the information criteria to use. I'm including Chris in this response because he specifically asked to include these. As an interim solution we can also include a clear warning, similar to your input above or Richard McElreath's positioning of these older ICs. From Chris' thumbs up I assume he also agrees with your message. I will take a look at WBIC, although right now I have a bit too much on my plate.
EDIT: Given that this package is GPL3 licensed, it might still be better to develop a new, MIT-licensed package directly from the descriptions of the different ICs. From the StatisticalRethinking introductory/learning point of view MIT licensing is not very important, but for general use it is.
Rob |
Rob, I agree also. I was in the process of reimplementing PSIS-LOO but ran into some difficulties. Do you know anyone who is familiar with the algorithm? |
What about Aki Vehtari? Are you using "Vehtari, A., A. Gelman, and J. Gabry. 2015. Efficient implementation of leave-one-out cross-validation and WAIC for evaluating fitted Bayesian models"? |
Yeah. I am using that article. My sticking point is the procedure for fitting the inverse Pareto. |
Funny coincidence -- I'm actually reimplementing it as well, since the R LOO package is looking at using Julia in its backend to speed up its calculations. I've got the code for fitting the inverse Pareto written up already, and I added some more comments that should make it easier to understand. |
Note that the generalized Pareto distribution is like a slightly wonky monomial, and you can easily fit a monomial between any pair of points. The GPD fitting procedure takes advantage of this in a way that works a bit like the Theil-Sen estimator. You start by taking the first quartile as point A, and then for each individual point B in your sample, you fit a GPD going through A and B. This gives you a set of pointwise estimates for the shape parameter k. When this is done, you take the average of all of these pointwise estimators, weighted by the likelihood that they would produce this data set. At the end, an (optional) adjustment to k-hat is used, where you average k-hat together with a weakly informative prior; this reduces variance at the cost of a little bit of bias.
(Note that I haven't tested this code yet.) |
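For readers of this thread, here is a rough, untested sketch of the fitting idea described above, in the spirit of Zhang & Stephens (2009). This is not the code referred to in the comment; the function name and the prior constants are illustrative assumptions.

```julia
# Rough, untested sketch of the k-hat fitting idea described above
# (in the spirit of Zhang & Stephens 2009). Illustrative only.
using Statistics

function fit_gpd_khat(x::AbstractVector{<:Real}; prior_strength::Real = 10.0)
    x = sort(x)                                    # tail sample (exceedances), ascending
    n = length(x)
    quartile = x[max(1, floor(Int, n / 4 + 0.5))]  # the "point A" above
    inv_xmax = 1 / x[n]
    scale = 3 * quartile

    # Grid of candidate values for theta = k / sigma (one per "point B" above).
    m = 30 + floor(Int, sqrt(n))
    j = 1:m
    theta = @. inv_xmax + (1 - sqrt(m / (j - 0.5))) / scale

    # For each candidate theta: the conditional MLE of the shape (in the
    # Zhang & Stephens parameterization) and the profile log-likelihood.
    k_zs = [-mean(log1p.(-t .* x)) for t in theta]
    logL = @. n * (log(theta / k_zs) + k_zs - 1)

    # Average the candidates, weighted by their (normalized) likelihood.
    w = exp.(logL .- maximum(logL))
    w ./= sum(w)
    theta_bar = sum(w .* theta)

    # Convert to the shape parameter k-hat reported by PSIS, then (optionally)
    # shrink toward 0.5 with a weakly informative prior to reduce variance.
    khat = mean(log1p.(-theta_bar .* x))
    return (n * khat + prior_strength * 0.5) / (n + prior_strength)
end
```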
@ParadaCarleton, thank you for sharing your code. Given that you know more about PSIS-LOO and related metrics than I do, it might make sense to collaborate. Our main goal was to avoid the GPL license and use the MIT license instead. I know that the community has expressed interest in having these metrics, but no one until now has had the expertise to put it together. |
I'd love to help with this -- I'm pretty sure I can get PSIS-LOO working, along with some related measures like WAIC/WSIC. I do have a bit less experience with specific implementations of MCMC samplers (especially Stan -- most of my work has been in PyMC3+Turing) and Julia itself. If I could get some kind of function that would take objects produced by Stan, Turing, or Soss and convert them into standard objects I could work with, I'd be happy to build the code that takes these and actually calculates the information criteria for them. |
@ParadaCarleton, that sounds great. I will provide a simple example that can be adapted to Stan, Turing and Soss. I can post something here within the next 24-36 hours. Our original plan was to develop a generic method that would operate on arrays and then add conversion methods that would interface with MCMCChains and any other data type. For example, a generic method:
And a method to interface with MCMCChains:
With this approach, the model comparison metrics will work with any data format. We can define conversions for the most common data formats, such as MCMCChains, and others can develop custom functions as needed. Does this still sound like a good approach to you? |
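To make the two (stripped) snippets above concrete, here is a hypothetical sketch of the design Rob describes: a generic method on a plain log-likelihood matrix plus a thin conversion method for MCMCChains. The function name waic, the log_lik parameter naming, and the exact MCMCChains extraction calls are assumptions for illustration, not actual package code.

```julia
# Hypothetical sketch only: names and the MCMCChains extraction are assumptions.
using MCMCChains, Statistics

# Generic method: works on a plain (draws x observations) matrix of pointwise
# log-likelihood values.
function waic(log_lik::AbstractMatrix{<:Real})
    # A real implementation would use log-sum-exp here for numerical stability.
    lppd = sum(log.(mean(exp.(log_lik); dims = 1)))   # log pointwise predictive density
    p_waic = sum(var(log_lik; dims = 1))              # effective number of parameters
    return (waic = -2 * (lppd - p_waic), lppd = lppd, p_waic = p_waic)
end

# Conversion method: pull the pointwise log-likelihoods out of a Chains object
# (assumed to be stored as parameters named log_lik[...]) and forward them to
# the generic method, appending the chains into one long matrix of draws.
function waic(chains::Chains)
    ll = Array(group(chains, :log_lik); append_chains = true)
    return waic(ll)
end
```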
I think this sounds good, although I would make the generic method use the AbstractMCMC API rather than just an array, and convert other objects to AbstractMCMC objects. The problem with a basic array is that it would get very confusing to keep track of all the different things. |
I'm trying to wrap my mind around your two (very interesting) questions:
On the first item, in Stan you can:
where m1_1s is the sampled Stan model. Typically m1_1 is the Stan language model. This would provide a … I'm not sure if e.g. stepper functionality makes sense, although maybe it is possible to come up with an …
Your second question, on the MCMCChains.jl repo, is also very interesting. My main objection to MCMCChains.jl has always been the AxisArray base. But with your suggested approach we could, once convinced the chains are proper, use Makie Layers. Maybe this combination could result in two new packages: BayesMakiePlots.jl and a new, MIT-licensed version of StatsModelsComparisons.jl (or a better name for the latter repo)? |
Regarding the first question, I actually made a mistake -- since I wasn't familiar with the AbstractMCMC or MCMCChains APIs, I didn't realize they don't really do what I need. I'm still trying to figure out how you get the log-pdf from Stan, Turing, and Soss models after leaving one observation out, which is the main thing I need to build PSIS-LOO. As for the second thing, that's a longer-term interest that could be discussed in another thread -- for now, I'm just trying to get PSIS-LOO working. |
Would it help if I augment the arsenic example in StatsModelComparisons.jl (using Stan) to generate an MCMCChain object with the log-pdfs, and maybe Chris can do the same with Turing (using VarInfo?)? I can take a look at Soss as well, or maybe ask Chad. I would store the log-pdfs in a separate section of the MCMCChain. I fully agree with the second part of your reply, although I find the pk-plots useful. |
I think Soss support would be great eventually, but Soss is definitely the least popular of the PPLs -- we can focus on getting Turing and Stan working first, then find a way to include Soss. It looks like the relevant objects are the MCMCChain object, which holds a sample from the posterior distribution, and the …
As for turning this into a full package, the inference algorithm itself is pretty simple, but looking through the LOO package in R, things like "check to make sure the user didn't make a mistake when they input their data" or "clean the data so it's usable" seem to make up most of the actual code and could be pretty time-consuming to write. |
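As a side note on the log-pdf question above: PSIS-LOO only needs the pointwise log-likelihood of each observation under draws from the full-data posterior; the leave-one-out part is handled entirely by the importance weights, so no refitting is required. A rough sketch for a toy normal model (illustrative names, not package code):

```julia
# Illustrative only: build the (draws x observations) pointwise log-likelihood
# matrix for a toy model y[i] ~ Normal(mu, sigma), given posterior draws of mu
# and sigma. This matrix is the input PSIS-LOO works from.
using Distributions

function pointwise_loglik(y::AbstractVector{<:Real},
                          mu_draws::AbstractVector{<:Real},
                          sigma_draws::AbstractVector{<:Real})
    S, N = length(mu_draws), length(y)
    ll = Matrix{Float64}(undef, S, N)
    for s in 1:S, i in 1:N
        ll[s, i] = logpdf(Normal(mu_draws[s], sigma_draws[s]), y[i])
    end
    return ll
end
```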
I'm currently implementing PSIS-LOO for matrices, following the API for the loo package in R as closely as possible. Functions converting from ArviZ.jl and Stanfit objects will probably end up being useful, although … |
That looks pretty easy to do for Stan.jl (I don't know the details of ArviZ.jl). |
Hi Carlos, not sure if you follow TuringLang/MCMCChains.jl#270 (comment). Rob |
Hi Rob, I hadn't followed it, but I've made some comments now. |
Thanks @ParadaCarleton. I will have some time later this week. Would you consider the test file to be a good example of how to use the package? |
Hi Carlos, thanks for this work! I'm trying to temporarily incorporate your work into StatsModelComparisons.jl by adding ParetoSmooth.jl and RData.jl as dependencies:
when I run a slightly modified version of your runtests.jl:
I get:
If I understand the setup right, I get a similar error if I try this with a reshaped log_lik matrix generated in Stan in the cars_waic example. I think I need to reshape it into an Array of size [1000, 50, 4]; log_lik contains an appended-chains version. |
To be able to run the cars_waic example in SMC.jl I also temporarily added StanSample.jl, but that does not affect the initial test script above. |
Aah:
it looks like my reshaping of log_lik is wrong and I need to permute the dims. |
Ok, the problem is the shape of … If I use the original format of log_ratios (…
But with the upper bound at 0.001 it will succeed. Not sure if that upper bound is too liberal. |
Sorry about that -- I accidentally messed up my GitHub version control and ended up using a version that had a bug in it. Version 0.1.1 should be working. RData shouldn't be necessary; it's just a test requirement. |
No problem, needed to look at it more closely anyway. It seems to be working now with the log_lik array generated in the car_waic example although the results appear rather different. I'll also do some more checks tomorrow. Edit: And the test is now indeed within 10%. |
Hi Rob, Can you seed the RNG so that both receive the same input? That might be a more direct comparison. |
In a sense both do get the exact same log_lik matrix as input (4000 x 50). WAICs are of course identical. I'm trying to figure out how to compute this from the Psis structure fields:
Adding … |
Sorry for the noise. I thought you were passing two different sets of chains, which would produce different results. I don't know much about the diagnostics. Do you have reason to expect the diagnostic plots to differ when given the same input? |
I was trying to compute the waic based on the weights computed by
which results in:
which is close. But is that close enough? The R results were near 421, but I'll check that. To answer your question, no, I don't think so. The plots only change when a new set of draws is generated. |
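For reference, this is how the PSIS-LOO estimate is usually assembled from the smoothed weights and the log-likelihood matrix (the standard formula from Vehtari, Gelman & Gabry). The sketch below is not the ParetoSmooth.jl or StatsModelComparisons.jl internals; the function and variable names are assumptions.

```julia
# Sketch only: elpd_loo from a (draws x observations) log-likelihood matrix `ll`
# and matching smoothed, unnormalized importance weights `w` of the same shape.
function elpd_psis_loo(ll::AbstractMatrix{<:Real}, w::AbstractMatrix{<:Real})
    N = size(ll, 2)
    # Per observation: log of the self-normalized weighted average of the
    # predictive density. A real implementation would work on the log scale
    # with log-sum-exp for numerical stability.
    pointwise = [log(sum(w[:, i] .* exp.(ll[:, i])) / sum(w[:, i])) for i in 1:N]
    elpd = sum(pointwise)
    return (elpd_loo = elpd, looic = -2 * elpd)   # looic is the "loo" value discussed above
end
```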
R results:
But using Stan proper (not the quadratic approximation) and taking 4000 samples:
|
A difference of about 4 seems quite large if the R and Julia code are using the same algorithm and settings with the same inputs. It is definitely more than rounding error. Hmmm. |
I'll try to use the loglik matrix from R to check both psis() and psisloo(). |
Using the R generated matrix:
|
Those values look much closer. |
Well, with the existing psisloo(). But still somewhat off with the new psis(). |
Sorry, are these comparisons of the WAIC with PSIS-LOO, or of the R and Julia versions, or of the new and old Julia versions? And are they being calculated with the newest version on GitHub (the …)? |
These are comparisons between psis() in ParetoSmooth.jl and psisloo() in StatsModelComparisons.jl. Initially I compared them using Julia & StanJulia, but the last results above were using a loglik matrix generated by R. The results of today's released version (v0.1.0) of ParetoSmooth.jl are identical to the results above. |
Can you try again with v0.1.1? |
Hmm, this is the result:
from running this:
with installed packages:
|
If you have
to generate the logprob matrix. Modified to use ulam() and not quap(). |
That's correct -- you might want to double-check that's not the problem. If the answers are off by a constant offset, this shouldn't matter. In addition, are the rel_eff arguments specified and the same between Julia and R? If neither of these work, try changing these two lines:
|
Hi Carlos, I have tried all three; no change in the loo value in either step. The update to GPD.jl needed to be:
Not quite sure why I needed to add the broadcast '.' before the …
Regarding your question "are the rel_eff arguments specified and the same between Julia and R?": I believe the settings in R and psisloo() in StatsModelsComparisons.jl are identical; at least the values were a good match in the 8 or so example cases I used for testing before releasing StatsModelsComparisons.jl. At the same time I have a hard time comparing the handling of rel_eff between the two Julia versions. psisloo() certainly does not use MCMCChains' ESS method. Doing the Turing test with the car data might shed more light on this. |
So, are the comparisons between …? (As for the broadcast, my original had a typo -- …) |
The results of the tests are that psisloo() and R's psis remain identical (I just reran that after upgrading all R packages). The new psis() (yours) gives a slightly different value. I suggest we wait until you are done with the loo extensions, because it is very possible my test code is at fault. If that's not the case we should look at the ess values. In the …
If you have …
Given that you mentioned the possibility of more packages for model selection purposes, maybe it makes sense to create a ModelSelection GitHub organization to have an anchor point? Just a thought. |
Ahh, I just plan to build one package for model selection purposes, separate from the implementation of PSIS. This way, people can use PSIS for things like VI + importance sampling without having to install a bunch of model comparison tools. Note that when …
The model comparison package has been put here, and now implements a LOO method. The answers looked sensible, but I'm not 100% sure they're good either. |
Thank you. I modified the function definition a bit:
because …
The result I'm getting for the cars example:
|
Thanks. I have no idea how my testing didn't catch that. The function shouldn't have run at all when I tried that.
So it looks like the problem isn't in the addition code. I've tried updating the code in ParetoSmooth; can you see if the new version works? It seems to give identical weights (up to floating point error) to R's loo package. (… |
Makes little difference. I think, given your check on the weights above, your code at this point is more trustworthy than the old code. I now get:
For completeness, the pareto_k plots. The first plot is basically … |
Carlos, can you describe how you compare the weights between Julia and R? Just for my education. |
Just using …
The weights and … |
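Since the details got clipped above, here is one hedged example of what such a check could look like: export the smoothed weights from R (e.g. with write.csv) and compare them elementwise in Julia. The file name, its layout, and the julia_weights_matrix variable are assumptions, not actual thread artifacts.

```julia
# Illustrative comparison of weights computed in R and in Julia; the CSV file
# and `julia_weights_matrix` are placeholders.
using DelimitedFiles

r_weights = vec(readdlm("r_weights.csv", ',', Float64; skipstart = 1))
julia_weights = vec(julia_weights_matrix)   # placeholder: weights computed in Julia

@assert length(r_weights) == length(julia_weights)
println("all close: ", all(isapprox.(r_weights, julia_weights; rtol = 1e-6)))
println("max abs difference: ", maximum(abs.(r_weights .- julia_weights)))
```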
Noticed you just did a major addition. Thanks for the above suggestions; this gives me a better target to look for if the new version doesn't solve it! Thanks for this work, in my opinion a major step forward. |
Yep, I found the mistake and fixed it. Because the LOO code ended up being smaller than I expected, I've decided to incorporate it into ParetoSmooth.jl instead of spinning it off. The code can be found here. However, I think it doesn't run at the moment -- I'm trying to sort out a bug that keeps me from running tests on my own computer... |
AIC, BIC, and DIC have all been fully supplanted by WAIC, WBIC, and LOO-IC, and there's no longer any real reason to use them. I would suggest removing them from the package to avoid confusing beginners. Someone seeing them used in the package may be led to believe that using them is a good idea, that they provide information different from that provided by the newer information criteria, or that there must be some reason they're in the package. BIC is especially problematic: The name simultaneously gives new users the false impression that the BIC is actually Bayesian and that WAIC/LOO-IC are not Bayesian.
It might be a good idea to implement WBIC to have a replacement for BIC, since BIC has a different use case from the other information criteria in this package (asymptotically maximizing the probability of selecting the correct model, rather than maximizing the expected predictive accuracy of the model). I would also suggest referring to it as the Watanabe-Schwarz Information Criterion to avoid giving the impression that WBIC is "especially" Bayesian.
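For concreteness, a minimal sketch of WBIC as I understand the definition in Watanabe (2013): the posterior mean of the negative total log-likelihood, taken under a tempered posterior whose likelihood is raised to the power beta = 1/log(n). The function name and input layout are assumptions.

```julia
# Sketch of the WBIC estimator, assuming `ll_tempered` is a (draws x observations)
# pointwise log-likelihood matrix evaluated at draws from the *tempered* posterior
# (inverse temperature beta = 1 / log(n)), not the ordinary posterior.
using Statistics

wbic(ll_tempered::AbstractMatrix{<:Real}) = -mean(sum(ll_tempered; dims = 2))
```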