
Multiple membership models #797

Open
jfb-h opened this issue Jan 9, 2025 · 2 comments
jfb-h commented Jan 9, 2025

First of all, thank you for the amazing package. It feels like dark magic to fit models to millions of observations in a second or so.

I sometimes encounter problems where observations can belong to more than one group. In the past, I have used Bayesian inference to fit these multiple-membership mixed models, but that approach doesn't scale as well to large numbers of observations. Is it possible to use the machinery of this package to fit them? If so, would you have any guidance on how to do so?


palday commented Jan 21, 2025

Although lme4 and MixedModels.jl share a common intellectual heritage, they differ in a key way (beyond the choice of implementation language 😉) that makes a MixedModels.jl analogue of lmerMultiMember a bit more of an investment than it was for lme4. Let me try to explain in a way that is both brief and detailed enough to give a feel for the effort involved.

The major innovation of lme4 compared to its predecessor nlme is that the underlying model fitting problem was formulated as a penalized least squares problem instead of a generalized least squares problem. PLS has a few advantages compared to GLS:

  • the formulation makes it trivial to support multiple nested, partially and fully crossed random effects
  • the formulation yields a representation as sparse matrices, thus allowing for very large models to be both represented and fitted efficiently (i.e. smaller memory footprint and faster compute time)
  • (somewhat of a corollary of the other two) there is no need to specialize on particular covariance structures for computational efficiency -- you can fit an unconstrained model without issue.
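To make the first two points concrete, here is a small illustrative sketch (in Python with scipy, not actual lme4 or MixedModels.jl code; the data and helper are hypothetical) of how crossed grouping factors translate into a sparse random-effects design matrix Z: each factor contributes an indicator block, so Z stays sparse no matter how many levels there are.

```python
import numpy as np
from scipy import sparse

def indicator(levels):
    """One-hot indicator matrix: rows = observations, cols = factor levels."""
    levels = np.asarray(levels)
    uniq, codes = np.unique(levels, return_inverse=True)
    n = len(levels)
    return sparse.csr_matrix(
        (np.ones(n), (np.arange(n), codes)), shape=(n, len(uniq))
    )

# Hypothetical toy data: two fully crossed grouping factors.
subject = ["s1", "s2", "s1", "s3"]
item    = ["i1", "i1", "i2", "i2"]

# Random-intercepts design matrix for subject and item combined.
Z = sparse.hstack([indicator(subject), indicator(item)]).tocsr()
print(Z.toarray())
# Each row has exactly one nonzero per grouping factor (here: 2 nonzeros
# per row), regardless of how many levels the factors have -- this is why
# large crossed designs remain cheap to represent.
```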

Given that a lot of historical work had gone into all sorts of wonky special cases of nesting and constrained covariance structures as a way of making it computationally tractable to fit models, these are big advances! In other words, the PLS approach is a very efficient approach for the general case. However, it turns out that it can be hard to express certain constraints in the PLS framework -- things like constrained covariance matrices (if you have a domain specific reason to require them) or correlated residuals are not trivial to express in the PLS framework.

lme4 took this novel formulation and solved it with generic sparse matrix methods. However, it turns out that the PLS formulation yields not just sparse matrices, but matrices with a very particular sparsity pattern (blocked sparse, where the blocks correspond to levels of the grouping factor) in the single-membership case. In MixedModels.jl, we take advantage of Julia's type system and multiple dispatch to define specialized sparse matrix types that let us store these matrices and compute with them even more efficiently than generic sparse matrices would allow.
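A minimal sketch of that sparsity pattern (again illustrative Python with toy data, not MixedModels.jl internals): in the single-membership case every row of Z has exactly one nonzero, so for a random intercept Z'Z collapses to a diagonal matrix whose entries are the group sizes -- exactly the level-wise block structure that specialized matrix types can exploit.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: each observation belongs to exactly one group.
group = np.array([0, 0, 1, 2, 2, 2])
n, g = len(group), group.max() + 1

# Indicator matrix: one nonzero per row (single membership).
Z = sparse.csr_matrix((np.ones(n), (np.arange(n), group)), shape=(n, g))

ZtZ = (Z.T @ Z).toarray()
print(ZtZ)
# [[2. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 3.]]
# Diagonal, with one block (here 1x1) per level of the grouping factor.
```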

For lmerMultiMember, the implementation was straightforward: we just had to create a way to specify a multi-membership model, then construct the matrices, and then lme4 would handle the rest without difficulty. (IIRC we also had to do some tinkering with various display methods, but nothing too onerous.) About a year before lmerMultiMember (and funnily enough, while I was visiting my co-author JvP, but before he had reason to work with multi-membership models), I tried my hand at doing the same thing for MixedModels.jl. I manually constructed the relevant model matrices and discovered just how much specialized machinery we have in MixedModels.jl that doesn't expect generic sparse matrices. There's no reason why that generic machinery couldn't be added -- it's part of my very long list of cool improvements that I would like to do if I ever simultaneously had time and creative energy! If you want to take a stab at implementing it, then this vignette is what lmerMultiMember started from. Unfortunately, you'll relatively quickly run into MethodErrors in Julia. I'm happy to provide review feedback and answer direct questions, but you'll probably need a fair amount of time to implement everything (unless you're already fairly familiar with MixedModels.jl internals...).
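For a feel of what the multi-membership construction looks like conceptually, here is a hedged sketch (Python with hypothetical toy data; the real implementations build these matrices inside lme4, and a MixedModels.jl version would need equivalent machinery): each observation spreads weights over several group levels, so rows of Z carry multiple nonzeros and Z'Z is no longer diagonal -- which is precisely what breaks the single-membership assumptions in specialized code paths.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: each observation may belong to several groups.
memberships = [
    ["g1"],
    ["g1", "g2"],
    ["g2", "g3"],
]
levels = sorted({g for ms in memberships for g in ms})
col = {g: j for j, g in enumerate(levels)}

rows, cols, vals = [], [], []
for i, ms in enumerate(memberships):
    w = 1.0 / len(ms)  # equal weights summing to 1 within each row
    for g in ms:
        rows.append(i)
        cols.append(col[g])
        vals.append(w)

Z = sparse.csr_matrix(
    (vals, (rows, cols)), shape=(len(memberships), len(levels))
)
print(Z.toarray())
# Rows with more than one nonzero violate the single-membership sparsity
# pattern that the specialized matrix types assume.
```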


jfb-h commented Jan 30, 2025

Thanks a lot for the thorough reply, that was really helpful! MixedModels.jl is an awesome tool and it is cool to learn about its history. I would be very much out of my depth diving into its internals, though, so I probably won't be able to contribute to this endeavour (though I surely would be a happy user if you ever get to it!). I tried lmerMultiMember on my problem and although you can definitely feel the performance difference compared to MixedModels.jl, it does get the job done so far, so I will probably stick with that for now.
