Kronecker Products #1454
The FFT functions we use from Eigen are also in their unsupported module, so that seems like an okay place to start.
To integrate the Kronecker product from Eigen, should I start by emulating the changes to
We probably don't want to explicitly construct a Kronecker product. Eigen uses the obvious expression template implementation in C++, where you just hold a reference to the argument matrices and then lazily evaluate entries as needed. For larger-scale problems, we want to be able to do things like Cholesky factorization efficiently on Kronecker products. It's a rabbit hole, for sure. @WardBrian and @SteveBronder are probably the best people to get involved.
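A minimal sketch of that idea in plain C++, assuming a hypothetical row-major `Matrix` type rather than Eigen's API: the hypothetical `KroneckerView` stores references to the two factors and computes each entry on demand from the block index and the offset within the block, and a generic consumer such as the illustrative `trace` reads it without ever materializing the product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense row-major matrix (not Eigen's Matrix).
class Matrix {
 public:
  Matrix(std::size_t r, std::size_t c) : rows_(r), cols_(c), data_(r * c, 0.0) {}
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

 private:
  std::size_t rows_, cols_;
  std::vector<double> data_;
};

// Hypothetical lazy Kronecker product of A (m x n) and B (p x q). It stores only
// references to the factors, so nothing of size (m*p) x (n*q) is ever allocated.
// The referenced matrices must outlive the view.
struct KroneckerView {
  const Matrix& A;
  const Matrix& B;

  std::size_t rows() const { return A.rows() * B.rows(); }
  std::size_t cols() const { return A.cols() * B.cols(); }

  // Entry (i, j) sits in block (i / p, j / q), which equals A(i / p, j / q) * B,
  // at offset (i % p, j % q) inside that block.
  double operator()(std::size_t i, std::size_t j) const {
    const std::size_t p = B.rows(), q = B.cols();
    return A(i / p, j / q) * B(i % p, j % q);
  }
};

// Generic consumer: works on anything with rows(), cols() and operator()(i, j),
// so it reads a KroneckerView entry by entry without evaluating it.
template <typename M>
double trace(const M& m) {
  double t = 0.0;
  for (std::size_t i = 0; i < m.rows() && i < m.cols(); ++i) t += m(i, i);
  return t;
}
```

For example, with A = [[1, 2], [3, 4]] and B = [[0, 5], [6, 7]], `KroneckerView K{A, B};` gives `K(2, 1) == 15` and `trace(K) == 35`, which matches trace(A)·trace(B) = 5·7. Holding references is what makes this cheap, and it is also the usual expression-template hazard: the view dangles if A or B goes out of scope first.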
@bob-carpenter can you clarify: @WardBrian suggested using Eigen's
Sure. The key idea is that Eigen's Kronecker product implementation is an expression template (see https://en.wikipedia.org/wiki/Expression_templates). All it does if you call
The modular arithmetic to go the other way is at the top of the Wikipedia page (https://en.wikipedia.org/wiki/Kronecker_product). Ideally, we will never explicitly construct

The real trick is to make sure we can do things like Cholesky factorization efficiently, because the Cholesky factor of a Kronecker product is the Kronecker product of the Cholesky factors.
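Written out with 0-based indices, for A of size m × n and B of size p × q, the index arithmetic and the Cholesky identity look like this. The Cholesky part assumes square A = L_A L_A^⊤ (m × m) and B = L_B L_B^⊤ (p × p) and uses only the transpose rule (X ⊗ Y)^⊤ = X^⊤ ⊗ Y^⊤ and the mixed-product property (X ⊗ Y)(Z ⊗ W) = XZ ⊗ YW.

```latex
% Entry (i, j) of A \otimes B, with A of size m x n and B of size p x q (0-based)
(A \otimes B)_{i,j} = a_{\lfloor i/p \rfloor,\, \lfloor j/q \rfloor}\; b_{i \bmod p,\; j \bmod q}

% Cholesky, with A = L_A L_A^\top (m x m) and B = L_B L_B^\top (p x p)
\begin{aligned}
(L_A \otimes L_B)(L_A \otimes L_B)^\top
  &= (L_A \otimes L_B)(L_A^\top \otimes L_B^\top) \\
  &= (L_A L_A^\top) \otimes (L_B L_B^\top) \\
  &= A \otimes B
\end{aligned}
```

Since L_A ⊗ L_B is lower triangular with positive diagonal entries (L_A)_{kk}(L_B)_{ll}, uniqueness of the Cholesky factorization makes it the Cholesky factor of A ⊗ B. The work is then roughly O(m³ + p³) to factor the two inputs, instead of O((mp)³) for the full product.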
Assigning to a variable in a Stan program will always evaluate the expression, so these will be very difficult to use efficiently if they are just implemented as normal functions. A working but difficult-to-use implementation might be a good first step toward a better one, though.
Good point. We can have function returns that don't collapse template expressions, but if you assign the Kronecker product to a Stan variable, that's a problem because it'll do the allocation and copy. I think that's true even if the variable being assigned to is a function argument. Everything's going to have to remain unevaluated (in the no calls to
This is a bit more complicated. Built-in functions can accept unevaluated templates and keep them lazy. User-defined functions are code-generated such that they can accept these, but we immediately call `to_ref` on them, which evaluates them.
Thanks. I should have clarified that I meant user-defined functions. I hadn't realized we'd generalized the argument templates, but I guess that doesn't matter if we immediately evaluate!
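For a sense of scale (illustrative sizes, not taken from the thread), suppose A and B are both 100 × 100 and stored as plain doubles at 8 bytes each. Keeping the two factors versus evaluating the product on assignment costs:

```latex
\begin{aligned}
\text{inputs: }  & 2 \times 100^2 \times 8\,\text{B} = 160{,}000\,\text{B} \approx 160\,\text{KB} \\
\text{product: } & (100 \cdot 100)^2 \times 8\,\text{B} = 8 \times 10^{8}\,\text{B} = 800\,\text{MB}
\end{aligned}
```

With autodiff variables instead of doubles, each stored entry costs more than 8 bytes, so the gap only widens.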
Intro
I've been playing around with different approaches to efficiently computing Kronecker products, with particular interest in the cases where both input matrices are either covariance matrices or correlation matrices (common in multi-output Gaussian processes). These special cases have symmetries that might be leveraged for more efficient computation.
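The symmetries in question follow from standard Kronecker identities. With A of size m × m, B of size p × p, and 0-based indices:

```latex
% Symmetric inputs give a symmetric product
A = A^\top,\; B = B^\top \;\Rightarrow\; (A \otimes B)^\top = A^\top \otimes B^\top = A \otimes B

% Unit-diagonal (correlation) inputs give a unit-diagonal product
(A \otimes B)_{kp + l,\; kp + l} = a_{kk}\, b_{ll} = 1 \quad \text{when } a_{kk} = b_{ll} = 1
```

Because the eigenvalues of A ⊗ B are the products λ_k μ_l of the eigenvalues of A and B, the product of two covariance (or correlation) matrices is again a covariance (or correlation) matrix. A symmetry-aware implementation therefore only needs the mp(mp + 1)/2 entries on and below the diagonal, and a correlation version can also skip the mp diagonal entries.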
Implementations
Below are "user-defined function" implementations of `kprod()` (arbitrary inputs), `kprod_cov` (covariance inputs), and `kprod_corr` (correlation inputs). After looking at the Kronecker product functions currently in Eigen (in an unsupported section here), I also wrote `*_blocked` versions of each; as I understand it, the idea is that blocked computation achieves better memory locality. Here's the collection of functions:
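Separately from those Stan functions, a hedged C++ sketch of the loop strategies might look like this, using a hypothetical row-major `Matrix` type (not Eigen or Stan). The hypothetical `kron_naive` walks every output entry and recovers its factors with division and remainder; `kron_blocked` writes one scaled copy a_ij·B per entry of A, similar in spirit to the block-wise loop in Eigen's unsupported `KroneckerProduct` module; `kron_sym_blocked` assumes A is symmetric and computes only the blocks on and below the block diagonal, mirroring each into its transposed position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense row-major matrix (not Eigen or Stan).
class Matrix {
 public:
  Matrix(std::size_t r, std::size_t c) : rows_(r), cols_(c), data_(r * c, 0.0) {}
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

 private:
  std::size_t rows_, cols_;
  std::vector<double> data_;
};

// Unblocked: visit each output entry (i, j) in row-major order and recover
// its factors with integer division and remainder.
Matrix kron_naive(const Matrix& A, const Matrix& B) {
  const std::size_t p = B.rows(), q = B.cols();
  Matrix K(A.rows() * p, A.cols() * q);
  for (std::size_t i = 0; i < K.rows(); ++i)
    for (std::size_t j = 0; j < K.cols(); ++j)
      K(i, j) = A(i / p, j / q) * B(i % p, j % q);
  return K;
}

// Blocked: for each entry a_ij of A, write a_ij * B into output block (i, j).
Matrix kron_blocked(const Matrix& A, const Matrix& B) {
  const std::size_t p = B.rows(), q = B.cols();
  Matrix K(A.rows() * p, A.cols() * q);
  for (std::size_t i = 0; i < A.rows(); ++i)
    for (std::size_t j = 0; j < A.cols(); ++j) {
      const double a = A(i, j);
      for (std::size_t r = 0; r < p; ++r)
        for (std::size_t c = 0; c < q; ++c) K(i * p + r, j * q + c) = a * B(r, c);
    }
  return K;
}

// Symmetric blocked: assumes A is square and symmetric and B is square.
// Block (j, i) equals block (i, j) because a_ji == a_ij, so each block with
// j <= i is computed once and written to both positions. Inputs are not checked.
Matrix kron_sym_blocked(const Matrix& A, const Matrix& B) {
  const std::size_t m = A.rows(), p = B.rows();
  Matrix K(m * p, m * p);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j <= i; ++j) {
      const double a = A(i, j);
      for (std::size_t r = 0; r < p; ++r)
        for (std::size_t c = 0; c < p; ++c) {
          const double v = a * B(r, c);
          K(i * p + r, j * p + c) = v;  // block (i, j)
          K(j * p + r, i * p + c) = v;  // mirrored block (j, i)
        }
    }
  return K;
}
```

All three return the same matrix for valid inputs; the blocked versions only change the order in which memory is visited, which is the locality argument above. How much that actually helps depends on sizes and compiler settings.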
Benchmarks
I've attempted to benchmark these implementations across a variety of input sizes and with both default model compilation options:
as well as "fast" model compilation options:
Using the unblocked `kprod()` performance as the baseline, here's some relative-performance data (note: I'm still computing more values and more samples for the larger sizes and will update this plot as things finish; at present there are 20 samples for each point in the 2-65 range as well as 127-129):

Benchmark take-aways
- `kprod_cov` performs better than `kprod` at the smallest input sizes, but eventually seems to fall to equal performance at larger sizes.
- `kprod_cov_blocked` performs worse than the non-blocked `kprod` at the smallest input sizes, but reaches equal performance by moderate input sizes and improves monotonically (if asymptotically) thereafter, giving large performance benefits at large input sizes.
- `kprod_blocked` has a similar relative-performance trajectory to `kprod_cov_blocked`, but the latter still seems to have an edge, especially with the default model compilation options.
- `kprod`) in many of the kprod alternatives at the N=64 and N=128 values (note: N values of 63, 65, 127 and 129 are also present in the graph, showing that the spikes occur at 64 and 128 specifically). I'm not really sure what to make of those, nor why the blocked functions spike to higher relative performance at those values while the non-blocked functions spike to lower relative performance at those values.

Questions
I welcome any thoughts on this. Are any of these benchmarks even pertinent, or would implementing this directly in Stan have performance implications that make benchmarking user-defined functions irrelevant? Also, given the minor performance bump seen here between `kprod_blocked` and `kprod_cov_blocked`, maybe it makes sense to start by simply using the existing-but-unsupported Eigen implementation, which is akin to `kprod_blocked`?