Commit

skip duplicate calculation when off-diagonal terms are present, add comments
alexander-held committed Feb 9, 2022
1 parent 5fc7413 commit 79a39ac
Showing 1 changed file with 17 additions and 16 deletions.
33 changes: 17 additions & 16 deletions src/cabinetry/model_utils.py
@@ -274,31 +274,32 @@ def yield_stdev(
up_variations_ak = ak.from_iter(up_variations)
down_variations_ak = ak.from_iter(down_variations)

# total variance, indices are: channel, bin
n_channels = len(model.config.channels)
total_variance_list = [
np.zeros(model.config.channel_nbins[ch]) for ch in model.config.channels
] # list of arrays, each array has as many entries as there are bins
# append placeholders for total yield uncertainty per channel
total_variance_list += [np.asarray([0]) for _ in range(n_channels)]
total_variance = ak.from_iter(total_variance_list)

# loop over parameters to sum up total variance
# first do the diagonal of the correlation matrix
# calculate symmetric uncertainties for all components
sym_uncs = (up_variations_ak - down_variations_ak) / 2
total_variance = np.sum(np.power(sym_uncs, 2), axis=0)

# continue with off-diagonal contributions if there are any
if np.count_nonzero(corr_mat - np.diagflat(np.ones_like(parameters))) > 0:
# possible optimizations missing here now:
# - skipping staterror-staterror combinations (orthogonal)
# calculate total variance, indexed by channel and bin (per-channel numbers act like
# additional channels with one bin each)
if np.count_nonzero(corr_mat - np.diagflat(np.ones_like(parameters))) == 0:
# no off-diagonal contributions from correlation matrix (e.g. pre-fit)
total_variance = np.sum(np.power(sym_uncs, 2), axis=0)
else:
# full calculation including off-diagonal contributions
# with v as vector of variations (each element contains yields under variation)
# and M as correlation matrix, obtain variance via matrix multiplication:
# variance = v^T @ M @ v

lhenkelm (Contributor) commented on Feb 9, 2022:
Minor nitpick: the comment is a little misleading (but I think I made the same mistake in one of my gists in #315, so apologies for the confusion):
the @ expresses matrix multiplication, which reduces the arrays involved (inner product).
So the result of the expression in the comment would be a scalar:

>>> import numpy as np
>>> v = np.arange(4)
>>> m = np.arange(16).reshape((4,4))
>>> v.T @ m @ v
420

Really what we want (and what the code is doing) is per column/row element-wise multiplication.

PS: sorry, this got misplaced somehow. The line I am talking about is this one: https://github.com/scikit-hep/cabinetry/pull/316/files#diff-c3e311ea31becb8df4353a00e34dbd57b456ffcc87660d87f982cbfad1e7606bR289

alexander-held (Author, Member) commented on Feb 9, 2022:
I think you're correct. My view was that v is an array of objects (which happen to be vectors), and the "matrix multiplication" is the sum_{i,j} v[i]*v[j]*M[i][j], where the product v[i]*v[j] is element-wise within each object rather than an inner product. I'll remove this comment to prevent additional confusion. The calculation can be written in this way with numpy, but then it seems like the diagonal needs to be extracted in the end. I'll post a comment with an example (also to remind my future self when looking at this again).
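A minimal sketch of the point being made here, using small hypothetical toy arrays (not cabinetry's actual data): if v has shape (n_params, n_bins), then v.T @ M @ v produces an (n_bins, n_bins) matrix, and only its diagonal is the per-bin variance — equivalent to the explicit double sum over parameter pairs.

```python
import numpy as np

# hypothetical toy inputs: 3 parameters, 2 bins
# v[i] holds the symmetric yield variation of parameter i across bins
v = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]])  # shape (n_params, n_bins)
M = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 1.0]])  # correlation matrix, shape (n_params, n_params)

# per-bin variance as the explicit double sum over parameters i, j
variance = np.einsum("ij,ib,jb->b", M, v, v)

# the "matrix multiplication" form: (v.T @ M @ v) is (n_bins, n_bins),
# and only its diagonal holds the per-bin variances
variance_via_matmul = np.diag(v.T @ M @ v)

assert np.allclose(variance, variance_via_matmul)
```

The off-diagonal entries of v.T @ M @ v mix different bins and are discarded, which is why the broadcasting approach in the code avoids computing them in the first place.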

# variance shape is same as element of v (yield uncertainties per bin & channel)

# possible optimizations that could be considered here:
# - skipping staterror-staterror terms for per-bin calculation (orthogonal)
# - taking advantage of correlation matrix symmetry
# - (optional) skipping combinations with correlations below threshold

R = corr_mat[..., np.newaxis, np.newaxis] * sym_uncs[np.newaxis, ...]
L = sym_uncs[:, np.newaxis, ...] * R
total_variance = np.sum(ak.flatten(L, axis=1), axis=0)

# convert to standard deviations per bin and per channel
n_channels = len(model.config.channels)
total_stdev_per_bin = np.sqrt(total_variance[:n_channels])
total_stdev_per_channel = ak.flatten(np.sqrt(total_variance[n_channels:]))
log.debug(f"total stdev is {total_stdev_per_bin}")
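The broadcasting pattern in the R and L lines of the diff can be sketched with plain numpy arrays standing in for the ragged awkward arrays (an assumption for illustration: a single flat bin axis instead of per-channel ragged lists, and randomly generated toy values):

```python
import numpy as np

# toy stand-ins, not cabinetry's real arrays: 3 parameters, 4 bins
rng = np.random.default_rng(0)
sym_uncs = rng.normal(size=(3, 4))   # per-parameter symmetric variations per bin
A = rng.normal(size=(3, 3))
corr_mat = (A + A.T) / 2             # symmetric stand-in for a correlation matrix
np.fill_diagonal(corr_mat, 1.0)

# same pattern as in the diff (one newaxis here instead of two, since the toy
# arrays have a single flat bin axis): R[i, j, b] = corr_mat[i, j] * sym_uncs[j, b]
R = corr_mat[..., np.newaxis] * sym_uncs[np.newaxis, ...]
# L[i, j, b] = sym_uncs[i, b] * R[i, j, b]
L = sym_uncs[:, np.newaxis, :] * R
# summing over both parameter axes gives the per-bin variance
total_variance = np.sum(L, axis=(0, 1))

# cross-check against the explicit double sum sum_{i,j} M[i,j] * v[i,b] * v[j,b]
expected = np.einsum("ij,ib,jb->b", corr_mat, sym_uncs, sym_uncs)
assert np.allclose(total_variance, expected)
```

When corr_mat is the identity, the same expression collapses to np.sum(sym_uncs**2, axis=0), which is exactly the diagonal-only branch the commit adds as a shortcut.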

0 comments on commit 79a39ac
