Model hashing #1762
Summary

It could be convenient to be able to hash a `pyhf.Model` to allow comparing models to each other. For an example, see @lhenkelm's comment in scikit-hep/cabinetry#315 (comment) and scikit-hep/cabinetry#322, where this is used in a model-specific cache.

Additional Information

As far as I am aware, the model specification + interpcodes is all that is required to uniquely identify a model. If there is additional information, that would be great to know!
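A minimal sketch of what such a hash could look like, assuming `pyhf.utils.digest` (pyhf's helper for hashing JSON-serializable objects); bundling the spec and the interpcodes into a plain dict is an illustrative choice here, not an established API:

```python
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)

# Interpolation codes are model-construction settings rather than part of
# the spec itself, so bundle them with the spec before hashing.
interpcodes = {
    "normsys": {"interpcode": "code1"},
    "histosys": {"interpcode": "code0"},
}
model_hash = pyhf.utils.digest({"spec": model.spec, "interpcodes": interpcodes})
print(model_hash)  # stable sha256 hex digest
```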
Comments

The measurement definition is what uniquely identifies that model fyi -- a workspace is unique (and has a hash) via `pyhf.utils.digest`. Specifically, `pyhf.utils.digest(workspace)` should always give you the corresponding hash for that workspace that's uniquely identifiable. The problem is that two workspaces could be practically identical, but different due to things like floating-point differences... (I think sorting is done by default)
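For illustration, a sketch of hashing a full workspace this way (the workspace contents are invented for the example):

```python
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
workspace = pyhf.Workspace(
    {
        "channels": model.spec["channels"],
        "observations": [{"name": "singlechannel", "data": [55.0]}],
        "measurements": [
            {"name": "meas", "config": {"poi": "mu", "parameters": []}}
        ],
        "version": "1.0.0",
    }
)
# digest() serializes with sorted keys, so key order does not affect the
# hash, but floating-point representation differences still would.
print(pyhf.utils.digest(workspace))
```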
The reason for only hashing the model is that in this use case of yield uncertainty calculation the measurement does not matter: two models are "equal" if they provide the same expected data. edit: Yes, looks like sorting is done by default.
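A sketch of that notion of equality, comparing the expected data of two independently built models (values are arbitrary):

```python
import numpy as np
import pyhf

model_a = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
model_b = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
pars = model_a.config.suggested_init()
# "Equal" models predict identical data (main + auxiliary) at the same
# parameter point.
assert np.allclose(model_a.expected_data(pars), model_b.expected_data(pars))
```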
This isn't true, because a measurement can contain overrides, different values for lumi, different bounds, etc. Although arguably, I think lumi is the only special one that is set through a measurement object and can change expected data; the rest is pretty fixed.
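To make the kinds of measurement-level settings mentioned above concrete, here is an illustrative measurement config (all values hypothetical); none of this appears in the channel/sample spec that a spec-only hash covers:

```python
measurement = {
    "name": "meas",
    "config": {
        "poi": "mu",
        "parameters": [
            # lumi is special: its auxdata enters the model's expected data
            {
                "name": "lumi",
                "auxdata": [1.0],
                "sigmas": [0.017],
                "bounds": [[0.915, 1.085]],
                "inits": [1.0],
            },
            # a plain override: custom bounds and initial value for the POI
            {"name": "mu", "bounds": [[0.0, 5.0]], "inits": [1.0]},
        ],
    },
}
```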
Yeah, this was written to be somewhat generic, and yes, the interpcodes are left out. We need to incorporate the part of the HiFa specification that has interpcodes, so that they're treated as first-class citizens rather than necessarily an implementation configuration (which is how they're treated right now).
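As a sketch of the current state, interpcodes are passed as a construction option rather than stored in the spec (assuming the `modifier_settings` keyword accepted at model construction):

```python
import pyhf

spec = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
).spec

# The interpolation codes live in a construction-time option, so two
# models built from the same spec can still interpolate differently.
model = pyhf.Model(
    spec,
    modifier_settings={
        "normsys": {"interpcode": "code4"},
        "histosys": {"interpcode": "code4p"},
    },
)
```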
I gave this a try and did not manage to create a setup where the lumi value set in the measurement changes the model prediction. I think now that this makes sense: when not scaling samples to luminosity and setting the auxdata to luminosity instead, the best-fit result for the lumi parameter simply adjusts accordingly. Is there any other way in which the measurement configuration could affect the expected data?
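One way to probe this: the lumi auxdata does enter via the measurement, so two models with identical channel specs can differ in their auxiliary data. A sketch (channel and measurement contents invented for the example):

```python
import pyhf

def make_workspace(lumi_auxdata):
    return pyhf.Workspace(
        {
            "channels": [
                {
                    "name": "ch",
                    "samples": [
                        {
                            "name": "signal",
                            "data": [10.0],
                            "modifiers": [
                                {"name": "mu", "type": "normfactor", "data": None},
                                {"name": "lumi", "type": "lumi", "data": None},
                            ],
                        }
                    ],
                }
            ],
            "observations": [{"name": "ch", "data": [12.0]}],
            "measurements": [
                {
                    "name": "meas",
                    "config": {
                        "poi": "mu",
                        "parameters": [
                            {
                                "name": "lumi",
                                "auxdata": [lumi_auxdata],
                                "sigmas": [0.02],
                                "bounds": [[0.9, 1.1]],
                                "inits": [1.0],
                            }
                        ],
                    },
                }
            ],
            "version": "1.0.0",
        }
    )

# Identical channel specs, different measurement-level lumi auxdata:
model_a = make_workspace(1.00).model()
model_b = make_workspace(1.05).model()
print(model_a.config.auxdata, model_b.config.auxdata)  # differ via lumi
```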
When comparing models and checking for equality via model spec + interpcodes, one thing that will of course be missed is different measurement-config-related things like parameter bounds and initial values. I think there's some conceptual distinction between aspects of the model itself (pdf / predicted distributions) and information regarding the use of the model (measurement). They all live in the workspace.
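A small sketch of that split: the same workspace object carries both the pdf-defining channel specs and the measurement (usage) configs (workspace contents invented for the example):

```python
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
ws = pyhf.Workspace(
    {
        "channels": model.spec["channels"],
        "observations": [{"name": "singlechannel", "data": [55.0]}],
        "measurements": [
            {"name": "meas", "config": {"poi": "mu", "parameters": []}}
        ],
        "version": "1.0.0",
    }
)
spec_part = {"channels": ws["channels"]}  # the model itself (pdf)
usage_part = ws.get_measurement(measurement_name="meas")  # how it is used
```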