Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unserializable histogram generated if computed without fill call #153

Closed
yimuchen opened this issue Oct 7, 2024 · 5 comments · Fixed by #154
Closed

Unserializable histogram generated if computed without fill call #153

yimuchen opened this issue Oct 7, 2024 · 5 comments · Fixed by #154

Comments

@yimuchen
Copy link

yimuchen commented Oct 7, 2024

Originally posted here: scikit-hep/hist#586
And I found that this issue can be replicated entirely upstream using just dask-histogram

import pickle

import boost_histogram
import dask
import dask_histogram.boost
import numpy

h = dask_histogram.boost.Histogram(boost_histogram.axis.Regular(10, 0, 1))
# h.fill( # Toggle to include a fill call
#     dask.array.from_array(numpy.zeros(shape=(10))),
#     weight=dask.array.from_array(numpy.zeros(shape=(10))),
# )

o = dask.compute(h)
pickle.dump(o, open("hist_dask_test.pkl", "wb"))

Without a fill call, the resulting histogram will have dangling a self._dask field that breaks serialization.

@martindurant
Copy link
Collaborator

Hm, the computed object shouldn't have any ties to dask at all.

Note that cloudpickle (and dask's serialize) is able to serialise this object.

@martindurant
Copy link
Collaborator

The following fixes it, but I would like to figure out why the attribute is there in the first place

--- a/src/dask_histogram/boost.py
+++ b/src/dask_histogram/boost.py
@@ -137,7 +137,12 @@ class Histogram(bh.Histogram, DaskMethodsMixin, family=dask_histogram):
         return self.dask_name

     def __dask_postcompute__(self) -> Any:
-        return lambda x: self._in_memory_type(first(x)), ()
+        def f(x):
+            out = self._in_memory_type(first(x))
+            if hasattr(out, "_dask"):
+                del out._dask
+            return out
+        return f, ()

     def __dask_postpersist__(self) -> Any:

@martindurant
Copy link
Collaborator

I have not yet come up with any better solution, so if you would check what I put above, I would appreciate it.

@yimuchen
Copy link
Author

I have not yet come up with any better solution, so if you would check what I put above, I would appreciate it.

Just confirmed that this indeed allows empty histograms to be pickled/unpickled as usual. Thanks!

@martindurant
Copy link
Collaborator

OK, let's make that PR for now, as certainly it doesn't seem to have a downside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants