
MetadataError from ValueError: Could not convert object to NumPy datetime #201

Closed
TomNicholas opened this issue Jul 25, 2024 · 7 comments · Fixed by #206
Labels: bug (Something isn't working), CF conventions, references generation (Reading byte ranges from archival files), usage example (Real world use case examples)

Comments

@TomNicholas
Member

I'm trying to debug @thodson-usgs's example from cubed-dev/cubed#520 (and originally #197).

He is doing a whole serverless reduction over virtual references to multiple files (!!! - relevant to #123), but there seem to be some more basic errors to fix first.

Specifically, if I try to use virtualizarr on just one of his files, this happens:

import xarray as xr
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset(
    's3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-01.nc',
    indexes={},
    loadable_variables=['Time'],
    cftime_variables=['Time'],
)
vds
<xarray.Dataset> Size: 31MB
Dimensions:        (Time: 1, south_north: 250, west_east: 320,
                    interp_levels: 9, soil_layers_stag: 4)
Coordinates:
    interp_levels  (interp_levels) float32 36B ManifestArray<shape=(9,), dtyp...
    Time           (Time) datetime64[ns] 8B 2060-01-01
Dimensions without coordinates: south_north, west_east, soil_layers_stag
Data variables: (12/39)
    SNOWH          (Time, south_north, west_east) float32 320kB ManifestArray...
    ACSNOW         (Time, south_north, west_east) float32 320kB ManifestArray...
    TSK            (Time, south_north, west_east) float32 320kB ManifestArray...
    XLONG          (south_north, west_east) float32 320kB ManifestArray<shape...
    T              (Time, interp_levels, south_north, west_east) float32 3MB ...
    XLAT           (south_north, west_east) float32 320kB ManifestArray<shape...
    ...             ...
    PSFC           (Time, south_north, west_east) float32 320kB ManifestArray...
    ALBEDO         (Time, south_north, west_east) float32 320kB ManifestArray...
    CLDFRA         (Time, interp_levels, south_north, west_east) float32 3MB ...
    SWDNB          (Time, south_north, west_east) float32 320kB ManifestArray...
    PW             (Time, south_north, west_east) float32 320kB ManifestArray...
    SH2O           (Time, soil_layers_stag, south_north, west_east) float32 1MB ManifestArray<shape=(1, 4, 250, 320), dtype=float32, chunks=(1, 4, 250, 32...
Attributes:
    contact:  [email protected]
    data:     Downscaled CCSM4
    date:     Mon Oct 21 11:37:23 AKDT 2019
    format:   version 2
    info:     Alaska CASC
ds = xr.open_dataset('combined.json', engine="kerchunk")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/meta.py:127, in Metadata2.decode_array_metadata(cls, s)
    126 dimension_separator = meta.get("dimension_separator", None)
--> 127 fill_value = cls.decode_fill_value(meta["fill_value"], dtype, object_codec)
    128 meta = dict(
    129     zarr_format=meta["zarr_format"],
    130     shape=tuple(meta["shape"]),
   (...)
    136     filters=meta["filters"],
    137 )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/meta.py:260, in Metadata2.decode_fill_value(cls, v, dtype, object_codec)
    259 else:
--> 260     return np.array(v, dtype=dtype)[()]

ValueError: Could not convert object to NumPy datetime

The above exception was the direct cause of the following exception:

MetadataError                             Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds = xr.open_dataset('combined.json', engine="kerchunk")

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    559 decoders = _resolve_decoders_kwargs(
    560     decode_cf,
    561     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    567     decode_coords=decode_coords,
    568 )
    570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
    572     filename_or_obj,
    573     drop_variables=drop_variables,
    574     **decoders,
    575     **kwargs,
    576 )
    577 ds = _dataset_from_backend_dataset(
    578     backend_ds,
    579     filename_or_obj,
   (...)
    589     **kwargs,
    590 )
    591 return ds

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/kerchunk/xarray_backend.py:12, in KerchunkBackend.open_dataset(self, filename_or_obj, storage_options, open_dataset_options, **kw)
      8 def open_dataset(
      9     self, filename_or_obj, *, storage_options=None, open_dataset_options=None, **kw
     10 ):
     11     open_dataset_options = (open_dataset_options or {}) | kw
---> 12     ref_ds = open_reference_dataset(
     13         filename_or_obj,
     14         storage_options=storage_options,
     15         open_dataset_options=open_dataset_options,
     16     )
     17     return ref_ds

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/kerchunk/xarray_backend.py:46, in open_reference_dataset(filename_or_obj, storage_options, open_dataset_options)
     42     open_dataset_options = {}
     44 m = fsspec.get_mapper("reference://", fo=filename_or_obj, **storage_options)
---> 46 return xr.open_dataset(m, engine="zarr", consolidated=False, **open_dataset_options)

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    559 decoders = _resolve_decoders_kwargs(
    560     decode_cf,
    561     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    567     decode_coords=decode_coords,
    568 )
    570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
    572     filename_or_obj,
    573     drop_variables=drop_variables,
    574     **decoders,
    575     **kwargs,
    576 )
    577 ds = _dataset_from_backend_dataset(
    578     backend_ds,
    579     filename_or_obj,
   (...)
    589     **kwargs,
    590 )
    591 return ds

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/zarr.py:1182, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version, store, engine)
   1180 store_entrypoint = StoreBackendEntrypoint()
   1181 with close_on_error(store):
-> 1182     ds = store_entrypoint.open_dataset(
   1183         store,
   1184         mask_and_scale=mask_and_scale,
   1185         decode_times=decode_times,
   1186         concat_characters=concat_characters,
   1187         decode_coords=decode_coords,
   1188         drop_variables=drop_variables,
   1189         use_cftime=use_cftime,
   1190         decode_timedelta=decode_timedelta,
   1191     )
   1192 return ds

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/store.py:43, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     29 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
     30     self,
     31     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
     39     decode_timedelta=None,
     40 ) -> Dataset:
     41     assert isinstance(filename_or_obj, AbstractDataStore)
---> 43     vars, attrs = filename_or_obj.load()
     44     encoding = filename_or_obj.get_encoding()
     46     vars, attrs, coord_names = conventions.decode_cf_variables(
     47         vars,
     48         attrs,
   (...)
     55         decode_timedelta=decode_timedelta,
     56     )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/common.py:221, in AbstractDataStore.load(self)
    199 def load(self):
    200     """
    201     This loads the variables and attributes simultaneously.
    202     A centralized loading function makes it easier to create
   (...)
    218     are requested, so care should be taken to make sure its fast.
    219     """
    220     variables = FrozenDict(
--> 221         (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    222     )
    223     attributes = FrozenDict(self.get_attrs())
    224     return variables, attributes

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/zarr.py:563, in ZarrStore.get_variables(self)
    562 def get_variables(self):
--> 563     return FrozenDict(
    564         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    565     )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/utils.py:443, in FrozenDict(*args, **kwargs)
    442 def FrozenDict(*args, **kwargs) -> Frozen:
--> 443     return Frozen(dict(*args, **kwargs))

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/backends/zarr.py:563, in <genexpr>(.0)
    562 def get_variables(self):
--> 563     return FrozenDict(
    564         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    565     )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/hierarchy.py:691, in Group._array_iter(self, keys_only, method, recurse)
    689 if contains_array(self._store, path):
    690     _key = key.rstrip("/")
--> 691     yield _key if keys_only else (_key, self[key])
    692 elif recurse and contains_group(self._store, path):
    693     group = self[key]

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/hierarchy.py:467, in Group.__getitem__(self, item)
    465 path = self._item_path(item)
    466 try:
--> 467     return Array(
    468         self._store,
    469         read_only=self._read_only,
    470         path=path,
    471         chunk_store=self._chunk_store,
    472         synchronizer=self._synchronizer,
    473         cache_attrs=self.attrs.cache,
    474         zarr_version=self._version,
    475         meta_array=self._meta_array,
    476     )
    477 except ArrayNotFoundError:
    478     pass

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/core.py:170, in Array.__init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata, cache_attrs, partial_decompress, write_empty_chunks, zarr_version, meta_array)
    167     self._metadata_key_suffix = self._hierarchy_metadata["metadata_key_suffix"]
    169 # initialize metadata
--> 170 self._load_metadata()
    172 # initialize attributes
    173 akey = _prefix_to_attrs_key(self._store, self._key_prefix)

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/core.py:193, in Array._load_metadata(self)
    191 """(Re)load metadata from store."""
    192 if self._synchronizer is None:
--> 193     self._load_metadata_nosync()
    194 else:
    195     mkey = _prefix_to_array_key(self._store, self._key_prefix)

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/core.py:207, in Array._load_metadata_nosync(self)
    204     raise ArrayNotFoundError(self._path) from e
    205 else:
    206     # decode and store metadata as instance members
--> 207     meta = self._store._metadata_class.decode_array_metadata(meta_bytes)
    208     self._meta = meta
    209     self._shape = meta["shape"]

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/zarr/meta.py:141, in Metadata2.decode_array_metadata(cls, s)
    139         meta["dimension_separator"] = dimension_separator
    140 except Exception as e:
--> 141     raise MetadataError("error decoding metadata") from e
    142 else:
    143     return meta

MetadataError: error decoding metadata

At first I assumed there was something wrong with our handling of the loaded cftime_variables, but actually even if I drop the 'Time' variable I still get exactly the same error:

vds = open_virtual_dataset(
    's3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-01.nc',
    indexes={},
    drop_variables=['Time'],
)

I don't know why it's even trying to convert anything to a datetime - none of the other variables have units of time.

What's also weird is that the error is raised from within meta.py:260, in Metadata2.decode_fill_value(cls, v, dtype, object_codec), which suggests a problem with the fill_value. But I checked, and every variable in this virtual dataset has a fill_value of either a float or nan in its .encoding - again, nothing to do with a datetime.
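For what it's worth, the failing line in the traceback is zarr's np.array(v, dtype=dtype)[()]. A minimal sketch of how that line blows up, assuming some variable's .zarray metadata ends up declaring a datetime64 dtype while its fill_value was serialized as a plain float (the function name here is hypothetical, just mimicking zarr's decode_fill_value):

```python
import numpy as np

# Sketch of the failing line in zarr's Metadata2.decode_fill_value
# (zarr/meta.py:260): np.array(v, dtype=dtype)[()].
# A fill_value stored as a plain Python float (0.0 or nan) cannot be
# cast to a datetime64 dtype, which produces exactly the
# "Could not convert object to NumPy datetime" ValueError above.
def decode_fill_value_sketch(v, dtype):
    return np.array(v, dtype=dtype)[()]

for fill in (0.0, float("nan")):
    try:
        decode_fill_value_sketch(fill, np.dtype("datetime64[ns]"))
    except ValueError as e:
        print(f"fill_value {fill!r}: {e}")
```

So the dtype recorded in the generated references, not the fill_value itself, may be the thing to inspect.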

@TomNicholas added the bug, references generation, usage example, and CF conventions labels on Jul 25, 2024
@TomNicholas
Member Author

@jsignell summoning you in case you have any thoughts / ideas here

@TomNicholas
Member Author

@thodson-usgs got a similar-looking error in #203 (comment), but only on more recent versions of virtualizarr. There must be some kind of regression, which we should narrow down with git bisect.
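If it is a regression, a bisect session along these lines (the good tag and repro.py script are hypothetical stand-ins) would pin down the offending commit automatically:

```shell
# Hypothetical bisect session in a virtualizarr checkout. repro.py stands
# in for a script that runs the open_virtual_dataset -> to_kerchunk ->
# xr.open_dataset round trip and exits non-zero on the MetadataError.
git bisect start
git bisect bad HEAD              # current commit, where the error occurs
git bisect good v1.0.0           # hypothetical last known-good release tag
git bisect run python repro.py   # git walks the commits and runs the repro
git bisect reset
```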

@jsignell
Contributor

I am taking a look. Are you sure you got the same error when you dropped the time variable? I am seeing an s3 access issue when I do that (which I take to mean I made it past the original error).

from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset(
    's3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-01.nc',
    indexes={},
    drop_variables=["Time"]
)

vds.virtualize.to_kerchunk("combined_no_t.json", format="json")
ds = xr.open_dataset('combined_no_t.json', engine="kerchunk")
---------------------------------------------------------------------------
NoCredentialsError                        Traceback (most recent call last)
File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/fsspec/asyn.py:245, in _run_coros_in_chunks.<locals>._run_coro(coro, i)
    244 try:
--> 245     return await asyncio.wait_for(coro, timeout=timeout), i
    246 except Exception as e:

File ~/micromamba/envs/virtualizarr/lib/python3.12/asyncio/tasks.py:520, in wait_for(fut, timeout)
    519 async with timeouts.timeout(timeout):
--> 520     return await fut

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:1125, in S3FileSystem._cat_file(self, path, version_id, start, end)
   1123         resp["Body"].close()
-> 1125 return await _error_wrapper(_call_and_read, retries=self.retries)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:142, in _error_wrapper(func, args, kwargs, retries)
    141 err = translate_boto_error(err)
--> 142 raise err

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:1112, in S3FileSystem._cat_file.<locals>._call_and_read()
   1111 async def _call_and_read():
-> 1112     resp = await self._call_s3(
   1113         "get_object",
   1114         Bucket=bucket,
   1115         Key=key,
   1116         **version_id_kw(version_id or vers),
   1117         **head,
   1118         **self.req_kw,
   1119     )
   1120     try:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:362, in S3FileSystem._call_s3(self, method, *akwarglist, **kwargs)
    361 additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist, **kwargs)
--> 362 return await _error_wrapper(
    363     method, kwargs=additional_kwargs, retries=self.retries
    364 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:142, in _error_wrapper(func, args, kwargs, retries)
    141 err = translate_boto_error(err)
--> 142 raise err

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/client.py:388, in AioBaseClient._make_api_call(self, operation_name, api_params)
    387     apply_request_checksum(request_dict)
--> 388     http, parsed_response = await self._make_request(
    389         operation_model, request_dict, request_context
    390     )
    392 await self.meta.events.emit(
    393     'after-call.{service_id}.{operation_name}'.format(
    394         service_id=service_id, operation_name=operation_name
   (...)
    399     context=request_context,
    400 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/client.py:416, in AioBaseClient._make_request(self, operation_model, request_dict, request_context)
    415 try:
--> 416     return await self._endpoint.make_request(
    417         operation_model, request_dict
    418     )
    419 except Exception as e:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/endpoint.py:98, in AioEndpoint._send_request(self, request_dict, operation_model)
     97 self._update_retries_context(context, attempts)
---> 98 request = await self.create_request(request_dict, operation_model)
     99 success_response, exception = await self._get_response(
    100     request, operation_model, context
    101 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/endpoint.py:86, in AioEndpoint.create_request(self, params, operation_model)
     83     event_name = 'request-created.{service_id}.{op_name}'.format(
     84         service_id=service_id, op_name=operation_model.name
     85     )
---> 86     await self._event_emitter.emit(
     87         event_name,
     88         request=request,
     89         operation_name=operation_model.name,
     90     )
     91 prepared_request = self.prepare_request(request)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/hooks.py:66, in AioHierarchicalEmitter._emit(self, event_name, kwargs, stop_on_response)
     65 # Await the handler if its a coroutine.
---> 66 response = await resolve_awaitable(handler(**kwargs))
     67 responses.append((handler, response))

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/_helpers.py:15, in resolve_awaitable(obj)
     14 if inspect.isawaitable(obj):
---> 15     return await obj
     17 return obj

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/signers.py:24, in AioRequestSigner.handler(self, operation_name, request, **kwargs)
     19 async def handler(self, operation_name=None, request=None, **kwargs):
     20     # This is typically hooked up to the "request-created" event
     21     # from a client's event emitter.  When a new request is created
     22     # this method is invoked to sign the request.
     23     # Don't call this method directly.
---> 24     return await self.sign(operation_name, request)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/aiobotocore/signers.py:88, in AioRequestSigner.sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
     86         raise e
---> 88 auth.add_auth(request)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/botocore/auth.py:418, in SigV4Auth.add_auth(self, request)
    417 if self.credentials is None:
--> 418     raise NoCredentialsError()
    419 datetime_now = datetime.datetime.utcnow()

NoCredentialsError: Unable to locate credentials

The above exception was the direct cause of the following exception:

ReferenceNotReachable                     Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds = xr.open_dataset('combined_no_t.json', engine="kerchunk")

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    559 decoders = _resolve_decoders_kwargs(
    560     decode_cf,
    561     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    567     decode_coords=decode_coords,
    568 )
    570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
    572     filename_or_obj,
    573     drop_variables=drop_variables,
    574     **decoders,
    575     **kwargs,
    576 )
    577 ds = _dataset_from_backend_dataset(
    578     backend_ds,
    579     filename_or_obj,
   (...)
    589     **kwargs,
    590 )
    591 return ds

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/kerchunk/xarray_backend.py:12, in KerchunkBackend.open_dataset(self, filename_or_obj, storage_options, open_dataset_options, **kw)
      8 def open_dataset(
      9     self, filename_or_obj, *, storage_options=None, open_dataset_options=None, **kw
     10 ):
     11     open_dataset_options = (open_dataset_options or {}) | kw
---> 12     ref_ds = open_reference_dataset(
     13         filename_or_obj,
     14         storage_options=storage_options,
     15         open_dataset_options=open_dataset_options,
     16     )
     17     return ref_ds

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/kerchunk/xarray_backend.py:46, in open_reference_dataset(filename_or_obj, storage_options, open_dataset_options)
     42     open_dataset_options = {}
     44 m = fsspec.get_mapper("reference://", fo=filename_or_obj, **storage_options)
---> 46 return xr.open_dataset(m, engine="zarr", consolidated=False, **open_dataset_options)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/api.py:571, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    559 decoders = _resolve_decoders_kwargs(
    560     decode_cf,
    561     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    567     decode_coords=decode_coords,
    568 )
    570 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 571 backend_ds = backend.open_dataset(
    572     filename_or_obj,
    573     drop_variables=drop_variables,
    574     **decoders,
    575     **kwargs,
    576 )
    577 ds = _dataset_from_backend_dataset(
    578     backend_ds,
    579     filename_or_obj,
   (...)
    589     **kwargs,
    590 )
    591 return ds

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:1182, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version, store, engine)
   1180 store_entrypoint = StoreBackendEntrypoint()
   1181 with close_on_error(store):
-> 1182     ds = store_entrypoint.open_dataset(
   1183         store,
   1184         mask_and_scale=mask_and_scale,
   1185         decode_times=decode_times,
   1186         concat_characters=concat_characters,
   1187         decode_coords=decode_coords,
   1188         drop_variables=drop_variables,
   1189         use_cftime=use_cftime,
   1190         decode_timedelta=decode_timedelta,
   1191     )
   1192 return ds

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/store.py:58, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     44 encoding = filename_or_obj.get_encoding()
     46 vars, attrs, coord_names = conventions.decode_cf_variables(
     47     vars,
     48     attrs,
   (...)
     55     decode_timedelta=decode_timedelta,
     56 )
---> 58 ds = Dataset(vars, attrs=attrs)
     59 ds = ds.set_coords(coord_names.intersection(vars))
     60 ds.set_close(filename_or_obj.close)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/dataset.py:711, in Dataset.__init__(self, data_vars, coords, attrs)
    708 if isinstance(coords, Dataset):
    709     coords = coords._variables
--> 711 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    712     data_vars, coords
    713 )
    715 self._attrs = dict(attrs) if attrs else None
    716 self._close = None

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/dataset.py:425, in merge_data_and_coords(data_vars, coords)
    421     coords = create_coords_with_default_indexes(coords, data_vars)
    423 # exclude coords from alignment (all variables in a Coordinates object should
    424 # already be aligned together) and use coordinates' indexes to align data_vars
--> 425 return merge_core(
    426     [data_vars, coords],
    427     compat="broadcast_equals",
    428     join="outer",
    429     explicit_coords=tuple(coords),
    430     indexes=coords.xindexes,
    431     priority_arg=1,
    432     skip_align_args=[1],
    433 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/merge.py:699, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
    696 for pos, obj in skip_align_objs:
    697     aligned.insert(pos, obj)
--> 699 collected = collect_variables_and_indexes(aligned, indexes=indexes)
    700 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    701 variables, out_indexes = merge_collected(
    702     collected, prioritized, compat=compat, combine_attrs=combine_attrs
    703 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/merge.py:362, in collect_variables_and_indexes(list_of_mappings, indexes)
    360     append(name, variable, indexes[name])
    361 elif variable.dims == (name,):
--> 362     idx, idx_vars = create_default_index_implicit(variable)
    363     append_all(idx_vars, {k: idx for k in idx_vars})
    364 else:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:1404, in create_default_index_implicit(dim_variable, all_variables)
   1402 else:
   1403     dim_var = {name: dim_variable}
-> 1404     index = PandasIndex.from_variables(dim_var, options={})
   1405     index_vars = index.create_variables(dim_var)
   1407 return index, index_vars

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:654, in PandasIndex.from_variables(cls, variables, options)
    651     if level is not None:
    652         data = var._data.array.get_level_values(level)
--> 654 obj = cls(data, dim, coord_dtype=var.dtype)
    655 assert not isinstance(obj.index, pd.MultiIndex)
    656 # Rename safely
    657 # make a shallow copy: cheap and because the index name may be updated
    658 # here or in other constructors (cannot use pd.Index.rename as this
    659 # constructor is also called from PandasMultiIndex)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:589, in PandasIndex.__init__(self, array, dim, coord_dtype, fastpath)
    587     index = array
    588 else:
--> 589     index = safe_cast_to_index(array)
    591 if index.name is None:
    592     # make a shallow copy: cheap and because the index name may be updated
    593     # here or in other constructors (cannot use pd.Index.rename as this
    594     # constructor is also called from PandasMultiIndex)
    595     index = index.copy()

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexes.py:469, in safe_cast_to_index(array)
    459             emit_user_level_warning(
    460                 (
    461                     "`pandas.Index` does not support the `float16` dtype."
   (...)
    465                 category=DeprecationWarning,
    466             )
    467             kwargs["dtype"] = "float64"
--> 469     index = pd.Index(np.asarray(array), **kwargs)
    471 return _maybe_cast_to_cftimeindex(index)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexing.py:509, in ExplicitlyIndexed.__array__(self, dtype)
    507 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
    508     # Leave casting to an array up to the underlying array type.
--> 509     return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/common.py:181, in BackendArray.get_duck_array(self, dtype)
    179 def get_duck_array(self, dtype: np.typing.DTypeLike = None):
    180     key = indexing.BasicIndexer((slice(None),) * self.ndim)
--> 181     return self[key]

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:104, in ZarrArrayWrapper.__getitem__(self, key)
    102 elif isinstance(key, indexing.OuterIndexer):
    103     method = self._oindex
--> 104 return indexing.explicit_indexing_adapter(
    105     key, array.shape, indexing.IndexingSupport.VECTORIZED, method
    106 )

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    992 """Support explicit indexing by delegating to a raw indexing method.
    993 
    994 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
   1011 Indexing result, in the form of a duck numpy-array.
   1012 """
   1013 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1014 result = raw_indexing_method(raw_key.tuple)
   1015 if numpy_indices.tuple:
   1016     # index the loaded np.ndarray
   1017     indexable = NumpyIndexingAdapter(result)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/xarray/backends/zarr.py:94, in ZarrArrayWrapper._getitem(self, key)
     93 def _getitem(self, key):
---> 94     return self._array[key]

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:800, in Array.__getitem__(self, selection)
    798     result = self.get_orthogonal_selection(pure_selection, fields=fields)
    799 else:
--> 800     result = self.get_basic_selection(pure_selection, fields=fields)
    801 return result

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:926, in Array.get_basic_selection(self, selection, out, fields)
    924     return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
    925 else:
--> 926     return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:968, in Array._get_basic_selection_nd(self, selection, out, fields)
    962 def _get_basic_selection_nd(self, selection, out=None, fields=None):
    963     # implementation of basic selection for array with at least one dimension
    964 
    965     # setup indexer
    966     indexer = BasicIndexer(selection, self)
--> 968     return self._get_selection(indexer=indexer, out=out, fields=fields)

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:1343, in Array._get_selection(self, indexer, out, fields)
   1340 if math.prod(out_shape) > 0:
   1341     # allow storage to get multiple items at once
   1342     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1343     self._chunk_getitems(
   1344         lchunk_coords,
   1345         lchunk_selection,
   1346         out,
   1347         lout_selection,
   1348         drop_axes=indexer.drop_axes,
   1349         fields=fields,
   1350     )
   1351 if out.shape:
   1352     return out

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/core.py:2177, in Array._chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   2175     if not isinstance(self._meta_array, np.ndarray):
   2176         contexts = ConstantMap(ckeys, constant=Context(meta_array=self._meta_array))
-> 2177     cdatas = self.chunk_store.getitems(ckeys, contexts=contexts)
   2179 for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   2180     if ckey in cdatas:

File ~/micromamba/envs/virtualizarr/lib/python3.12/site-packages/zarr/storage.py:1435, in FSStore.getitems(self, keys, contexts)
   1432     continue
   1433 elif isinstance(v, Exception):
   1434     # Raise any other exception
-> 1435     raise v
   1436 else:
   1437     # The function calling this method may not recognize the transformed
   1438     # keys, so we send the values returned by self.map.getitems back into
   1439     # the original key space.
   1440     results[keys_transformed[k]] = v

@thodson-usgs

btw, git bisect led me to 10bd53d. Maybe I can find the pre-squash branch and dig further tomorrow.

@thodson-usgs commented Jul 31, 2024

Here's the bug:

fill_value: FillValueT = Field(default=0.0, validate_default=True)

Reverting this line back to

fill_value: float | int | None = np.nan # float or int?

causes my test to pass.

I propose changing this to

fill_value: FillValueT = Field(default=np.nan, validate_default=True)

which also passes.
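The underlying failure mode is easy to demonstrate directly in NumPy, independent of VirtualiZarr: a plain float default cannot be cast into a `datetime64` array via `np.array` (this is the conversion zarr's metadata decoding ends up attempting), while `datetime64` has its own sentinel, `NaT`, for missing values. A minimal sketch:

```python
import numpy as np

# A float fill value cannot be placed into a datetime64 array this way;
# NumPy refuses the conversion and raises ValueError.
try:
    np.array([0.0], dtype=np.dtype("datetime64[ns]"))
except ValueError as err:
    print(err)

# datetime64 uses its own "missing" sentinel, NaT (not-a-time), which is
# what a fill value for a time coordinate would need to decode to.
nat = np.datetime64("NaT")
print(np.isnat(nat))  # True
```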

@TomAugspurger


AFAICT, 0.0 is the appropriate default fill value; that matches what zarr-python does. The line raising the exception is, I think, something like

In [28]: np.array([0.0], dtype=np.dtype("datetime64[ns]"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[28], line 1
----> 1 np.array([0.0], dtype=np.dtype("datetime64[ns]"))

Called via

zarr.v2.meta.Metadata2.decode_fill_value(np.nan, np.dtype("datetime64[ns]"))

But that line fails with a fill value of either np.nan or 0.0. @thodson-usgs, would you be able to get a debugger in there and see what the values of fill_value and dtype are both before and after 10bd53d? Or share a file somewhere public so I can take a look?

@thodson-usgs commented Jul 31, 2024

Thanks @TomAugspurger, I put an example back on #206. These might indeed be the same issue, but I want to be careful about crossing streams here.
