On the size of an xarray.DataArray loaded via a Parquet store vs. Raw Data Files #539
NikosAlexandris asked this question in Q&A
Further details, for a single NetCDF file:

❯ stat SISin200001010000004231000101MA_Italia_48_64_64_zlib_0.nc
File: .../SISin200001010000004231000101MA_Italia_48_64_64_zlib_0.nc
Size: 6328605 Blocks: 12368 IO Block: 4096 regular file
Device: 8,97 Inode: 8791662107 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ nik) Gid: ( 1000/autologin)
Access: 2025-01-20 23:08:49.843344130 +0100
Modify: 2025-01-20 23:08:29.186676971 +0100
Change: 2025-01-20 23:08:29.186676971 +0100
Birth: 2025-01-20 23:08:28.636676957 +0100

plus some details of its internal structure:

Variable       Shape           Chunks        Cache     Elements  Preemption  Type     Scale  Offset  Compression  Level  Shuffling  Read Time
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
lat            232             contiguous    1048576   521       0.75        float32  -      -                    0      False      -
lon            238             contiguous    1048576   521       0.75        float32  -      -                    0      False      -
record_status  48              48            16777216  1000      0.75        int8     -      -                    0      False      -
lat_bnds       232 x 2         contiguous    1048576   521       0.75        float32  -      -                    0      False      -
lon_bnds       238 x 2         contiguous    1048576   521       0.75        float32  -      -                    0      False      -
SIS            48 x 232 x 238  48 x 64 x 64  16777216  1000      0.75        int16    -      -                    0      False      0.014
time           48              48            16777216  1000      0.75        float64  -      -                    0      False      -
File size: 6328605 bytes, Dimensions: time: 48, lon: 238, bnds: 2, lat: 232
* Cache: Size in bytes, Number of elements, Preemption ranging in [0, 1]

yet

In [8]: one_netcdf_file = xarray.open_dataset('SISin200001010000004231000101MA_Italia_48_64_64_zlib_0.nc')
In [9]: one_netcdf_file.SIS
Out[9]:
<xarray.DataArray 'SIS' (time: 48, lat: 232, lon: 238)> Size: 11MB
[2650368 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 384B 2000-01-01 ... 2000-01-01T23:30:00
* lon (lon) float32 952B 6.675 6.725 6.775 6.825 ... 18.42 18.48 18.52
* lat (lat) float32 928B 35.53 35.58 35.62 35.67 ... 46.97 47.03 47.08
Attributes:
standard_name: surface_downwelling_shortwave_flux_in_air
long_name: Surface Downwelling Shortwave Radiation
units: W m-2
cell_methods: time: point
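For what it is worth, the 11MB that Xarray reports matches the decoded in-memory size of SIS rather than the file size. Below is a minimal sketch of that arithmetic, using only the shape and the two dtypes shown above (int16 in the file listing, float32 in the Xarray repr). The item sizes are the standard 2 and 4 bytes. That Xarray decodes the stored int16 values to float32 on open is my assumption, not something confirmed in this thread.

```python
import math

# Shape of SIS in the single file, copied from the listing above.
shape = (48, 232, 238)

# Bytes per value for the two dtypes that appear in the output above.
ITEMSIZE = {"int16": 2, "float32": 4}

n_values = math.prod(shape)

# Uncompressed size of the values as stored in the file (int16).
stored_bytes = n_values * ITEMSIZE["int16"]

# Size of the same values once held in memory as float32.
decoded_bytes = n_values * ITEMSIZE["float32"]

print(n_values)       # 2650368
print(stored_bytes)   # 5300736
print(decoded_bytes)  # 10601472
```

Under these assumptions the int16 payload (about 5.3 MB) accounts for most of the 6328605-byte file, and the float32 view is twice that, which would round to the reported 11MB.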
In comment #345 (comment) I reported that the total size of thousands of single Parquet stores was 147.3 MB, which is far smaller than the 1.5 GB of the one aggregate Parquet store combining all the single ones!
@martindurant explains in #345 (comment)
Today I think I have a similar question (see also: Details below). I observe a difference between the reported size of an xarray.DataArray and the size of the 366 raw NetCDF files. Specifically:
.nc files: 2.2 G
.parquet file: 44 K
SIS has dimensions time: 17568, lat: 232, lon: 238 and is reported to be 4GB, containing 970,034,688 values with dtype=float32.

The size of the .nc files is 2.2GB. The .parquet file which I use to load the data via Xarray is only 44KB.

I am trying to understand:
… inline_threshold=500 option?

Details
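The arithmetic behind the 4GB figure can be sketched as follows, together with a rough count of how many chunks the Parquet store would have to point at. This assumes that all 366 files share the 48 x 232 x 238 shape and the 48 x 64 x 64 chunking of the single file listed in the details, and that the store holds only references to chunks rather than the values themselves. Neither assumption is confirmed in this thread.

```python
import math

n_files = 366
shape_per_file = (48, 232, 238)  # time, lat, lon (assumed identical for every file)
chunk_shape = (48, 64, 64)       # SIS chunking (assumed identical for every file)

# Concatenating along time gives the aggregate shape reported by Xarray.
full_shape = (n_files * shape_per_file[0],) + shape_per_file[1:]
n_values = math.prod(full_shape)

# In-memory size once the values are held as float32 (4 bytes each).
decoded_bytes = n_values * 4

# Chunks per file: round up along each dimension, then multiply.
chunks_per_file = math.prod(
    math.ceil(size / chunk) for size, chunk in zip(shape_per_file, chunk_shape)
)
n_chunk_references = n_files * chunks_per_file

print(full_shape)          # (17568, 232, 238)
print(n_values)            # 970034688
print(decoded_bytes)       # 3880138752
print(n_chunk_references)  # 5856
```

If those assumptions hold, the store only needs to describe a few thousand SIS chunks, so its size would follow the number of chunks and not the roughly 3.9 GB of decoded values.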