netcdf limits? #74

Open
anthoni-p opened this issue Aug 26, 2021 · 3 comments

@anthoni-p

anthoni-p commented Aug 26, 2021

I'm getting an error when trying to read a netCDF file with per-PFT data, and I'm not sure where it comes from.

>   S3.gcp2021.src=defineSource("S3_GCP2021","S3_GCP2021"
+                               ,"/pd/data/lpj/GCP/GCP_2021/ftpUpload/LPJ-GUESS/S3",format=NetCDF
+                               ,contact="xxx",institute="xxx")
>   cvegpft.s3.21=getField(S3.gcp2021.src,"cVegpft",file.name = "LPJ-GUESS_S3_cVegpft.nc")

Error in (function (..., sorted = TRUE, unique = FALSE)  : 
  Cross product of elements provided to CJ() would result in 2163283200 rows which exceeds 
.Machine$integer.max == 2147483647

Can only part of such a file be read in, e.g. PFT index 1? If so, how can we specify that in getField()?

Here is the ncdump info for the file. It is yearly data, so there are not really that many time steps:
netcdf LPJ-GUESS_S3_cVegpft {
dimensions:
    PFT = 26 ;
    longitude = 720 ;
    latitude = 360 ;
    time = UNLIMITED ; // (321 currently)
variables:
    int PFT(PFT) ;
        PFT:long_name = "plant functional type" ;
    double longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    double latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    double time(time) ;
        time:units = "days since 1700-01-01" ;
        time:long_name = "time" ;
        time:calendar = "noleap" ;
    float cVegpft(time, latitude, longitude, PFT) ;
        cVegpft:units = "kg m-2" ;
        cVegpft:_FillValue = -99999.f ;
        cVegpft:long_name = "Vegtype level Carbon in Vegetation" ;

@MagicForrest
Owner

MagicForrest commented Aug 26, 2021

Hi Peter,

It looks like the TRENDY data is now just slightly over R's integer size limit. That is bad luck. Unfortunately I haven't implemented reading Fields per layer. I never quite had a reason to, but now there is one. It shouldn't be too hard but will take a bit of time.

Is this urgent for you? I am on holiday right now.
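
For reference, the row count in the error message can be reproduced from the dimension lengths in the ncdump header. This is only a back-of-the-envelope check in base R, not a description of what getField() does internally; the variable names are made up for illustration.

```r
# Dimension lengths copied from the ncdump header in this issue.
dims <- c(PFT = 26, longitude = 720, latitude = 360, time = 321)

# prod() returns a double, so the product itself cannot overflow here.
total_rows   <- prod(dims)                        # 2163283200, as in the error
rows_per_pft <- prod(dims[names(dims) != "PFT"])  # 83203200

# The full cross product is over the limit, a single PFT slice is far below it.
over_limit  <- total_rows > .Machine$integer.max
slice_is_ok <- rows_per_pft < .Machine$integer.max

print(c(total = total_rows, per_pft = rows_per_pft))
print(c(over_limit = over_limit, slice_is_ok = slice_is_ok))
```

So the full cross product exceeds `.Machine$integer.max` (2147483647) by about 0.7 %, while one PFT at a time needs roughly 83 million rows.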

@MagicForrest
Owner

Hi Peter,

I have implemented a "layer" argument to getField() which allows selection of individual variables inside the netCDF file. I think this is a useful thing. But I just realised that doesn't solve the problem you had above, because it doesn't allow selecting individual PFT indices.

So, I propose adding an additional argument, something like 'layer.dim.indices' which would allow subsetting on the PFT (or whatever) dimension. Will take a bit more time to implement however...

@MagicForrest
Owner

@anthoni-p I guess this issue hasn't gone away? I never implemented the 'layer.dim.index' idea. I can see a way to solve this though: basically read the netCDF in separate slices, one for each element of the layer axis (in this case your 26 PFTs). That will keep the size below the maximum integer and keep the memory footprint down. It might even make the code faster, but I am not sure...

If you are still interested, can you supply me with a test file?
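
A minimal sketch of the slice-per-layer idea, assuming the ncdf4 package is used directly. This is not DGVMTools code: `read_layer_slices()` is a hypothetical helper, and the tiny demo file only imitates the dimension layout of the ncdump above (with a fixed rather than unlimited time axis) so that the example is runnable without the real data.

```r
library(ncdf4)

# Build a small stand-in file (assumption: same dimension order as the real
# file, but only 3 PFTs, 2 x 2 grid cells and 2 time steps).
demo_file <- tempfile(fileext = ".nc")
dim_pft  <- ncdim_def("PFT", units = "1", vals = 1:3)
dim_lon  <- ncdim_def("longitude", units = "degrees_east", vals = c(-0.25, 0.25))
dim_lat  <- ncdim_def("latitude", units = "degrees_north", vals = c(10.25, 10.75))
dim_time <- ncdim_def("time", units = "days since 1700-01-01", vals = c(0, 365))

# ncdf4 lists dimensions fastest-varying first, i.e. the reverse of ncdump's
# cVegpft(time, latitude, longitude, PFT).
var_def <- ncvar_def("cVegpft", units = "kg m-2",
                     dim = list(dim_pft, dim_lon, dim_lat, dim_time),
                     missval = -99999, prec = "float")
nc_out <- nc_create(demo_file, var_def)
demo_vals <- array(as.numeric(seq_len(24)), dim = c(3, 2, 2, 2))
ncvar_put(nc_out, var_def, demo_vals)
nc_close(nc_out)

# Hypothetical helper: read one slice per element of the layer dimension,
# so no single read covers the full 4-D variable.
read_layer_slices <- function(path, var_name, layer_dim = "PFT") {
  nc <- nc_open(path)
  on.exit(nc_close(nc))
  dims <- nc$var[[var_name]]$dim
  dim_names <- vapply(dims, function(d) d$name, character(1))
  layer_pos <- match(layer_dim, dim_names)
  stopifnot(!is.na(layer_pos))
  n_layers <- dims[[layer_pos]]$len
  slices <- vector("list", n_layers)
  for (i in seq_len(n_layers)) {
    start <- rep(1, length(dims))
    start[layer_pos] <- i    # begin at layer i ...
    count <- rep(-1, length(dims))
    count[layer_pos] <- 1    # ... and read exactly one layer (-1 means "all")
    slices[[i]] <- ncvar_get(nc, var_name, start = start, count = count)
  }
  names(slices) <- as.character(dims[[layer_pos]]$vals)
  slices
}

slices <- read_layer_slices(demo_file, "cVegpft")
print(dim(slices[["1"]]))  # longitude x latitude x time for PFT 1
```

With the real file each slice would hold 720 x 360 x 321 values, well under the integer limit, and each slice could be converted to a long table and processed or combined one layer at a time.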
