4-d Variable Output Using Standard Memory Mode #24

Open
anewman89 opened this issue Nov 19, 2015 · 4 comments

@anewman89

I used the "standard" option and ran into an issue. If I included soil moisture (a 4-d variable) in the configuration file, I got the following error when the code tried to write to the netCDF files after loading all the files in the current chunk:

Traceback (most recent call last):
  File "/glade/u/home/anewman/bin/vic_utils", line 5, in <module>
    pkg_resources.run_script('tonic==0.0.0.dev-2bf5167', 'vic_utils')
  File "/glade/apps/opt/python/2.7.7/gnu-westmere/4.8.2/lib/python2.7/site-packages/pkg_resources.py", line 534, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/glade/apps/opt/python/2.7.7/gnu-westmere/4.8.2/lib/python2.7/site-packages/pkg_resources.py", line 1441, in run_script
    exec(script_code, namespace, namespace)
  File "/glade/u/home/anewman/lib/python2.7/site-packages/tonic-0.0.0.dev_2bf5167-py2.7.egg/EGG-INFO/scripts/vic_utils", line 221, in <module>

  File "/glade/u/home/anewman/lib/python2.7/site-packages/tonic-0.0.0.dev_2bf5167-py2.7.egg/EGG-INFO/scripts/vic_utils", line 197, in main

  File "build/bdist.linux-x86_64/egg/tonic/models/vic/vic2netcdf.py", line 546, in _run
  File "build/bdist.linux-x86_64/egg/tonic/models/vic/vic2netcdf.py", line 896, in vic2nc
  File "build/bdist.linux-x86_64/egg/tonic/models/vic/vic2netcdf.py", line 459, in nc_add_data_standard
  File "netCDF4.pyx", line 3267, in netCDF4.Variable.__setitem__ (netCDF4.c:39658)
ValueError: total size of new array must be unchanged

I traced it back to line ~448:

self.f.variables[name][:, i, ys, xs] 

is looking for something that is 2-dimensional while

p.df[sn].values[self.slice]

is only 1-dimensional, with a length equal to the number of time steps going into the current netCDF file. If I removed soil moisture from the configuration file, this option produced output properly, so the problem is specific to 4-d variables.
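Here is a minimal NumPy sketch of how the shapes line up for a layered variable. It rests on two hypothetical assumptions. First, each soil layer of a grid cell arrives as its own 1-d column of length nt (keys such as `SOIL_MOIST_0` are illustrative). Second, the output variable is laid out as (time, layer, y, x). A NumPy array stands in for the netCDF variable, and this is not the tonic implementation. There are two ways to make the shapes match: write each layer column into the 1-d slice `[:, i, y, x]`, or stack the columns into a (time, layer) array and write that into `[:, :, y, x]`.

```python
import numpy as np

# Hypothetical sizes: 5 time steps, 3 soil layers, a 2 x 2 grid.
nt, nlayers, ny, nx = 5, 3, 2, 2

# Stand-in for the output variable, laid out as (time, layer, y, x).
target = np.full((nt, nlayers, ny, nx), np.nan)

# Hypothetical per-cell data: one 1-d column (length nt) per soil layer.
cell_data = {
    "SOIL_MOIST_%d" % i: np.arange(nt, dtype=float) + 10 * i
    for i in range(nlayers)
}


def write_cell_per_layer(target, cell_data, y, x):
    """Write each layer's 1-d series into the matching 1-d slice."""
    for i in range(target.shape[1]):
        column = cell_data["SOIL_MOIST_%d" % i]
        # target[:, i, y, x] has shape (nt,), matching the column.
        target[:, i, y, x] = column


def write_cell_stacked(target, cell_data, y, x):
    """Stack the layer columns into (nt, nlayers) and write them once."""
    stacked = np.column_stack(
        [cell_data["SOIL_MOIST_%d" % i] for i in range(target.shape[1])]
    )
    # target[:, :, y, x] has shape (nt, nlayers), matching `stacked`.
    target[:, :, y, x] = stacked


write_cell_per_layer(target, cell_data, 0, 0)
write_cell_stacked(target, cell_data, 1, 1)
```

The per-layer version issues one write per layer. The stacked version issues a single write per cell, which may matter when each write is expensive.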

I then made some modifications to the code and got it to work for 4-d variables. This is my first halfway serious attempt at Python, so my understanding of the syntax is limited and there is plenty of potential for me to have messed up the fix in some way.

I ran the code many times and it worked fine. It seemed a little slow, but there is a lot of I/O both in and out, so I didn't think much of it. Then I got an email from our supercomputer system administrators stating that my code was performing an excessive number of disk writes to the same location. They reported that the read rates were fine, but the output was many times the input. That makes me think I fixed the code improperly, so the netCDF writes are occurring an excessive number of times...

The changes are in the function nc_add_data_standard. What is the best way for me to post my "fixed" code?

Cheers,
Andy

@jhamman jhamman added the bug label Nov 19, 2015
@jhamman
Member

jhamman commented Nov 19, 2015

Sounds like we have two issues here:

  1. Standard mode slice bug: I think we can fix this for the 4-d variable. It sounds like a pretty simple fix that you may have already applied. It would be worth issuing a pull request against develop for this.
  2. Your sysadmin is right: standard mode makes a lot of writes. You could try the big_memory mode or the original mode and see if that helps. big_memory will be the fastest but, as you may glean from its name, it uses the most memory. big_memory reads and writes each file only once.
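The next sketch is a toy illustration of why the two modes differ in how often they write. It counts write calls under two assumed patterns. The first is a standard-style loop that writes each grid cell's time series as its chunk is processed. The second is a big_memory-style loop that fills an in-memory buffer and then writes once. `CountingTarget`, `write_per_cell`, and `write_all_at_once` are hypothetical names, and the model is not tonic's actual I/O code.

```python
import numpy as np


class CountingTarget(object):
    """Hypothetical stand-in for an output variable that counts write calls."""

    def __init__(self, shape):
        self.data = np.zeros(shape)
        self.n_writes = 0

    def __setitem__(self, key, value):
        self.n_writes += 1
        self.data[key] = value


def write_per_cell(target, series_by_cell, cells_per_chunk):
    """Assumed standard-style pattern: read cells in chunks, write each cell."""
    cells = sorted(series_by_cell)
    for start in range(0, len(cells), cells_per_chunk):
        chunk = cells[start:start + cells_per_chunk]
        # (Reading the chunk's input files would happen here.)
        for (y, x) in chunk:
            target[:, y, x] = series_by_cell[(y, x)]


def write_all_at_once(target, series_by_cell):
    """Assumed big_memory-style pattern: fill a buffer, then write once."""
    buf = np.zeros(target.data.shape)
    for (y, x), series in series_by_cell.items():
        buf[:, y, x] = series
    target[...] = buf


nt, ny, nx = 4, 3, 5
series_by_cell = {
    (y, x): np.arange(nt, dtype=float) + 100 * y + x
    for y in range(ny) for x in range(nx)
}

a = CountingTarget((nt, ny, nx))
b = CountingTarget((nt, ny, nx))
write_per_cell(a, series_by_cell, cells_per_chunk=4)
write_all_at_once(b, series_by_cell)
```

Both patterns produce identical data, with 15 write calls versus 1 for the 3 x 5 grid here. The sketch only counts calls. It does not model how the netCDF library or the file system turns them into disk traffic.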

@anewman89
Author

Hi Joe, I've pushed to my fork. It looks like I edited both the 3-d and 4-d output in the nc_add_data_standard function. I can go ahead and issue the pull request.

@anewman89
Author

On point 2: Right, the standard mode would write after each chunk is read in. Does it work like this:

  1. Define the netcdf file with the full grid dimensions
  2. Issue write commands to fill the portions of the grid as they are read in. Something like netcdf_put_vara_* would be used for each variable write.
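If the workflow is as described above, every value is written exactly once. The bytes handed to the write calls should therefore match the bytes read. The NumPy sketch below does that accounting under stated assumptions: a (time, y, x) layout, hypothetical 2 x 2 spatial chunks, and an illustrative `read_chunk` helper that is not part of tonic. In the netCDF-C API, hyperslab writes are the `nc_put_vara_*` family. In netCDF4-python, the same effect comes from slice assignment on a `Variable`, which the NumPy slice assignment here mimics.

```python
import numpy as np

# Step 1 (assumed layout): allocate the full (time, y, x) grid up front.
# With netCDF4-python this would be a Variable created over the full
# dimensions; a NumPy array stands in for it here.
nt, ny, nx = 6, 4, 4
grid = np.zeros((nt, ny, nx), dtype=np.float32)


def read_chunk(y0, y1, x0, x1):
    """Hypothetical stand-in for reading the input files covering one chunk."""
    ts = np.arange(nt)[:, None, None]
    ys = np.arange(y0, y1)[None, :, None]
    xs = np.arange(x0, x1)[None, None, :]
    return (ts * 100 + ys * 10 + xs).astype(np.float32)


bytes_read = 0
bytes_written = 0
for y0 in range(0, ny, 2):
    for x0 in range(0, nx, 2):
        block = read_chunk(y0, y0 + 2, x0, x0 + 2)
        bytes_read += block.nbytes
        # Step 2: write the block into its hyperslab. In netCDF4-python the
        # same slice-assignment syntax on a Variable performs the write.
        grid[:, y0:y0 + 2, x0:x0 + 2] = block
        bytes_written += block.nbytes
```

This idealized count only covers the bytes passed to the writes. It does not model how the library and file system handle many small hyperslab writes, and that may be where the choice of chunking matters.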

I would think the total data written would still be roughly equal to the total data read... I was getting something on the order of 10x more data written than read.

@jhamman
Member

jhamman commented Nov 19, 2015

> I would think the total data written would still be roughly equal to the total data read... I was getting something on the order of 10x more data written than read.

It probably depends on how you chunk your dataset.

> Issue write commands to fill the portions of the grid as they are read in. Something like netcdf_put_vara_* would be used for each variable write.

Yes, but the Python API doesn't use that syntax exactly.
