As a general question: do we want to delete the FITS files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but I can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the FITS files. @troyraen @bsipocz what do you recommend?
Brigitta: I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). I would instead propagate this issue upstream to say that we (the users) will need access to some temporary space for all of these. Temporary in the sense of a scratch space: no backups, maybe not even surviving a large restart, but around for a few weeks while someone is actively working on a use case.
astropy/astroquery has this idea of a cache space, but in all honesty it's not super reliable and I would not count on it, especially as none of the VO-backed modules use it at the moment.
Andreas: I agree with that.
Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.
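A minimal sketch of the "warn when space is tight" idea, assuming the standard-library `shutil.disk_usage`; the function name and the `min_free_gb` threshold parameter are made up for illustration:

```python
import shutil
import warnings

def check_disk_space(path=".", min_free_gb=2.0):
    """Warn if free disk space at `path` drops below `min_free_gb` gigabytes.

    Returns the free space in GB so callers can also decide whether to
    skip or defer a large download.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        warnings.warn(
            f"Only {free_gb:.1f} GB free at {path!r}; "
            "downloads may fail or exhaust your quota."
        )
    return free_gb
```

Calling this before each archive download (and again before untarring) would give users the upfront warning discussed below.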
Jessica: I am working on the Herschel module and am at 10 GB without even finishing the tar-file downloads for a single target, Arp220 (Herschel likes to give you lots of files... too many files... but I can't control that). I think we should delete the tar files.
Troy: Since this is a Fornax notebook, I think we have to make it usable on the Fornax Console, which means respecting the 10 GB user disk space.
Has anyone looked into whether there are ways to avoid actually downloading anything?
We should warn the user upfront how much space will be needed.
Does anyone have a sense of how much disk space the full notebook currently requires?
> We can also write a clean-up function that clears all the temporary files at some point.
If the full notebook needs less than 5 GB(?) of disk space, then my vote is to write this function and make it optional, so it's available for the user to run or not as they wish. (Choosing 5 GB to leave space for other things.)
> I am working on the Herschel module and am at 10 GB without even finishing the tar-file downloads for a single target, Arp220 (Herschel likes to give you lots of files... too many files... but I can't control that). I think we should delete the tar files.
Do you know how much disk space would be needed for all the Herschel stuff you want to download?
> I would instead propagate this issue upstream to say that we (the users) will need access to some temporary space for all of these. Temporary in the sense of a scratch space: no backups, maybe not even surviving a large restart, but around for a few weeks while someone is actively working on a use case.
I agree that's a good thing to push for. I don't think we should count on it arriving in time for this notebook to rely on it.
coming back to this....
There is no way to avoid downloading the Herschel data, but at the moment we are not running the Herschel module because it is too time-intensive.
I like the idea of a cleanup parameter for each of the _get_spec functions, turned on by default (i.e., files are deleted), with the user having the choice to switch it off if they want to keep the data around. That keeps directories in a clean state unless the user specifically chooses otherwise.
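The proposed `cleanup` parameter could look something like the sketch below. Everything here is hypothetical: `get_spec` stands in for the notebook's `_get_spec` functions, and `_download` / `_read_spectrum` are dummy placeholders for the real archive-query and FITS-reading helpers.

```python
import shutil
import tempfile
from pathlib import Path

# Placeholder stand-ins for the notebook's real helpers (hypothetical names).
def _download(target, download_dir):
    path = Path(download_dir) / f"{target}.fits"
    path.write_bytes(b"SIMPLE  =                    T")  # dummy FITS-like stub
    return [path]

def _read_spectrum(path):
    return path.read_bytes()  # real code would parse the FITS HDUs

def get_spec(target, cleanup=True):
    """Download spectra for `target` and read them into memory.

    With cleanup=True (the default), the downloaded files are deleted
    afterwards to respect the 10 GB Fornax quota; cleanup=False keeps
    them around so a re-run doesn't re-download.
    """
    download_dir = Path(tempfile.mkdtemp(prefix=f"{target}_"))
    files = _download(target, download_dir)
    spectra = [_read_spectrum(f) for f in files]
    if cleanup:
        shutil.rmtree(download_dir)
    return spectra
```

Keeping the flag per-function lets users retain only the expensive downloads (e.g. Herschel) while cleaning up the cheap ones.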
(jkrick changed the title from "make plan for storing/deleting downloaded files for spectroscopy notebook" to "make cleanup parameter for storing/deleting downloaded files for _get_spec functions" on Sep 18, 2024.)
(The conversation at the top of this issue comes from PR #281.)