make cleanup parameter for storing/deleting downloaded files for _get_spec functions #285

Closed
jkrick opened this issue Jun 13, 2024 · 2 comments · Fixed by #347
Labels
use case: spectroscopy Spectroscopy use case

Comments

jkrick commented Jun 13, 2024

from conversation in PR #281

As a general question: do we want to delete the FITS files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but I can also see the reasoning for keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the FITS files. @troyraen @bsipocz what do you recommend?

Brigitta: I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we (the users) will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so no backups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case.
astropy/astroquery has this idea of a cache space, but in all honesty, it's not super reliable, and I would not count on it, especially as none of the VO-backed modules use it atm.

Andreas: I agree with that.
Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.
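
For illustration, the "warning when space is tight" idea could look roughly like the sketch below. The function name `warn_if_low_disk_space` and the `min_free_gb` threshold are hypothetical, not something from the notebook. Note that `shutil.disk_usage` reports free space on the filesystem, which is an assumption here and may not match a per-user quota such as the one on Fornax.

```python
import shutil
import warnings


def warn_if_low_disk_space(path=".", min_free_gb=1.0):
    """Warn if the filesystem holding `path` has less than `min_free_gb` free.

    Hypothetical helper: the name and threshold are illustrative only.
    Returns the free space in GB so the caller can report it.
    """
    # disk_usage returns (total, used, free) in bytes for the filesystem.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        warnings.warn(
            f"Only {free_gb:.2f} GB free at {path!r}; "
            "consider deleting downloaded files."
        )
    return free_gb
```

Calling this before each archive download would give the user a heads-up without stopping the run.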

Jessica: I am working on the Herschel module and am at 10G and not even done downloading tar files for a single target, Arp220 (Herschel likes to give you lots of files... too many files... but I can't control that). I think we should delete tar files.

Troy: Since this is a Fornax notebook, I think we have to make it usable on the Fornax Console, which means respecting the 10G user disk space.

Has anyone looked to see if there are ways to avoid actually downloading anything(s)?
We should warn the user upfront how much space will be needed.
Does anyone have a sense of how much disk space the full notebook currently requires?
> We can also write a clean-up function that clears all the temporary files at some point.
If the full notebook needs less than 5G(?) disk space, then my vote is to write this function and make it optional, so it's available for the user to run or not as they wish. (Choosing 5G to leave space for other things.)
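
A minimal sketch of such an optional clean-up function, assuming the downloads all live under one directory. The name `clear_downloads`, the `download_dir` argument, and the file patterns are assumptions for illustration; the notebook's actual layout may differ.

```python
from pathlib import Path


def clear_downloads(download_dir, patterns=("*.fits", "*.tar", "*.tar.gz")):
    """Delete files under `download_dir` matching `patterns`.

    Hypothetical helper: directory layout and patterns are assumptions.
    Returns the number of bytes freed.
    """
    download_dir = Path(download_dir)
    freed = 0
    if not download_dir.is_dir():
        return freed
    for pattern in patterns:
        # Materialize the matches first so we don't delete while iterating.
        for path in list(download_dir.rglob(pattern)):
            if path.is_file():
                freed += path.stat().st_size
                path.unlink()
    return freed
```

The user could run this at the end of a session, or never, which keeps it optional as suggested.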

> I am working on the Herschel module and am at 10G and not even done downloading tar files for a single target, Arp220 (Herschel likes to give you lots of files... too many files... but I can't control that). I think we should delete tar files.
Do you know how much disk space would be needed for all the Herschel stuff you want to download?

> So I would instead propagate this issue upstream to say we (the users) will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so no backups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case.
I agree that's a good thing to push for. I don't think we should count on that to come in time for this notebook to rely on it.

jkrick added the "use case: spectroscopy" label Jun 13, 2024
jkrick commented Sep 18, 2024

coming back to this....
There is no way to avoid downloading the Herschel data, but at the moment we are not running the Herschel module because it is too time intensive.

I like the idea of a cleanup parameter for each of the _get_spec functions that is turned on by default (downloaded files are deleted), with the user able to switch it off if they want to keep the data around. That keeps directories in a clean state unless specifically chosen otherwise.
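
Roughly, the pattern could look like the sketch below. Only the `cleanup` parameter comes from this issue; `example_get_spec`, `fetch`, `read`, and `download_dir` are hypothetical stand-ins, since the real _get_spec functions each have their own download and parsing code.

```python
from pathlib import Path


def example_get_spec(fetch, read, download_dir, cleanup=True):
    """Hypothetical stand-in for a *_get_spec function with a cleanup switch.

    fetch(download_dir) -> iterable of downloaded file paths (assumed)
    read(path)          -> parsed spectrum for one file (assumed)
    cleanup             -> if True (default), delete the downloaded files
                           once they have been read.
    """
    download_dir = Path(download_dir)
    download_dir.mkdir(parents=True, exist_ok=True)
    paths = [Path(p) for p in fetch(download_dir)]
    try:
        return [read(p) for p in paths]
    finally:
        # Runs even if reading fails, so a bad file does not leave
        # the directory full of downloads.
        if cleanup:
            for p in paths:
                p.unlink(missing_ok=True)
```

With `cleanup=False` the files stay in `download_dir`, so a user iterating on their sample would not have to re-download.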

jkrick changed the title from "make plan for storing/deleting downloaded files for spectroscopy notebook" to "make cleanup parameter for storing/deleting downloaded files for _get_spec functions" Sep 18, 2024
jkrick commented Sep 25, 2024

Functions which might need this cleanup function
