This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

Remove no-longer-included files #5

Open
aappling-usgs opened this issue Jan 19, 2021 · 4 comments

Comments

@aappling-usgs
Member

aappling-usgs commented Jan 19, 2021

`sb_replace_files()` doesn't delete omitted files. If you change the file-posting command in `remake.yml` to omit a file that you've posted in the past, or if you rename a file, the old file (or old name) won't be removed from ScienceBase. Could this slim-data-release repo provide file removal in such circumstances?

I discovered this by accident: I pushed files from a new item-in-progress before repointing the repo away from the SB item provided by the template. The result at https://www.sciencebase.gov/catalog/item/5faaac68d34eb413d5df1f22 was that I overwrote `fgdc_metadata.xml` and added a new file, `res_polygons.zip`. I assume it's OK that I've corrupted the example data release, but I think real data releases using this template could run into problems with old files never getting removed. Specifically, executing the code in my new repo didn't delete the old files `spatial.zip` and `cars.csv`, even though they were not included in my file list.

  log/sb_posted_files.csv:
    command: sb_replace_files(filename = target_name, 
      sb_id = I('5faaac68d34eb413d5df1f22'),
      "out_data/res_polygons.zip",
      "out_xml/fgdc_metadata.xml",
      sources = "src/sb_utils.R")
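
Put concretely, the files a removal step would need to delete are the ones on the SB item that the recipe no longer lists. Below is a minimal sketch of that set difference. It is written in Python purely for illustration: this repo's helpers are R functions in `src/sb_utils.R`, and `stale_files` is a hypothetical name, not an existing function there. The sketch assumes the item's file names are already in hand. It compares bare file names because the recipe lists local paths like `out_data/res_polygons.zip`, while the item presumably stores only `res_polygons.zip`.

```python
import os


def stale_files(item_files, posted_files):
    """Return files on the SB item that the recipe no longer posts.

    Hypothetical helper: strips local directories from the posted paths
    and compares bare file names only.
    """
    posted = {os.path.basename(path) for path in posted_files}
    return sorted(set(item_files) - posted)


# File lists taken from this issue: the item after the accidental push,
# and the two files named in the log/sb_posted_files.csv recipe.
item = ["spatial.zip", "cars.csv", "fgdc_metadata.xml", "res_polygons.zip"]
posted = ["out_data/res_polygons.zip", "out_xml/fgdc_metadata.xml"]
print(stale_files(item, posted))  # → ['cars.csv', 'spatial.zip']
```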

@aappling-usgs
Member Author

For now, in my own release, I've added this reminder just above the recipe for `log/sb_posted_files.csv`:

  # IMPORTANT: If you remove or rename a file on this list,
  # you must manually delete the old file from the SB item

@jordansread
Member

Since the slim version here has just a single log file for the posted files, it would be pretty easy to verify that the item doesn't contain any files that aren't in the log file. We already check for duplicates here, and could also verify there aren't extras. This pattern won't work for more complicated release patterns where more than one machine builds and pushes files...but I think the point of this slim version is that we could be constrained to a single machine and a single SB item.
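
A sketch of that check follows, again in Python for illustration only. It assumes the log is a CSV with a `filename` column of bare file names, though the real log layout may differ. `read_posted_files` and `verify_item_against_log` are hypothetical names. Like the existing check, it errors on duplicates, and it also errors on extras:

```python
import csv
from collections import Counter


def read_posted_files(log_path, column="filename"):
    """Read posted file names from the log CSV.

    Assumes a hypothetical `filename` column; adjust to the real log layout.
    """
    with open(log_path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def verify_item_against_log(item_files, logged_files):
    """Raise if the log lists a file twice, or the item holds unlogged files."""
    dupes = sorted(name for name, n in Counter(logged_files).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate files in log: {dupes}")
    extras = sorted(set(item_files) - set(logged_files))
    if extras:
        raise ValueError(f"files on SB item but not in log: {extras}")
```

Because it works off the single log, this only holds for the one-machine, one-item setup described above.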

@jordansread
Member

It would be nice to warn instead of error, and to have a way to ignore certain files, since we may need to use the manual "large file uploader" in some cases.
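
A warning-based variant with an ignore list might look like the Python sketch below. `warn_on_extras` and its glob-style `ignore` parameter are assumptions, not existing options in this repo. Files uploaded by hand, for example through the large file uploader, can be excluded by pattern:

```python
import fnmatch
import warnings


def warn_on_extras(item_files, logged_files, ignore=()):
    """Warn (rather than error) about item files missing from the log.

    `ignore` holds glob patterns for files posted outside the pipeline,
    e.g. via the manual large file uploader. Returns the flagged names.
    """
    logged = set(logged_files)
    extras = sorted(
        name
        for name in set(item_files) - logged
        # fnmatchcase keeps matching case-sensitive on every OS
        if not any(fnmatch.fnmatchcase(name, pat) for pat in ignore)
    )
    if extras:
        warnings.warn(f"files on SB item but not in log: {extras}", stacklevel=2)
    return extras
```

Returning the flagged names, instead of only warning, leaves room for a later opt-in step that deletes them.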

@lindsayplatt
Collaborator

The other piece that complicates this is that we started to switch to a pattern where the XML files were pushed to an item separately from the data files (see mntoha-data-release). This was in response to the multi-machine build pattern, where we wanted people who didn't build the data files to still be able to edit and push the metadata. In these instances, a single SB item has multiple log files, and we would need to handle that use case when deciding which files are no longer needed.
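
When several logs target one SB item, for example one for the data files and one for the separately pushed XML, any stale-file check would have to compare the item against the union of all the logs. Checking each log alone would flag the other log's files as stale. Here is a Python sketch, where `stale_across_logs` and the log names are hypothetical placeholders:

```python
def stale_across_logs(item_files, logs):
    """Return item files not recorded in ANY of the logs that target this item.

    `logs` maps a (hypothetical) log name to the file names it records.
    Comparing against the union avoids flagging, say, the XML pushed
    separately from the data as stale.
    """
    expected = set().union(*logs.values())
    return sorted(set(item_files) - expected)


logs = {"data_log": ["res_polygons.zip"], "xml_log": ["fgdc_metadata.xml"]}
item = ["res_polygons.zip", "fgdc_metadata.xml", "cars.csv"]
print(stale_across_logs(item, logs))  # → ['cars.csv']
```

This still assumes that every log for the item is available to whichever machine runs the check. In the multi-machine pattern, that may not hold.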

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants