Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider copying .git into Docker container to facilitate precise tracking of code used to generate outputs in manifest #138

Closed
cjyetman opened this issue Feb 16, 2024 · 6 comments · Fixed by #155

Comments

@cjyetman
Copy link
Member

cjyetman commented Feb 16, 2024

I think by not including the .git directory in #129, the manifest has now lost the functionality of tracking precisely what version of the run script was used, and even if any uncommitted local changes had been made (e.g. to the config file or the script itself) when running in a Docker container.

We now see something like this...

  "git": {
    "local_git_tag": "not a git repo",
    "local_git_hash": "not a git repo",
    "local_git_status": "not a git repo"
  },

as opposed to what we used to see and/or what we can see if running the "desktop"/local-RStudio way, like this...

  "git": {
    "local_git_tag": "no git tags found",
    "local_git_hash": "f97e472a5d03c8c1c95ebfa3c51b7e2534fc892e",
    "local_git_status": [
      {
        "status": "",
        "filename": "M config.yml"
      }
    ]
  },

@AlexAxthelm or @jdhoffa do you have opinions to the pros-cons of including the .git directory when copying over to the Docker container? I'm in inclined to start copying it in again because I think this git status information is very useful for tracking how the data.prep outputs were generated (I don't think there is any other indication of, for instance, what version of run_pacta_data_preparation.R was used). Or any other ideas on how to capture that info in the Docker container?

@AlexAxthelm
Copy link
Collaborator

If we copy just the .git/HEAD file, and /git/refs/heads directory, then we can get the current commit (but not clean/dirty status) without bringing in the entire history.

HEAD is a oneline file that has the name of the ref, /refs/heads/* are files with the commit SHAs of the HEAD of each ref.

So from my current copy of this repo, on main (which happens to be at f14e5af as of right now):

sh-3.2$ pwd
/Users/aaxthelm/Documents/pacta/workflow.data.preparation

sh-3.2$ git rev-parse HEAD
f14e5af7c8b8e03fc7c8739dc1f1cc09fe3dc95c

sh-3.2$ git branch
  azure-vm
  feature/azure-container
  feature/logger
* main
  remove-fact-set-database-code

sh-3.2$ cat .git/HEAD
ref: refs/heads/main

sh-3.2$ cat .git/refs/heads/main
f14e5af7c8b8e03fc7c8739dc1f1cc09fe3dc95c

@AlexAxthelm
Copy link
Collaborator

I've found a way to preserve dirty/clean using git clone --no-checkout --depth=1 without bringing the whole history along, but it's a bit more complicated (and I haven't tried doing it in a docker container, but it seems very doable)

From repo root:

sh-3.2$ pwd
/Users/aaxthelm/Documents/pacta/workflow.data.preparation

sh-3.2$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

sh-3.2$ git rev-parse HEAD
f14e5af7c8b8e03fc7c8739dc1f1cc09fe3dc95c

sh-3.2$ git clone --no-checkout --depth=1 file://$(pwd) ~/Downloads/git-test
Cloning into '/Users/aaxthelm/Downloads/git-test'...
remote: Enumerating objects: 17, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 17 (delta 0), reused 12 (delta 0), pack-reused 0
Receiving objects: 100% (17/17), 14.94 KiB | 14.94 MiB/s, done.

# note that I'm only copying some of the tracked files
sh-3.2$ cp {DESCRIPTION,LICENSE,LICENSE.md,README.md,config.yml,run_pacta_data_preparation.R} ~/Downloads/git-test/

# See size of git dir
sh-3.2$ du -sh .git
1.1M    .git

Then from the destination directory:

sh-3.2$ pwd
/Users/aaxthelm/Downloads/git-test

sh-3.2$ git rev-parse HEAD
f14e5af7c8b8e03fc7c8739dc1f1cc09fe3dc95c

sh-3.2$ git add .

sh-3.2$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    .dockerignore
        deleted:    .github/CODEOWNERS
        deleted:    .github/workflows/add-prs-and-issues-to-project.yml
        deleted:    .gitignore
        deleted:    Dockerfile
        deleted:    docker-compose.yml
        deleted:    workflow.data.preparation.Rproj

sh-3.2$ du -sh .git
128K    .git

@AlexAxthelm
Copy link
Collaborator

AlexAxthelm commented Feb 16, 2024

For this repo, it's probably not that big of a deal, and I don't have strong opinions one way or the next, other than a general "yes, having the git info accessible in the manifest is nice". But for bigger repos (with long histories), it would be good to try to not put everything in the docker image.

@jdhoffa
Copy link
Member

jdhoffa commented Feb 16, 2024

Agreed, if copying in .git for a specific purpose, it would be good to precisely copy in what you need.

@cjyetman
Copy link
Member Author

thanks! I like the idea of only copying in the .git/HEAD file and /git/refs/heads refs directory. I'll investigate if I can achieve something similar for git status

@AlexAxthelm
Copy link
Collaborator

@cjyetman out of curiousity, can you add !.git to .dockerignore and do a build what does the local_git_status` of the manifest look like? I'd imagine it's probably pretty noisy with the tracked files that aren't copied in (like in my second example above)

AlexAxthelm added a commit that referenced this issue Feb 18, 2024
We may want to have a more elegant solution in the future (see
discussion in #138), but this works for now to include git status in the
manifest export.

Closes: #138
cjyetman pushed a commit that referenced this issue Feb 22, 2024
We may want to have a more elegant solution in the future (see
discussion in #138), but this works for now to include git status in the
manifest export.

Closes: #138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants