
Always restore cache #33

Open · wants to merge 7 commits into master from always_restore_cache
Conversation

@aminya (Member) commented Jul 10, 2020

Edit: it turns out npm ci deletes all of the restored cache. With this PR, we have to choose between the reliability of CI and the time it takes to bootstrap.

We should restore the node_modules cache even if we change package.json. npm install is smart enough not to redo everything from scratch! In this case, however, we should not skip the bootstrap phase completely; it will simply become faster thanks to npm's incremental behavior.

This branch is based on #31. It needs a rebase, plus the same change applied to the other platforms, before final merging.

@aminya aminya mentioned this pull request Jul 10, 2020
@DeeDeeG (Member) commented Jul 10, 2020

Okay, we can try this.

Better to use fallbacks, though: hit the most specific match if possible, then fall back to progressively more generic/loose matches.

Keep everything in the key: field.

Add restoreKeys: for the more generic/looser matches.

https://docs.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#restore-keys

@DeeDeeG (Member) commented Jul 10, 2020

Something like this:

          key: 'npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json, script/vsts/platforms/windows.yml'
          restoreKeys: |
             npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json
             npm | "$(Agent.OS)" | "$(buildArch)"

Oh and apparently we can cache the "system global npm package cache" somehow, not just the installed node_modules.
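Putting that together, the key/restoreKeys fragment could sit inside a full Cache task roughly like this (a sketch only; the path and cacheHitVar values are illustrative assumptions, not this repo's actual configuration):

```yaml
# Sketch of a Cache@2 task using the key/restoreKeys idea above.
# `path` and `cacheHitVar` below are illustrative assumptions.
- task: Cache@2
  inputs:
    key: 'npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json, script/vsts/platforms/windows.yml'
    restoreKeys: |
      npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json
      npm | "$(Agent.OS)" | "$(buildArch)"
    path: $(Build.SourcesDirectory)/apm/node_modules
    cacheHitVar: CacheRestored
  displayName: Cache apm node_modules
```

The task tries the full key first, then each restoreKeys entry in order, so an exact hit is preferred but a looser match still restores something.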

@aminya aminya force-pushed the always_restore_cache branch 2 times, most recently from 3345ae7 to 6413b15 Compare July 10, 2020 04:29
@aminya (Member, Author) commented Jul 10, 2020

Something like this:

          key: 'npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json, script/vsts/platforms/windows.yml'
          restoreKeys: |
             npm | "$(Agent.OS)" | "$(buildArch)" | apm/package.json, apm/package-lock.json
             npm | "$(Agent.OS)" | "$(buildArch)"

Oh and apparently we can cache the "system global npm package cache" somehow, not just the installed node_modules.

Done in b766569

@DeeDeeG (Member) commented Jul 10, 2020

Oh and apparently we can cache the "system global npm package cache" somehow, not just the installed node_modules.

https://stackoverflow.com/questions/14836053/how-can-i-change-the-cache-path-for-npm-or-completely-disable-the-cache-on-win

npm config set cache [some_dir_in_the_working_directory_of_the_pipeline]

If we set this we can cache one more part of CI to run strictly from Azure servers. (Avoid network calls to npmjs.com).
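In pipeline terms, this could be wired up roughly as follows (a sketch; the directory name and cache key are assumptions):

```yaml
# Point npm's package cache at a directory inside the agent's working
# area so a Cache task can capture it. The directory name and key
# below are illustrative assumptions.
steps:
  - task: Cache@2
    inputs:
      key: 'npm-cache | "$(Agent.OS)" | apm/package-lock.json'
      path: $(Agent.BuildDirectory)/npmcache
    displayName: Cache the npm package cache
  - script: npm config set cache "$(Agent.BuildDirectory)/npmcache" --global
    displayName: Relocate the npm cache into the pipeline workspace
```

The config change must run before any npm install so that downloads land in the cached directory.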

@DeeDeeG (Member) commented Jul 10, 2020

Done in b766569

Nice

@aminya (Member, Author) commented Jul 10, 2020

Oh and apparently we can cache the "system global npm package cache" somehow, not just the installed node_modules.

https://stackoverflow.com/questions/14836053/how-can-i-change-the-cache-path-for-npm-or-completely-disable-the-cache-on-win

npm config set cache [some_dir_in_the_working_directory_of_the_pipeline]

If we set this we can cache one more part of CI to run strictly from Azure servers. (Avoid network calls to npmjs.com).

Where do we need the global npm cache? It is better to first fix the current caching system before trying new folders.

@DeeDeeG (Member) commented Jul 10, 2020

Where do we need the global npm cache?

https://docs.microsoft.com/en-us/azure/devops/pipelines/build/variables?view=azure-devops&tabs=yaml#agent-variables

$(Agent.BuildDirectory)/npmcache?

It is better to first fix the current caching system before trying new folders.

👍 Agreed.

@DeeDeeG (Member) commented Jul 10, 2020

We still have cache misses

Maybe the new runs will upload their caches as the strict identifier AND the loose identifiers. So we might have plenty of chances to hit cache going forward. I'd like to see if that's what happens at the cache save for the current run.

By the way, we should make sure it still bootstraps after an inexact cache hit, right?

I think we are already set to do that, because cacheHitVar will be set to inexact, not true. (Bootstrap only skips if all three cache hit vars resolve to true).
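That "only skip when all three hit exactly" rule could be expressed as a step condition along these lines (a sketch; the variable names are assumptions):

```yaml
# Run bootstrap unless every cache restore was an exact hit.
# The cacheHitVar names (CacheRestoredNode, etc.) are illustrative
# assumptions, not the pipeline's actual variable names.
- script: script/bootstrap
  displayName: Bootstrap
  condition: >
    not(and(
      eq(variables.CacheRestoredNode, 'true'),
      eq(variables.CacheRestoredApm, 'true'),
      eq(variables.CacheRestoredScript, 'true')
    ))
```

An inexact restore sets the cacheHitVar to 'inexact' rather than 'true', so this condition still bootstraps on loose matches.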

@aminya (Member, Author) commented Jul 10, 2020

I removed bootstrap skipping in 6413b15. If we want to use cache when we change package.json, etc, we need to remove this line.

@DeeDeeG (Member) commented Jul 10, 2020

If we get an exact cache restore for all three, then we do want to skip bootstrapping IMO.

It would be a plain win to revert that, IMO. Bootstrap skipping is nice.

It only skips when all three are perfectly exactly matched. I was thinking ahead to inexact hits/restores I guess, but never followed up. You are on a roll with these by the way.

We are going to give upstream a bit of a heart attack if we try to post these to them though, lol. No one likes an outside contributor saying "Hey, I can contribute a change to the thing that lets you know if you broke your app!"

And even for our own sake we should watch this closely that it doesn't, well, break our CI to do inexact cache restores and re-bootstrap on top of that.

Anyway, if it interests us, no reason we can't do it. Upstream and we don't have to stay exactly in sync, just close enough that we can track and integrate on top of their work. And hopefully upstream stuff to benefit the widest community possible. (And hey, upstreaming helps ensure they don't break our stuff, lol. So we make it "our [collective] stuff" and help them out. Open source is beautiful.)

@aminya aminya changed the base branch from master to windows_tests July 10, 2020 04:54
@aminya aminya force-pushed the always_restore_cache branch from 6413b15 to b766569 Compare July 10, 2020 04:56
@aminya (Member, Author) commented Jul 10, 2020

If we get an exact cache restore for all three, then we do want to skip bootstrapping IMO.

It would be a plain win to revert that, IMO. Bootstrap skipping is nice.

It only skips when all three are perfectly exactly matched.

OK. I reverted that. I changed the target branch for simpler diff.

@DeeDeeG (Member) commented Jul 10, 2020

Maybe we can cache Windows build tools and dependencies... I'm just afraid we're going to have nothing but 39 cache steps pretty soon.

@aminya (Member, Author) commented Jul 10, 2020

Maybe we can cache Windows build tools and dependencies... I'm just afraid we're going to have nothing but 39 cache steps pretty soon.

Those don't take much time. I prefer caching for something that takes a lot of time. If we could write the cache steps in a more compact way, we could try that.

see #1 (comment)

@DeeDeeG (Member) commented Jul 10, 2020

I just realized we haven't run the PR pipeline on master in... a very long time. So there is no cache saved for master branch. Which drastically reduced the chance of a cache hit. I'm running the PR pipeline against master now.

I think we should manually run the PR Pipeline for master branch every time we merge something that touches a file in script/vsts/platforms/*.yml

Edit: That would be hard to remember, though. We may as well run it on a schedule. Once every day maybe.

@aminya (Member, Author) commented Jul 10, 2020

@DeeDeeG We can set a pull request trigger for the master branch. That should be possible in Azure UI.

@aminya aminya added the CI label Jul 11, 2020
@DeeDeeG (Member) commented Jul 11, 2020

@aminya Good idea! It should also be possible in the yaml if you prefer. Something like

pr:
- master

https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/github?view=azure-devops&tabs=yaml#pr-triggers
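Combined with a path filter, the trigger could also be limited to merges that touch the CI definitions themselves, as suggested earlier (a sketch):

```yaml
# PR trigger for master, restricted to the pipeline definition files.
# The path filter below is an illustrative assumption.
pr:
  branches:
    include:
      - master
  paths:
    include:
      - script/vsts/platforms/*
```

This would refresh the master-branch cache automatically whenever a platform YAML changes, without running the pipeline for unrelated merges.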

@aminya aminya force-pushed the windows_tests branch 2 times, most recently from 3781498 to 44183eb Compare July 11, 2020 03:10
@aminya aminya force-pushed the always_restore_cache branch from b766569 to 6641f92 Compare July 11, 2020 03:26
@aminya aminya force-pushed the windows_tests branch 2 times, most recently from bdf0219 to 82d8c06 Compare July 11, 2020 04:52
@aminya aminya force-pushed the always_restore_cache branch from 6641f92 to e1929b3 Compare July 11, 2020 04:55
@DeeDeeG (Member) commented Aug 22, 2020

I don't support this PR's approach. I think it's working alright as it is. Can we close this?

@aminya (Member, Author) commented Aug 22, 2020

I don't support this PR's approach. I think it's working alright as it is. Can we close this?

Could you give us a reason for your opinion?

If I want to pursue this, I will revert your commits that removed the ci option and instead introduce an option like USE_NPM_CI=false.
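That toggle could be sketched roughly as follows (assuming a pipeline variable named USE_NPM_CI; this is not an existing part of the pipeline):

```yaml
# Hypothetical toggle between a clean `npm ci` and a cache-friendly
# `npm install`, controlled by a pipeline variable.
variables:
  USE_NPM_CI: false

steps:
  - script: |
      if [ "$(USE_NPM_CI)" = "true" ]; then
        npm ci
      else
        npm install
      fi
    displayName: Install dependencies
```

Defaulting to false would keep the cache-friendly path while leaving the strict, reproducible npm ci one flag away.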

@DeeDeeG (Member) commented Aug 22, 2020

It casts a wider net for restoring stuff, including things that might be out of date for reasons we haven't foreseen or prepared for. Dealing with potentially more diverse inputs from more scenarios makes the job of handling it correctly harder, including handling weird edge cases.

"Trusting that npm knows what to do" isn't really a good idea. npm is a huge piece of software that has experienced an enormous amount of bugs in its time. Just read its changelogs.

Part of the tradeoff with CI is that you make it a simple, clean install, no matter what, and maybe you can optimize within that rule. I just feel like this violates the principle of CI that you should never have to wonder if your CI caused a bug; it should be as clear as possible that CI did NOT cause the bug. And in case CI did cause a bug, you'd want it to be obvious. The failure modes if this doesn't work out would be maddeningly obscure.

I don't like that this adds complexity, both in terms of real exposure to more diverse behavior, and in terms of how one needs to mentally model the cache restoration if debugging it.

Atom is a very large app with some weird edge cases (a custom npm wrapper (apm), and a forked one at that; building against Electron rather than Node; also building against Node elsewhere in the bootstrap; running lots of stuff in subprocesses that kind of aren't meant to be). If there is an edge case where npm breaks, I think this repo would find it. And it already has, a few times, while we've been working on things.

I am also not confident in our track record of properly waiting to vet things, when the things that can go wrong are obscure. There are a lot of edge cases I still haven't wrapped my mind around with this one, and I feel like we're walking into a trap of our own making on this.

Finally: Minimal benefit.

This can at most save "[time to fully bootstrap] - [time to restore whichever caches were hit]". Which is usually on the order of seven to fourteen minutes or so, I think. The amount of time to debug this, and the lack of peace of mind (did CI restore something that's now broken in this new context?), is not worth that. Once a PR has had a single passing build stage of any commit, it should be able to restore the bootstrap from that.

We certainly haven't collected the information to do a cost-benefit analysis; I can't clearly list the pros, nor the cons. We're skipping the part of the process where you finalize and then re-review the final design, IMO. Once something thrown together passes CI, we post it to master; that's a process error regardless. Even if the tech might be sound, we often skip verifying that with high assurance until after we push to master.

@DeeDeeG (Member) commented Aug 22, 2020

P.S. By using npm install rather than npm ci, as we'd have to do here, there is the potential that npm installs different packages than appear in the lockfile. This is another avenue for changed behavior that has nothing to do with the actual diff of a PR.

@DeeDeeG (Member) commented Aug 22, 2020

I'd much prefer us push something like this PR over the finish line at upstream: atom#18808

@DeeDeeG DeeDeeG closed this Aug 22, 2020
@DeeDeeG DeeDeeG reopened this Aug 22, 2020
@DeeDeeG (Member) commented Aug 22, 2020

Sorry, the comment and close issue buttons are right next to each other! I did not mean to close this.

@ghost commented Jul 13, 2022

What's the news on this PR? It has been inactive for some time.
