Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container image size has ballooned #132

Closed
ian-noaa opened this issue Aug 5, 2023 · 2 comments · Fixed by #136
Closed

Container image size has ballooned #132

ian-noaa opened this issue Aug 5, 2023 · 2 comments · Fixed by #136
Assignees
Labels
component: build process Build process issue component: CI/CD Continuous integration and deployment issues type: bug Fix something that is not working

Comments

@ian-noaa
Copy link
Collaborator

ian-noaa commented Aug 5, 2023

Describe the Problem

Our METexpress containers have ballooned from ~700 MB to 3.8 GB. This has caused timeouts and disk space issues in CI.

Expected Behavior

Our images should build in ~15 minutes and be ~700-800 MB in size.

Additional Info

Molly noticed in 821ecdf that our CI was timing out. We increased the timeout and discovered that meteor was now taking 40-50 minutes to build the bundle and filling up the GitHub Action VM's disk (14 GB). Our container image size somehow has ballooned to ~3.8 GB since 7/22/23. Previously, they were weighing in around 700-800 MB. Investigation on 8/4/23 showed that the meteor bundle being produced by the following Dockerfile line:

RUN bash ${SCRIPTS_FOLDER}/build-meteor-bundle.sh

had grown from ~400 MB to 3.3 GB for some reason. It's present in the build layer at /opt/bundle and in the production image at /usr/app/bundle. We're unsure why this is occurring as MATS has the same dependencies and Dockerfile and is unaffected - it's seeing builds take ~15min and images of 700-800 MB.

I committed a temporary workaround in 4cfbdba that disables security scanning. Our security scanner attempts to pull the image back down. A better workaround would be to do the security scanning in yet another job so we had a fresh VM to work with. Ultimately though, we want our image size reduced.

@ian-noaa ian-noaa added component: build process Build process issue component: CI/CD Continuous integration and deployment issues type: bug Fix something that is not working labels Aug 5, 2023
@ian-noaa ian-noaa self-assigned this Aug 5, 2023
@ian-noaa ian-noaa changed the title Image size has ballooned causing CI timeouts Container image size has ballooned Aug 5, 2023
@ian-noaa ian-noaa linked a pull request Aug 23, 2023 that will close this issue
@ian-noaa
Copy link
Collaborator Author

The root cause of our image size increase is definitely related to Couchbase from MATScommon. In the MATS images Couchbase takes up ~51MB and took 10 seconds to install. In the METexpress images it takes ~3.1GB and took much longer. The difference was due to the METexpress image rebuilding the Couchbase dependency from source while on MATS for reasons that are unclear to me, it was able to grab precompiled binaries.

I tracked the issue back to the way Meteor interacts with NPM and particularly MATScommon/meteor_packages/mats-common/.npm/package/npm-shrinkwrap.json in MATScommon. It looks like that file was updated in one of the latest commits (https://github.com/dtcenter/MATScommon/blob/866cf7435aaf194acb92d57b9f92b1645406b7d0/meteor_packages/mats-common/.npm/package/npm-shrinkwrap.json) and now requires some MacOS x86 Couchbase binaries. On my ARM Mac, this will fail to build and so Meteor/Node attempt to rebuild the dependency from source. I'm assuming something similar happens on the Linux hosts in CI. Removing the specification for the particular MacOS versions of Couchbase fixes the issue.

However, I'm not sure how to prevent this from recurring in the future whenever someone commits updates to MATScommon. Meteor automatically creates that file and I'm not sure why the OS-specific dependencies are entering that file now. Currently, the best I can think of is asking people to double-check that file for any OS-specific requirements and to compare how they're generating the package-lock.json and npm-shrinkwrap.json files which I'm not super satisfied with. I also removed CMake from the docker image so that NPM will fail if it attempts to rebuild any dependencies. So hopefully that will help catch packages that slip through.

I'm also a bit puzzled that this didn't affect MATS as it should have been using the same version of MATScommon as METexpress.

ian-noaa added a commit that referenced this issue Sep 1, 2023
This PR does a few things:

* Remove `cmake` from our `Dockerfile` to prevent Couchbase from being
rebuilt by Meteor/NPM. If Couchbase is built from source it takes up a
large amount of space in the image.
* Removes some OS-specific Couchbase dependencies that made it into
MATScommon.
* Update the `.dockerignore` to match MATS - it appears to be important
to exclude MATScommon's `.npm/packages/npm-shrinkwrap.json` file from
the docker container build.
* Removes workarounds that were added to CI.
 
Closes #132
@ian-noaa ian-noaa closed this as completed Sep 5, 2023
@ian-noaa
Copy link
Collaborator Author

ian-noaa commented Sep 5, 2023

MATS was ignoring the MATScommon/meteor_packages/mats-common/.npm directory in its .dockerignore. Adding the .npm directory to METexpress's .dockerignore file fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: build process Build process issue component: CI/CD Continuous integration and deployment issues type: bug Fix something that is not working
Projects
None yet
1 participant