Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bazel] Support generating a secondary cache #1405

Merged
merged 18 commits into from
Jul 1, 2024

Conversation

allsey87
Copy link
Contributor

@allsey87 allsey87 commented Jun 14, 2024

This is a working solution for generating a separate Emscripten cache. Note that this requires an additional entry in the workspace as follows:

load("@emsdk//:emscripten_cache.bzl", emsdk_emscripten_cache = "emscripten_cache")
emsdk_emscripten_cache()

When used like this, the default Emscripten cache will be used. However, if the entry is as follows:

load("@emsdk//:emscripten_cache.bzl", emsdk_emscripten_cache = "emscripten_cache")
emsdk_emscripten_cache(flags = ["--lto"])

Then embuilder will be called to build all system libraries and ports (i.e., the ALL option to embuilder) with the LTO option enabled. This can take awhile, so I have also made possible to specify which libraries you want to build explicitly:

load("@emsdk//:emscripten_cache.bzl", emsdk_emscripten_cache = "emscripten_cache")
emsdk_emscripten_cache(
    flags = ["--lto"],
    libraries = [
        "crtbegin",
        "libprintf_long_double-debug",
        "libstubs-debug",
        "libnoexit",
        "libc-debug",
        "libdlmalloc",
        "libcompiler_rt",
        "libc++-noexcept",
        "libc++abi-debug-noexcept",
        "libsockets"
    ]
)

Resolves #807, resolves #971, resolves #1099, resolves #1362, resolves #1401

@allsey87
Copy link
Contributor Author

Note that although all checks are currently passing, this is only testing the pass-through behaviour for the moment. I will add additional checks that test the secondary cache shortly.

@allsey87 allsey87 marked this pull request as ready for review June 16, 2024 12:49
@allsey87
Copy link
Contributor Author

allsey87 commented Jun 16, 2024

@sbc100, @walkingeyerobot, this is now ready for review. I came close to being able to support secondary caches on Windows, but there are some quirks to when Node.js is installed under external that do not apply to Linux and Mac. I would have liked to have gotten this working, but I have already spent a whole day just on supporting Windows and I really cannot spend any more of my time on this.

The steps for enabing Windows support in its current state are:

  1. Uncomment line 37 and remove line 36 of emscripten_cache.bzl
  2. Uncomment the test at the bottom of test_bazel.ps1

In this state, the test on Windows will fail on line 78 of emscripten_cache.bzl since Node.js is not available at external/nodejs_windows_amd64/bin/nodejs/node.exe.

@walkingeyerobot
Copy link
Collaborator

Does this degrade Windows support in any way? Or does building non-prebuilt system libraries simply not work on Windows?

bazel/emscripten_cache.bzl Outdated Show resolved Hide resolved
bazel/emscripten_cache.bzl Outdated Show resolved Hide resolved
@sbc100
Copy link
Collaborator

sbc100 commented Jun 20, 2024

Does this degrade Windows support in any way? Or does building non-prebuilt system libraries simply not work on Windows?

IIUC this doesn't degrade existing features but adds a new feature that simply doesn't work on windows (yet). Seems like its probably worth addressing that before we land.

@allsey87
Copy link
Contributor Author

Does this degrade Windows support in any way? Or does building non-prebuilt system libraries simply not work on Windows?

IIUC this doesn't degrade existing features but adds a new feature that simply doesn't work on windows (yet).

This is correct. Windows still works as normal with the prebuilt cache as long as emscripten_cache does not contain any arguments.

Seems like its probably worth addressing that before we land.

The only path forward that I am aware of which might solve the problem of Node.js not being available on Windows in time would be to migrate to rules_js as proposed in #1388 since the newer repo uses the bzlmod approach (MODULE.bazel) instead of the WORKSPACE file to set up Node.js. I suspect that the bzlmod initialisation may happen earlier than the WORKSPACE which could mean that Node.js would be available on all platforms by time I need to run embuilder.

Would it be ok to add this on to this PR or should it be in a separate PR?

@walkingeyerobot
Copy link
Collaborator

Let's do that in a separate PR; this one is already kind of beefy.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 20, 2024

To be honest I'm surprised you need node_js configured at all to run embuilder... can't you just using a config file that doesn't contain node_js at all when you call embuilder?

Then create a new config file which is a copy of the core on, just with a new CACHE at the end, for use in the rest of the toolchain? Also, it might be possible to just stick with the existing config file and set the EM_CACHE environment variable (for use in the rest of the toolchain that is.. i.e. after the toolchain has been setup).

@allsey87
Copy link
Contributor Author

allsey87 commented Jun 20, 2024

To be honest I'm surprised you need node_js configured at all to run embuilder... can't you just using a config file that doesn't contain node_js at all when you call embuilder?

Then create a new config file which is a copy of the core on, just with a new CACHE at the end, for use in the rest of the toolchain? Also, it might be possible to just stick with the existing config file and set the EM_CACHE environment variable (for use in the rest of the toolchain that is.. i.e. after the toolchain has been setup).

I just had a quick look into this. It might actually just be the call to check_sanity in embuilder.py that results in the node version check and requirement on the NODE_JS variable. Looking at those functions, it might be possible to bypass that check by setting EM_IGNORE_SANITY in the environment and hence removing the dependency on Node.js...

@sbc100
Copy link
Collaborator

sbc100 commented Jun 20, 2024

To be honest I'm surprised you need node_js configured at all to run embuilder... can't you just using a config file that doesn't contain node_js at all when you call embuilder?
Then create a new config file which is a copy of the core on, just with a new CACHE at the end, for use in the rest of the toolchain? Also, it might be possible to just stick with the existing config file and set the EM_CACHE environment variable (for use in the rest of the toolchain that is.. i.e. after the toolchain has been setup).

I just had a quick look into this. It might actually just be the call to check_sanity in embuilder.py that results in the node version check and requirement on the NODE_JS variable. Looking at those functions, it might be possible to bypass that check by setting EM_IGNORE_SANITY in the environment and hence removing the dependency on Node.js...

We shouldn't be checking for NODE_JS untill we actually need it. I can take a look at fixing that on the emscripten side if that is true.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 20, 2024

Looks like you can probably just ignore the node version-check warnings if embuilder generates any.

@allsey87
Copy link
Contributor Author

To be honest I'm surprised you need node_js configured at all to run embuilder... can't you just using a config file that doesn't contain node_js at all when you call embuilder?
Then create a new config file which is a copy of the core on, just with a new CACHE at the end, for use in the rest of the toolchain? Also, it might be possible to just stick with the existing config file and set the EM_CACHE environment variable (for use in the rest of the toolchain that is.. i.e. after the toolchain has been setup).

I just had a quick look into this. It might actually just be the call to check_sanity in embuilder.py that results in the node version check and requirement on the NODE_JS variable. Looking at those functions, it might be possible to bypass that check by setting EM_IGNORE_SANITY in the environment and hence removing the dependency on Node.js...

We shouldn't be checking for NODE_JS untill we actually need it. I can take a look at fixing that on the emscripten side if that is true.

It is true, the call stack is main -> check_sanity -> perform_sanity_checks -> check_node_version.

Then create a new config file which is a copy of the core on, just with a new CACHE at the end, for use in the rest of the toolchain?

This is already how things are working. The CACHE line is added to the end of default_config which is written out to emscripten_config and used by the toolchain.

Also, it might be possible to just stick with the existing config file and set the EM_CACHE environment variable (for use in the rest of the toolchain that is.. i.e. after the toolchain has been setup).

I was experimenting with this approach initially, I will investigate and try to remember why I don't use the environment variable anymore.

@allsey87
Copy link
Contributor Author

Looks like you can probably just ignore the node version-check warnings if embuilder generates any.

Wrong version is a warning. Node.js missing results in a crash.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 20, 2024

Looks like you can probably just ignore the node version-check warnings if embuilder generates any.

Wrong version is a warning. Node.js missing results in a crash.

If necessary we could probably fix upstream emscripten so that we don't depend on node JS when running embuilder. But fixing it downstream (here) seems reasonable too.. especially if we want to support older versions of emscripten.

@allsey87
Copy link
Contributor Author

Looks like you can probably just ignore the node version-check warnings if embuilder generates any.

Wrong version is a warning. Node.js missing results in a crash.

If necessary we could probably fix upstream emscripten so that we don't depend on node JS when running embuilder. But fixing it downstream (here) seems reasonable too.. especially if we want to support older versions of emscripten.

So it turns out there are two places where Node.js is being checked:

  1. check_node_version
  2. set_config_from_tool_location

But it is possible to work around both these checks with a small hack by setting the following environment variables:

EM_IGNORE_SANITY=1
EM_NODE_JS=empty

The empty bit is the small hack, it can be any non-empty string and is used to get us past the checks inside set_config_from_tool_location by setting val to something that isn't "falsy".

@allsey87
Copy link
Contributor Author

And we have Windows support!! I am not sure why the test-bazel-mac-arm64 is stuck at "Waiting for status to be reported" and not starting, but I think that is unrelated to this PR. Perhaps one of you two can start/trigger this check manually?

@allsey87 allsey87 requested a review from sbc100 June 21, 2024 08:55
bazel/README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks way better now. Thanks.

I leave it to @walkingeyerobot to have the final say on whether this makes sense to land.

bazel/test_secondary_lto_cache/BUILD Outdated Show resolved Hide resolved
@allsey87
Copy link
Contributor Author

I noticed today that my headers were still being included from the default cache location instead of the generated cache location. I think that is due to toolchain.bzl#483. Am I correct in thinking this path should be updated to point to the newly generated cache? I guess it doesn't really matter since we are just looking at header files?

@sbc100
Copy link
Collaborator

sbc100 commented Jun 26, 2024

The header files should be exactly the same so it shouldn't matter which sysroot is used for headers.

@walkingeyerobot
Copy link
Collaborator

thanks very much! this looks good to me.

@walkingeyerobot walkingeyerobot enabled auto-merge (squash) June 27, 2024 21:19
@allsey87
Copy link
Contributor Author

allsey87 commented Jul 1, 2024

@walkingeyerobot @sbc100 I think this PR will need to be manually merged due to a CircleCI gitch (test-bazel-mac-arm64 is just waiting indefinitely).

@sbc100 sbc100 disabled auto-merge July 1, 2024 16:22
@sbc100 sbc100 merged commit 82d41a7 into emscripten-core:main Jul 1, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants