
[Bug]: Hermeticity issue with libarchive depending on LANG environment variable #2256

Open
RobertClarke64 opened this issue Jun 13, 2024 · 8 comments
Labels
bug Something isn't working


@RobertClarke64

RobertClarke64 commented Jun 13, 2024

What happened?

The tar binary from libarchive behaves non-hermetically because it depends on the system's LANG environment variable. Maybe we could patch it to hard-code something like en_US.UTF-8?

I noticed this because tar was failing whenever it encountered UTF-8 characters in filenames, until I set LANG to en_US.UTF-8.

The full context of the issue is on the Bazel Slack.

Version

Development (host) and target OS/architectures: macOS Sonoma 14.5

Output of bazel --version: bazel 7.2.0

How to reproduce

Use the tar toolchain to extract an archive containing files with UTF-8 characters in their names, making sure that LANG is not set. Extraction should fail (for me, tar only outputs tar: (null) as the error message).

Set LANG to a UTF-8 locale and it should work.
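A minimal sketch for producing a test archive for the steps above, assuming Python 3 and its standard `tarfile` module. With `format=tarfile.PAX_FORMAT`, a non-ASCII member name is stored as UTF-8 in a PAX extended header, which the extracting tar then has to map onto the current locale's character set. The helper name `make_utf8_archive` and the member name `café.txt` are illustrative, not from the original report.

```python
import io
import tarfile


def make_utf8_archive(path: str, member_name: str = "café.txt") -> None:
    """Write a one-file PAX tar whose member name contains non-ASCII characters."""
    data = b"hello\n"
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    # PAX format records non-ASCII paths in a UTF-8 extended header.
    with tarfile.open(path, "w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(data))
```

For example, call `make_utf8_archive("utf8-names.tar")`, then extract the result with the toolchain's tar while the locale variables are removed, e.g. `env -u LANG -u LC_ALL -u LC_CTYPE <path-to-toolchain-tar> -xf utf8-names.tar`. Per the report, this fails on macOS and works again once LANG names a UTF-8 locale.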

RobertClarke64 added the bug label Jun 13, 2024
@zaucy
Contributor

zaucy commented Jun 13, 2024

It seems to me that this is a runtime issue, not a build-time issue.

Does this behaviour differ if it is compiled without Bazel? It seems like reading the LANG environment variable at runtime is intended.

Maybe related: libarchive/libarchive#587? Probably not.

A configuration option would probably be good to change this behaviour though!

@RobertClarke64
Author

RobertClarke64 commented Jun 14, 2024

I agree that reading LANG at runtime is intended. We just don't want it to behave that way in Bazel, because we want it to always produce the same results.

@zaucy
Contributor

zaucy commented Jun 14, 2024

Yeah, that's completely understandable, especially if you're using it as a tool in a Bazel rule.

I do wonder whether setting the LANG environment variable when calling ctx.actions.run would be enough, or even preferable to expecting a tool in the BCR to be suitable for a Bazel rule by default?
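To make the environment-based idea concrete, the action's environment can be built explicitly so that no locale variable leaks in from the host. In a Bazel rule, this would correspond to passing an explicit `env` dict to `ctx.actions.run`. The Python below only illustrates the idea for a plain subprocess. The helper name `pinned_locale_env` and the default `en_US.UTF-8` are assumptions.

```python
from typing import Dict, Mapping


def pinned_locale_env(base: Mapping[str, str], locale_name: str = "en_US.UTF-8") -> Dict[str, str]:
    """Return a copy of `base` with host locale settings replaced by a fixed one."""
    env = {
        key: value
        for key, value in base.items()
        # Drop LANG and every LC_* category so nothing from the host survives.
        if key != "LANG" and not key.startswith("LC_")
    }
    # LC_ALL overrides LANG and all LC_* categories, so one variable is enough.
    env["LC_ALL"] = locale_name
    return env
```

A caller would pass the result as the child's environment, e.g. `subprocess.run([tar_path, "-xf", archive], env=pinned_locale_env(os.environ), check=True)`, where `tar_path` and `archive` are placeholders.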

@alexeagle
Contributor

Ah, I just ran across this as well downstream in bazel-lib, where we have the tar rule, and filed bazel-contrib/bazel-lib#1018.

@alexeagle
Contributor

@fmeum said on the linked Slack thread:

bsdtar seems to contain locale-sensitive logic and it's possible that this be removed during its build (say as a configure option). Since the prebuilt tar is meant to be as hermetic as possible, this would be a useful improvement

@zaucy do you happen to know whether that's possible? I'm looking at HAVE_LOCALE_H but I don't think the logic guarded by that is what reads the locale from the environment.

@zaucy
Contributor

zaucy commented Dec 18, 2024

maaaaybe setting HAVE_SETLOCALE to 0?

https://github.com/libarchive/libarchive/blob/819a50a0436531276e388fc97eb0b1b61d2134a3/tar/bsdtar.c#L190-L193

But that would just remove the setlocale call. I'm not really sure what the right thing to do here is.

@fmeum
Contributor

fmeum commented Dec 18, 2024

That should work: The default locale of a C program is C, but that call to setlocale sets it to whatever the typical environment variables say.
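"Whatever the typical environment variables say" follows the POSIX precedence rules for each category. A non-empty LC_ALL wins, then the category's own variable (LC_CTYPE for character encoding), then LANG, and finally an implementation-defined default. The sketch below models only that lookup for LC_CTYPE. It does not call setlocale or check that the named locale is installed. The function name is hypothetical, and the default of "C" is an assumption.

```python
from typing import Mapping


def effective_ctype_locale(env: Mapping[str, str], default: str = "C") -> str:
    """Model which value setlocale(LC_CTYPE, "") would pick from `env` (POSIX order)."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty values are both skipped
            return value
    # Nothing set: fall back to the implementation default (assumed "C" here).
    return default
```

With an empty environment, the result is the C locale, whose character set typically cannot represent UTF-8 file names. This is consistent with the failure in the original report.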

@fmeum
Contributor

fmeum commented Dec 18, 2024

Ah, never mind: forcing the C locale could be a problem if it ends up disallowing UTF-8 characters in filenames. In that case, forcing LC_ALL=C.UTF-8 on Linux or LC_ALL=en_US.UTF-8 elsewhere is probably the best you could do.
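Pinning a fixed UTF-8 locale only helps if that locale exists on the host. The sketch below probes candidates in the order suggested above, using Python's `locale.setlocale`, which raises `locale.Error` when the C library rejects a name. The candidate list and the helper name `first_available_utf8_locale` are illustrative. Because the probe briefly changes LC_CTYPE for the current process, it restores the original setting before returning.

```python
import locale
from typing import Optional, Sequence


def first_available_utf8_locale(
    candidates: Sequence[str] = ("C.UTF-8", "en_US.UTF-8"),
) -> Optional[str]:
    """Return the first candidate the C library accepts for LC_CTYPE, or None."""
    original = locale.setlocale(locale.LC_CTYPE)  # query without changing
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed or not supported on this host
            return name
        return None
    finally:
        # Undo the probe so the rest of the process keeps its locale.
        locale.setlocale(locale.LC_CTYPE, original)
```

Some C libraries accept locale names they do not actually ship, so a successful probe is a hint rather than a guarantee.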

alexeagle added a commit to bazel-contrib/bazel-lib that referenced this issue Dec 19, 2024
`bsdtar` will fail to extract archives with Unicode characters in a filename on systems where the default locale is `C` or some non-UTF value.

Users encounter this as strange extract failures, but only on a subset of the systems they work on; this is non-hermetic behavior.

Workaround bazelbuild/bazel-central-registry#2256