[script + doc] Update tar.bz2 mime type #140

Closed
wants to merge 1 commit into from

Conversation

ToMe25
Contributor

@ToMe25 ToMe25 commented Sep 16, 2024

Changes

  • Unify the content type of gzip compressed tars to application/x-gtar-compressed
  • Unify the content type of bzip2 compressed tars to application/x-bzip-compressed-tar
  • Add .tgz to useful_types because it being missing when .tar.gz was there annoyed the perfectionist in me

Reasoning

Unfortunately the MIME types used for compressed tars vary quite a lot.
Below this I am trying to detail the pros and cons of the various MIME types I found being used, as well as my reasoning for choosing the ones I ultimately chose.

I looked at the default mime.types files for Debian, Fedora, OpenBSD, and Apache.
I also just googled around for a while and looked at lots of projects, of which I remember far too few to list them here, to get a rough feeling for which MIME types are used how much.

I also looked at xdg-shared-mime-info to see what they use.

Finding projects containing their own tar.bz2 mappings was pretty challenging, so I don't have much data to base my usage guesses on.
I still tried my best regardless, but the result may not be perfect.

Gzip

application/gzip

Pros

  • The only actually IANA registered type
  • Used by Fedora

Cons

  • Does not give any information about the type of the compressed file, which some people argue is more important than the compression used.
  • Not very widely used for tgz and tar.gz files

application/x-gtar-compressed (Used in this PR)

Pros

  • Currently used (wouldn't be a change, and thus has no potential to break things)
  • Used by Debian
  • Seems to be the most widely used and recognized from what I found (I didn't compile detailed statistics I could cite as a source)

Cons

  • Does not specify the compression algorithm used

application/x-tgz

Pros

  • Compact
  • Very Precise

Cons

  • Not very widely used

application/x-compressed-tar

Pros

  • Used by XDG

Cons

  • Does not specify the compression algorithm
  • From what I found, it's not very widely used

Bzip2

application/x-bzip2

Pros

  • None?
  • This is implicitly used by Fedora and Apache, but I don't think that matters all that much if it isn't even explicit.

Cons

  • Not very widely used
  • Doesn't describe the format of the compressed data

application/x-gtar-compressed

Pros

  • Currently used by lighttpd (thus wouldn't be a change that could cause breakage)

Cons

  • Does not describe the compression format used
  • I could not find any other project using it

application/x-bzip-compressed-tar (Used in this PR)

Pros

  • Recognized as an alias by XDG
  • The most widely used type, as far as I can tell

Cons

  • Does not specify the bzip version

application/x-bzip2-compressed-tar

Pros

  • The most precise type
  • Used by XDG

Cons

  • Not as widely used as application/x-bzip-compressed-tar

Related question and notice

Wiki Page

The mimetype.assign wiki page config sample contains a few minor issues related to compressed tars.
These aren't important enough for me to want to learn to use redmine and create an account to edit the lighttpd wiki because of it.
But I still thought I should mention them somewhere, so here they are.

  • .gz is above .tar.gz, causing .tar.gz files to be sent as application/x-gzip
  • .bz2 is above .tar.bz2, causing .tar.bz2 files to be sent as application/x-bzip
  • .bz2 uses the type application/x-bzip, despite all of the mime.types files I checked, which contain bz2 at all, using application/x-bzip2. Those are Fedora, Apache, and XDG. Lighttpd also uses application/x-bzip2 by default.
  • .gz uses the MIME type application/x-gzip despite application/gzip having been officially registered for quite a while, and being more widely supported as far as I can tell
  • .tgz and .tar.gz use the type application/x-tgz, which seems to be relatively uncommon as far as I can tell.

As a side note, this example already uses application/x-bzip-compressed-tar for bzip compressed tars.

Future PR

The perfectionist in me wanted to check all the default types in configfile.c against the Debian and Fedora mime.types files and add those that are missing from one or both of those to useful_types.
I held back on that, because it would be out of scope for this PR, but I wanted to ask a few things about this:

  1. If I create a PR for that, would that have any chance of being accepted? It would likely only add single-digit mappings.
  2. If I do that, should I add those entries that one distro is missing, or only those both are missing?
  3. Is there any other distro I should check against as well?

@gstrauss
Member

gstrauss commented Sep 17, 2024

Thanks for your attention to detail.

Some feedback:

You use the word "perfectionist" numerous times in your post, yet I am not sure you understand what it means.

If there is something that has a single correct answer and you can point me to the specification, then I will make those changes in lighttpd, if lighttpd is not already doing that single, right thing.

If that is not the case, then "perfection" is at best a misleading way to describe your post. Maybe you meant "completionist"?

Unfortunately the MIME types used for compressed tars vary quite a lot.

Yep. So why should I make any changes? (Serious question.) Do you have documentation to show that other choices provide better interoperability with existing (historical and current) clients? You have referenced XDG.

BTW, regarding tests/lighttpd.conf, I'll probably remove the use of mimetype.assign from tests/lighttpd.conf


The perfectionist in me wanted to check all the default types in configfile.c against the Debian and Fedora mime.types files and add those that are missing from one or both of those to useful_types.

The built-in list of mime-types in configfile.c is not intended to be exhaustive and complete. It is intended to cover common media types for web usage, as documented in the comment above config_mimetypes_default():
https://github.com/lighttpd/lighttpd1.4/blob/master/src/configfile.c#L947

/* common media types for the web
 *
 * references:
 *
 * lighttpd doc/scripts/create-mime.pl
 * https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types
 * https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Image_types
 * https://docs.w3cub.com/http/basics_of_http/mime_types
 * http://www.iana.org/assignments/media-types/media-types.xhtml
 * https://salsa.debian.org/debian/media-types/-/blob/master/mime.types
 * https://src.fedoraproject.org/rpms/mailcap/tree/rawhide
 *   https://pagure.io/mailcap/blob/master/f/mime.types
 *
 */

If you are suggesting changes, then please document how your suggestions differ from the reference above, and provide references why your suggestions are better. Specifically, please highlight and provide references documenting where your choices provide better interoperability with existing (historical and current) clients. (I did not notice any primary sources, though you do provide references to XDG.)

I am not an authority on mime-types and will not engage in any sort of debate on the matter. I do not claim that my choices are the best, though I have tried to make a solid attempt and have provided references which informed me.


bzip2 compression for web pages from web servers never was a popular choice, and has since been supplanted by brotli and zstd as alternatives to gzip. Why should I make any changes for bzip2? bzip2 support in lighttpd mod_deflate is disabled by default and must be selected during build time if still desired.


mimetype.assign wiki page config sample is old, created back in 2007.

.gz is above .tar.gz, causing .tar.gz files to be sent as application/x-gzip
.bz2 is above .tar.bz2, causing .tar.bz2 files to be sent as application/x-bzip

These statements are false. Did you test this? Those statements may have been true prior to lighttpd 1.4.46 (released in 2017), which changed the behavior with commit:0cc7556ae. The sample has more than 16 entries, so longest extension match is what has been used since lighttpd 1.4.46.

.bz2 uses the type application/x-bzip, despite all of the mime.types files I checked, which contain bz2 at all, using application/x-bzip2. Those are Fedora, Apache, and XDG. Lighttpd also uses application/x-bzip2 by default.

Maybe I'll change that. Or maybe I'll remove bz2 from the sample config on that wiki page.

.gz uses the MIME type application/x-gzip despite application/gzip having been officially registered for quite a while, and being more widely supported as far as I can tell
.tgz and .tar.gz use the type application/x-tgz, which seems to be relatively uncommon as far as I can tell.

As noted above, that part of the wiki page was written back in 2007, when the type was application/x-gzip. I may simply remove most of the sample config in favor of a simpler example.

@gstrauss
Member

I updated the mimetype.assign wiki page to remove the longer sample mimetype.assign config, and to note that since lighttpd 1.4.46, in-order matching applies only if mimetype.assign contains 16 or fewer entries.
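
To make the two matching modes described above concrete, here is a minimal Python sketch. It is not lighttpd's C implementation; it only models the behavior as stated in this thread (in-order matching for 16 or fewer entries, longest extension match otherwise). The function names, the `in_order_limit` parameter, and the two-entry sample are hypothetical and chosen purely for illustration.

```python
from typing import Optional, Sequence, Tuple

# Each entry is (extension, media type), kept in configuration order.
Assignments = Sequence[Tuple[str, str]]


def match_in_order(filename: str, assignments: Assignments) -> Optional[str]:
    """Return the type of the first configured extension the name ends with."""
    for ext, mime in assignments:
        if filename.endswith(ext):
            return mime
    return None


def match_longest(filename: str, assignments: Assignments) -> Optional[str]:
    """Try each dot-delimited suffix of the basename, longest first."""
    table = dict(assignments)
    basename = filename.rsplit("/", 1)[-1]
    dot = basename.find(".")
    while dot != -1:
        mime = table.get(basename[dot:])
        if mime is not None:
            return mime
        dot = basename.find(".", dot + 1)
    return None


def lookup(filename: str, assignments: Assignments,
           in_order_limit: int = 16) -> Optional[str]:
    """Pick the strategy by list size (threshold taken from the thread)."""
    if len(assignments) <= in_order_limit:
        return match_in_order(filename, assignments)
    return match_longest(filename, assignments)


# Hypothetical two-entry excerpt with .gz listed above .tar.gz.
sample = [(".gz", "application/x-gzip"), (".tar.gz", "application/x-tgz")]
print(match_in_order("backup.tar.gz", sample))  # application/x-gzip
print(match_longest("backup.tar.gz", sample))   # application/x-tgz
```

Under this model, the ordering of .gz and .tar.gz only matters for short lists, which would explain why testing with a subset of a long sample config can give a different result than the full config.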

@gstrauss
Member

Future PR

The perfectionist in me wanted to check all the default types in configfile.c against the Debian and Fedora mime.types files and add those that are missing from one or both of those to useful_types. I held back on that, because it would be out of scope for this PR, but I wanted to ask a few things about this:

1. If I create a PR for that, would that have any chance of being accepted? It would likely only add single-digit mappings.

2. If I do that, should I add those entries that one distro is missing, or only those both are missing?

3. Is there any other distro I should check against as well?

If I create a PR for that, would that have any chance of being accepted?

Short answer: no. As I wrote above, the built-in list of mime-types in configfile.c is not intended to be exhaustive and complete. There needs to be a more convincing argument than "completionist" to add entries to the default mimetype.assign in src/configfile.c

@gstrauss
Member

Fedora /etc/mime.types uses application/gzip for gz and tgz, so that is why it is so in lighttpd doc/config/conf.d/mime.conf. Since the built-in default for mimetype.assign in src/configfile.c (which I wrote) defines .tgz, I'll accept that addition to create-mime.conf.pl.

+".tgz"     => "application/x-gtar-compressed",

I will accept this change to create-mime.conf.pl since the original lines appear to be a typo and date back to 2014 according to the git history, and you have referenced that XDG recognizes application/x-bzip-compressed-tar.

-	".tbz"     => "application/x-gtar-compressed",
-	".tar.bz2" => "application/x-gtar-compressed",
+	".tbz"     => "application/x-bzip-compressed-tar",
+	".tar.bz2" => "application/x-bzip-compressed-tar",

Please update this PR to remove changes to tests/lighttpd.conf

gstrauss added a commit to gstrauss/lighttpd1.4 that referenced this pull request Sep 17, 2024
@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

You use the word "perfectionist" numerous times in your post, yet I am not sure you understand what it means.

I am aware that "perfectionist" is not the 100% correct word, but since I wasn't able to remember the word I actually meant, still don't, and couldn't easily find it by googling, this was the closest I could get ;)

If you are suggesting changes, then please document how your suggestions differ from the reference above

I believe you misunderstood me here.
I did not mean to suggest any changes to the c code.
What I meant is roughly "Since those types referenced in the c code are already acknowledged as common by lighttpd I want to add those of them that Debian and/or Fedora don't recognize to the useful_types mappings in the pearl script".

Edit: The reasoning behind this, what I called me being a perfectionist, is "If the default config supports this, shouldn't the default generation mime config do so as well?".

@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

These statements are false. Did you test this? Those statements may have been true prior to lighttpd 1.4.46 (released in 2017), which changed the behavior with commit:0cc7556ae. The sample has more than 16 entries, so longest extension match is what has been used since lighttpd 1.4.46.

My apologies, I was not aware of there being a different matching logic for longer configs, and only tested it with a subset of the example config.

Short answer: no. As I wrote above, the built-in list of mime-types in configfile.c is not intended to be exhaustive and complete. There needs to be a more convincing argument than "completionist" to add entries to the default mimetype.assign in src/configfile.c

As I already mentioned, I did not mean to suggest adding values to the c code, sorry about that.

@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

Please update this PR to remove changes to tests/lighttpd.conf

Of course.
I'll do that later today when I'm on my PC again.

@ToMe25 ToMe25 force-pushed the update-tar-bz2-mime-type branch from b51d235 to 39d9ad9 Compare September 17, 2024 13:41
@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

I rebased this PR on top of the current changes to master.

If you'd like me to either regenerate the file doc/config/conf.d/mime.conf, or get rid of my changes to it please let me know :)

@gstrauss
Member

If you'd like me to either regenerate the file doc/config/conf.d/mime.conf, or get rid of my changes to it please let me know :)

Please remove those changes from the PR. The first line of that file indicates that it is generated from create-mime.conf.pl. Periodically, I manually regenerate it on a Fedora system. On the other hand, you manually edited the file in your patch. Please remove those changes from this PR.


What I meant is roughly "Since those types referenced in the c code are already acknowledged as common by lighttpd I want to add those of them that Debian and/or Fedora don't recognize to the useful_types mappings in the pearl script".

Edit: The reasoning behind this, what I called me being a perfectionist, is "If the default config supports this, shouldn't the default generation mime config do so as well?".

Other than for historical reasons, the entries in %useful contain some entries that might be useful to web servers and might not have extensions (and so would not be listed in /etc/mime.types). No, this list in create-mime.conf.pl -- a Perl script, not "pearl" as you had written -- should not be expanded without proper reasoning, and "completionist" is an insufficient reason.


You use the word "perfectionist" numerous times in your post, yet I am not sure you understand what it means.

I am aware that "perfectionist" is not the 100% correct word, but since I wasn't able to remember the word I actually meant, still don't, and couldn't easily find it by googling, this was the closest I could get ;)

Please take the feedback that your prose is misleading and distracting from your message. It also is severely discrediting to your abilities. None of what you wrote or your behavior describes a perfectionist, so if you do understand what the word means, you should not have used the word even once to describe yourself or your actions in this PR.

@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

a Perl script, not "pearl" as you had written

I'm aware, I wrote that comment on my phone and didn't notice what auto-correct did there.

None of what you wrote or your behavior describes a perfectionist

I googled this again, and still do not agree with this statement.
However I have no desire to argue over the exact meaning of a word which wasn't even central to the message I meant to convey.
Never mind, after reading everything again to check for errors, I am now doing so below.

Please remove those changes from the PR. The first line of that file indicates that it is generated from create-mime.conf.pl. Periodically, I manually regenerate it on a Fedora system. On the other hand, you manually edited the file in your patch. Please remove those changes from this PR.

I read that header, which is why I asked whether I should regenerate the file, or get rid of my changes.
I will edit my commit again to get rid of my changes as well.


and "completionist" is an insufficient reason.

Looking at multiple dictionary definitions of "completionist" I do not agree at all that this is what applies to this situation.
I may be a completionist in some cases as well, but that does not apply here.

After all my intention was never "to complete the collection to contain everything related", but "to ensure parity with another system as a minimal baseline".

At this point I also wish to note that me being the way I am, to avoid using any specific wording, was only my motivation.
At no point did I intend to make that the reason why this change should be accepted, only why I would possibly spend the effort on a seemingly meaningless task.

My reasoning for why this change might be worth including was instead this: a script that generates a replacement for the built-in defaults, yet produces a config that reduces support in some respect on a major distro, might be considered less than ideal.

Anyway, I didn't mean to write that much about the exact meaning of words.
But I just got kind of annoyed when someone criticized me over the meaning of a word, and then used a word to describe me which I not only consider incorrect, but which also seems not to be applicable to the current situation at all, if I understand the dictionary definition correctly.

Oh, one last note: By perfectionist I do not mean "One who believes being perfect as a person is achievable", but "One who gets annoyed at even minor imperfections".
However from looking at various dictionaries the second definition does not seem to be any less correct or established than the first.
And as I said, that is not the exact word I meant to use, but googling for a few minutes did not result in me finding the word I meant to use, so I believed this to be a close enough fit.

 * Unify the content type of gzip compressed tars to application/x-gtar-compressed
 * Unify the content type of bzip2 compressed tars to application/x-bzip-compressed-tar

Exact reasoning will be provided in the PR.
@ToMe25 ToMe25 force-pushed the update-tar-bz2-mime-type branch from 39d9ad9 to 0b0b24c Compare September 17, 2024 18:07
@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

I updated the commit in this PR to no longer contain the changes to doc/config/conf.d/mime.conf.

@ToMe25
Contributor Author

ToMe25 commented Sep 17, 2024

If that is not the case, then "perfection" is at best a misleading way to describe your post. Maybe you meant "completionist"?

Despite having read your initial comment multiple times, I seem to have failed to respond to this part.
I wish to clarify some things I see here, which weren't what I meant to convey.
By saying something annoys the perfectionist in me, I did not mean to imply that I, anything I do, or this PR are perfect.
In fact I never had the intention to describe this PR as "perfection".

What I meant to convey instead is that the perceived inconsistencies (.tar.gz being mentioned, but .tgz not, the default config supporting file types that the generated config doesn't on Debian or Fedora, and stuff like that) are things I perceive as "imperfections", and which annoy me a lot more than they do other people.

This is how people in most communities I spent significant amounts of time in seemed to have used wording like "I'm a perfectionist about these things", and so I, possibly incorrectly, assumed that this was what would be conveyed when I used similar wording.

@gstrauss
Member

I wrote:

Please take the feedback that your prose is misleading and distracting from your message. It also is severely discrediting to your abilities. None of what you wrote or your behavior describes a perfectionist, so if you do understand what the word means, you should not have used the word even once to describe yourself or your actions in this PR.

Unfortunately, you missed the message and come across to me as extremely immature trying to use lots of words to justify your previous words. Sometimes you should just take the note.

I wrote that I will accept the suggested changes to create-mime.conf.pl and provided two reasons why:

  1. .tgz is part of src/configfile.c mimetypes_default which I wrote
  2. The mime type listed in create-mime.conf.pl for .tbz and .tar.bz2 appears to me to be a typo from 2014.

@gstrauss
Member

As you noted in your original post, Fedora /etc/mime.types provides .tgz mapping to application/gzip

As I regenerate doc/config/conf.d/mime.conf using create-mime.conf.pl on a Fedora system, the value from Fedora /etc/mime.types is what is used for .tgz in doc/config/conf.d/mime.conf, not the .tgz mapping added to %useful in this PR.
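
The precedence described above can be illustrated with a small Python sketch. This is not the Perl code of create-mime.conf.pl; it only assumes, based on the comment above, that entries from the system mime.types file win over the script's %useful fallbacks. The function names and the one-line `system_text` excerpt are hypothetical; the excerpt merely mirrors the statement that Fedora maps gz and tgz to application/gzip.

```python
from typing import Dict


def parse_mime_types(text: str) -> Dict[str, str]:
    """Parse mime.types-style lines: '<type> <ext> <ext> ...'."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime, *exts = line.split()
        for ext in exts:
            mapping.setdefault("." + ext, mime)  # first definition wins
    return mapping


def merge(system: Dict[str, str], useful: Dict[str, str]) -> Dict[str, str]:
    """Assumed precedence: system entries override the 'useful' fallbacks."""
    merged = dict(useful)
    merged.update(system)
    return merged


# Hypothetical excerpt standing in for a Fedora-style /etc/mime.types.
system_text = "application/gzip gz tgz\n"

# The three mappings discussed in this PR.
useful = {
    ".tgz": "application/x-gtar-compressed",
    ".tbz": "application/x-bzip-compressed-tar",
    ".tar.bz2": "application/x-bzip-compressed-tar",
}

merged = merge(parse_mime_types(system_text), useful)
print(merged[".tgz"])      # application/gzip
print(merged[".tar.bz2"])  # application/x-bzip-compressed-tar
```

In this model the %useful entry for .tgz only takes effect on systems whose mime.types does not define .tgz at all.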

@ToMe25
Contributor Author

ToMe25 commented Sep 18, 2024

Thanks for accepting this PR.
