Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep magic comments in the minify output for shebang and encoding #8

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

thinkdoggie
Copy link

Shebang and encoding declare are two types of magic comments in the beginning lines of Python source code. In general, the minifier had better keep these lines as the origin source.

For example,

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
...

@dflook dflook added the enhancement New feature or request label Mar 16, 2020
@dflook
Copy link
Owner

dflook commented Mar 16, 2020

The output from python-minifier is always UTF-8, so the coding comment is not needed.
An option to preserve the shebang would be sensible.

@thinkdoggie
Copy link
Author

Hi @dflook, thanks for your reviewing. Even the encoding for the output has been UTF-8, Python 2 interpreter still need this magic comment to load code correctly if non-latin characters used in the content. Because,

Python 2 uses ASCII as its default encoding for its strings which cannot store Chinese characters. On the other hand, Python 3 uses Unicode encoding for its strings by default which can store Chinese characters.

Here're some reference links about this topic:

@thinkdoggie
Copy link
Author

thinkdoggie commented Mar 17, 2020

Hi @dflook, do you think shebang or encoding hint should be preserved by default?

If not, the minified program will not be fully equivalent to the original source. For example,

  • Without shebang, the script could not be executable from shell;
  • Without encoding comment, the source containing non-lantin characters cannot be loaded by Python 2.

On the other hand, I also suggest to provide the following options in the command-line to suppress this behavior: --remove-shebang and --remove-encoding-comment. Thanks!

@dflook
Copy link
Owner

dflook commented Mar 18, 2020

Hi, thanks for continuing to work on this!

The encoding comment is an instruction for the parser. Since we work on the AST produced by the parser, the coding comment in the source is no longer relevant. In Python 3 we can safely output UTF-8 and rely on the parser to produce the same string.

The sequence of bytes might be different between input and output if the encoding is different, but the parser should produce the same string. In this case copying the coding comment from the input to output is wrong, as it will probably not match the output encoding. If using the pyminify command the output is UTF-8. When using the minify() function you get a python str that you can encode how you like, but you will need to add a coding comment yourself to ensure it parses correctly.

I'm not too sure of how this works in Python 2 to be honest, but I'm reluctant to make any changes without a test case. Do you have an example where the current behaviour is wrong?

@dflook
Copy link
Owner

dflook commented Mar 18, 2020

For the shebang I can't quite decide if it should be removed or preserved by default, but --remove-shebang is along the right lines.

@thinkdoggie
Copy link
Author

thinkdoggie commented Mar 23, 2020

Hi @dflook , I totally agree that test cases are always necessary for any code changes. Nevertheless, it looks no existing test scaffold for the minify() function. If I understand correctly, I am willing to add more test code for this new enhancement.

The magic comment for encoding is just a hit for Python 2 interpreter rather than the file encoding itself. There is a very old spec PEP263 to address this topic. I will show more examples and negative cases to explain how it works and why it is necessary for Python 2.

@blikjeham
Copy link

How is it going with this Pull Request? I ran into the issue that my shebang gets removed from the output file (#34). This pull request has been stale for a year and a half. What is the status?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants