Keep magic comments in the minify output for shebang and encoding #8

thinkdoggie · 2020-03-16T14:08:39Z

The shebang and the encoding declaration are two kinds of magic comment that appear in the first lines of Python source code. In general, the minifier should keep these lines as they appear in the original source.

For example,

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
...
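
One way to locate these two lines is sketched below. It is a minimal sketch, not the code from this pull request: the helper name `extract_magic_comments` is hypothetical, and the regular expression is the one given in PEP 263 for recognising an encoding declaration on the first or second line.

```python
import re

# Pattern given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def extract_magic_comments(source):
    """Return (shebang, coding_comment) found in the first two lines.

    Hypothetical helper: either element is None when that comment is absent.
    """
    lines = source.splitlines()[:2]
    shebang = None
    coding = None
    if lines and lines[0].startswith("#!"):
        shebang = lines[0]
    for index, line in enumerate(lines):
        if index == 0 and shebang is not None:
            continue  # the shebang line is not an encoding declaration
        if CODING_RE.match(line):
            coding = line
            break
        if line.strip() and not line.lstrip().startswith("#"):
            break  # a code line ends the region where a declaration counts
    return shebang, coding
```

Called on the example above, the helper would return the shebang line and the coding line as a pair of strings.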

dflook · 2020-03-16T14:24:37Z

The output from python-minifier is always UTF-8, so the coding comment is not needed.
An option to preserve the shebang would be sensible.

thinkdoggie · 2020-03-16T14:40:54Z

Hi @dflook, thanks for reviewing. Even though the output is encoded as UTF-8, the Python 2 interpreter still needs this magic comment to load the code correctly if the content contains non-Latin characters, because:

Python 2 uses ASCII as its default encoding for its strings which cannot store Chinese characters. On the other hand, Python 3 uses Unicode encoding for its strings by default which can store Chinese characters.

Here are some reference links about this topic:

thinkdoggie · 2020-03-17T02:35:28Z

Hi @dflook, do you think shebang or encoding hint should be preserved by default?

If not, the minified program will not be fully equivalent to the original source. For example,

Without the shebang, the script cannot be executed directly from the shell;
Without the encoding comment, source containing non-Latin characters cannot be loaded by Python 2.

On the other hand, I also suggest providing the following command-line options to suppress this behavior: --remove-shebang and --remove-encoding-comment. Thanks!

dflook · 2020-03-18T11:32:14Z

Hi, thanks for continuing to work on this!

The encoding comment is an instruction for the parser. Since we work on the AST produced by the parser, the coding comment in the source is no longer relevant. In Python 3 we can safely output UTF-8 and rely on the parser to produce the same string.

The sequence of bytes might be different between input and output if the encoding is different, but the parser should produce the same string. In this case copying the coding comment from the input to output is wrong, as it will probably not match the output encoding. If using the pyminify command the output is UTF-8. When using the minify() function you get a python str that you can encode how you like, but you will need to add a coding comment yourself to ensure it parses correctly.
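
The last point can be illustrated with a small sketch. The helper `encode_with_coding_comment` is hypothetical and not part of python-minifier, and the `minified` string is only a stand-in for the str that minify() would return. The sketch assumes Python 3, where compile() accepts bytes and honours a coding comment in them.

```python
def encode_with_coding_comment(minified, encoding):
    """Encode minified source text and prepend a matching PEP 263 comment.

    Hypothetical helper, shown only to illustrate the idea.
    """
    header = "# -*- coding: {} -*-\n".format(encoding)
    return (header + minified).encode(encoding)


# Stand-in for the str returned by a minifier; the text is illustrative.
minified = "greeting='h\u00e9llo'"
data = encode_with_coding_comment(minified, "latin-1")

# compile() is given bytes here, so the parser reads the coding comment
# to decide how to decode them, as it would for a file on disk.
namespace = {}
exec(compile(data, "<minified>", "exec"), namespace)
```

The coding comment is written from the output encoding rather than copied from the input, so the two always agree.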

I'm not too sure of how this works in Python 2 to be honest, but I'm reluctant to make any changes without a test case. Do you have an example where the current behaviour is wrong?

dflook · 2020-03-18T11:34:37Z

For the shebang I can't quite decide if it should be removed or preserved by default, but --remove-shebang is along the right lines.
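
A rough sketch of how such an option could behave is below. The function `reattach_shebang` and its `remove_shebang` parameter are hypothetical names chosen to mirror the proposed flag; they are not the implementation in this pull request, and which default is right is the open question here.

```python
def reattach_shebang(original, minified, remove_shebang=False):
    """Copy the shebang from the original source onto the minified output.

    Hypothetical helper: returns the minified text unchanged when the
    original has no shebang or when remove_shebang is True.
    """
    first_line = original.split("\n", 1)[0].rstrip("\r")
    if remove_shebang or not first_line.startswith("#!"):
        return minified
    return first_line + "\n" + minified
```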

thinkdoggie · 2020-03-23T16:30:39Z

Hi @dflook, I totally agree that test cases are necessary for any code change. However, there does not seem to be an existing test scaffold for the minify() function. If I understand correctly, I am willing to add more test code for this enhancement.

The magic comment for encoding is just a hint for the Python 2 interpreter rather than the file encoding itself. There is a very old spec, PEP 263, that addresses this topic. I will show more examples and negative cases to explain how it works and why it is necessary for Python 2.
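
As a small illustration of how the hint is read, the standard library's tokenize.detect_encoding applies the PEP 263 rules to the first two lines of a byte stream. The wrapper name `declared_encoding` is hypothetical. Note the sketch runs under Python 3, where the fallback is UTF-8; under Python 2 the default source encoding was ASCII, which is why the comment matters there.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding
```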

blikjeham · 2021-09-20T13:26:49Z

How is it going with this Pull Request? I ran into the issue that my shebang gets removed from the output file (#34). This pull request has been stale for a year and a half. What is the status?

Keep magic comments in the minify output for shebang and encoding declare

53b7e3b

dflook added the enhancement New feature or request label Mar 16, 2020

Refactor code for extracting shebang and encoding comments

a91171e

blikjeham mentioned this pull request Sep 20, 2021

The shebang is stripped from the minified python code #34

Closed
