Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an unhandled error "There are some data after the end of the payload data" #538

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Mrw33554432
Copy link

#536
Now when the problem happens, the code will raise the warning and break the dead loop. The other option which is force 1 place forward is also provided inside the code. Choose which you want (I am not familiar with decompression logic, but break works fine in my case. It may lead to potential data loss, or maybe not, idk). The warning is handled via warnings.warn, because I want to remain most part of the code unchanged. Other wise you can try to use the report method perhaps, but the parameter self.q seems not available in this method now.

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

@Mrw33554432 The patch breaks a feature in some case. Please check a result of tests by tox command.

@Mrw33554432
Copy link
Author

@Mrw33554432 The patch breaks a feature in some case. Please check a result of tests by tox command.

I'm not quite sure why that happens, because the only thing this code do is break a dead loop (when out_remaining>0, len(tmp)<=0), and it should not be trigger by anything else... I might need to know why there wasn't an else before being able to fix it (I have no idea about that).

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

tests/test_archive.py::test_compress_bz2_bcj FAILED [ 8%]
tests/test_archive.py::test_compress_zstd_2 FAILED [ 9%]

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

@Mrw33554432 The patch breaks a feature in some case. Please check a result of tests by tox command.

I'm not quite sure why that happens, because the only thing this code do is break a dead loop (when out_remaining>0, len(tmp)<=0), and it should not be trigger by anything else... I might need to know why there wasn't an else before being able to fix it (I have no idea about that).

In some algorithm, there is a case that first blocks of input data are consumed but there is no output because some of second blocks of input data are required to complete a calculation. It is a case that out_remaining is not zero but there i s no output. You may need to check there is no next chunk of data.

@Mrw33554432
Copy link
Author

Mrw33554432 commented Nov 2, 2023

tests/test_archive.py::test_compress_bz2_bcj FAILED [ 8%]
tests/test_archive.py::test_compress_zstd_2 FAILED [ 9%]

I will just try the out_remaining-=1 version, or if this one fail then I will add a mechanism that will only break before next run (in case some data is updated in +1 iter delay.
Update:
both version added, need to test how they work.

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

You can run test with tox command on your desktop before push.

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

Could you add a test case that reproduce the problem with minimum data which failed without the fix?
You may want to use pytest-timeout
A tests/test_badfiles.py would be best place to put the case.

see https://github.com/miurahr/py7zr/blob/master/docs/contribution.rst

@Mrw33554432
Copy link
Author

tests/test_extra_codecs.py::test_compress_decompress_brotli FAILED                                                                                                                                                           [ 50%]
tests/test_extra_codecs.py::test_decompress_brotli PASSED                                                                                                                                                                    [ 50%]
tests/test_extra_codecs.py::test_compress_deflate64 FAILED                                                                                                                                                                   [ 50%]
tests/test_extra_codecs.py::test_decompress_deflate64 PASSED                                                                                                                                                                 [ 50%]
tests/test_benchmark.py::test_benchmark_filters_compress[zstd-filters0] PASSED                                                                                                                                               [ 23%]
tests/test_benchmark.py::test_benchmark_filters_compress[bzip2-filters1] PASSED                                                                                                                                              [ 24%]
tests/test_benchmark.py::test_benchmark_filters_compress[lzma+bcj-filters2] PASSED                                                                                                                                           [ 24%]
tests/test_benchmark.py::test_benchmark_filters_compress[lzma2+bcj-filters3] PASSED                                                                                                                                          [ 24%]
tests/test_benchmark.py::test_benchmark_filters_compress[bzip2+aes-filters4] PASSED                                                                                                                                          [ 25%]
tests/test_benchmark.py::test_benchmark_filters_compress[lzma2+aes-filters5] PASSED                                                                                                                                          [ 25%]
tests/test_benchmark.py::test_benchmark_filters_decompress[zstd-filters0] PASSED                                                                                                                                             [ 25%]
tests/test_benchmark.py::test_benchmark_filters_decompress[bzip2-filters1] FAILED                                                                                                                                            [ 25%]
tests/test_benchmark.py::test_benchmark_filters_decompress[lzma+bcj-filters2] PASSED                                                                                                                                         [ 26%]
tests/test_benchmark.py::test_benchmark_filters_decompress[lzma2+bcj-filters3] PASSED                                                                                                                                        [ 26%]
tests/test_benchmark.py::test_benchmark_filters_decompress[bzip2+aes-filters4] FAILED                                                                                                                                        [ 26%]
tests/test_benchmark.py::test_benchmark_filters_decompress[lzma2+aes-filters5] PASSED                                                                                                                                        [ 27%]
tests/test_benchmark.py::test_benchmark_text_compress[ppmd(text)-filters0] FAILED                                                                                                                                            [ 27%]
tests/test_benchmark.py::test_benchmark_text_compress[deflate(text)-filters1] FAILED                                                                                                                                         [ 27%]
tests/test_benchmark.py::test_benchmark_text_compress[zstd(text)-filters2] FAILED                                                                                                                                            [ 28%]
tests/test_benchmark.py::test_benchmark_text_compress[brotli(text)-filters3] FAILED                                                                                                                                          [ 28%]
tests/test_benchmark.py::test_benchmark_text_decompress[ppmd(text)-filters0] FAILED                                                                                                                                          [ 28%]
tests/test_benchmark.py::test_benchmark_text_decompress[deflate(text)-filters1] FAILED                                                                                                                                       [ 28%]
tests/test_benchmark.py::test_benchmark_text_decompress[zstd(text)-filters2] FAILED                                                                                                                                          [ 29%]
tests/test_benchmark.py::test_benchmark_text_decompress[brotli(text)-filters3] FAILED                                                                                                                                        [ 29%]
tests/test_benchmark.py::test_benchmark_calculate_key1 SKIPPED (Don't test in ordinary development)                                                                                                                          [ 29%] 
tests/test_benchmark.py::test_benchmark_calculate_key2 SKIPPED (Don't test in ordinary development)                                                                                                                          [ 30%] 
tests/test_benchmark.py::test_benchmark_calculate_key3 SKIPPED (Don't test in ordinary development)                                                                                                                          [ 30%] 

The code

        last_param = None
        while out_remaining > 0:
            tmp = decompressor.decompress(fp, min(out_remaining, max_block_size))
            if len(tmp) > 0:
                out_remaining -= len(tmp)
                fq.write(tmp)
                crc32 = calculate_crc32(tmp, crc32)
            else:
                param = (out_remaining,
                         max_block_size,
                         crc32,
                         decompressor)
                if last_param == param:
                    break
                last_param = (out_remaining,
                              max_block_size,
                              crc32,
                              decompressor)

            if out_remaining <= 0:
                break
        if fp.tell() >= src_end:
            if decompressor.crc is not None and not decompressor.check_crc():
                raise CrcError(decompressor.crc, decompressor.digest, None)
        return crc32

If all parameters in the loop stay the same in more than 1 iter, which means it is a dead loop. The only thing I am trying to do here is break the loop if it is dead, and perhaps that's causing error? But without it the code should run forever. I'm getting confused.

@Mrw33554432
Copy link
Author

Mrw33554432 commented Nov 2, 2023

        if fp.tell() >= src_end:
            if decompressor.crc is not None and not decompressor.check_crc():
                raise CrcError(decompressor.crc, decompressor.digest, None)

The decompressor probably store something inside, and we should also check the param inside decompressor, ensuring a dead loop is happening. But I will leave it for later.

@Mrw33554432
Copy link
Author

Mrw33554432 commented Nov 2, 2023

If the past version works fine, either the task is successfully killed with timeout, or

tmp = decompressor.decompress(fp, min(out_remaining, max_block_size))

returned tmp changes even if out_remaining stay the same. That's quite confusing. Or the CRC error is supposed to happen, but due to the dead loop the error is not triggered. It' much deeper than I expected.

@miurahr
Copy link
Owner

miurahr commented Nov 2, 2023

If the past version works fine, either the task is successfully killed with timeout, or

tmp = decompressor.decompress(fp, min(out_remaining, max_block_size))

returned tmp changes even if out_remaining stay the same. That's quite confusing. Or the CRC error is supposed to happen, but due to the dead loop the error is not triggered. It' much deeper than I expected.

You can find decompressor has buffer. It is required why some algorithm library sometimes produce more length of requested data with given input data, decompressor store data for the next request.

https://github.com/miurahr/py7zr/blob/master/py7zr/compressor.py#L680-L685

…into this section twice (1 iter delayed to allow code perform eveything else it need to do)
@miurahr
Copy link
Owner

miurahr commented Nov 5, 2023

You can find decompressor has buffer. It is required why some algorithm library sometimes produce more length of requested data with given input data, decompressor store data for the next request.

https://github.com/miurahr/py7zr/blob/master/py7zr/compressor.py#L680-L685

There is another fact that some algorithm has an internal state and buffer. It accept max_length of output and it has the internal state or buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants