memory leaking while extracting 40GB archive #626

Open
axet opened this issue Dec 3, 2024 · 0 comments


Describe the bug

Extracting a large archive with py7zr causes the process to use about 30 GB of memory.

Related issue

#575

To Reproduce

  1. Get a 40 GB archive.
  2. Prepare the Python extraction script (see below).
  3. Run it.
import io

import py7zr
import py7zr.exceptions
import py7zr.helpers
import py7zr.properties


class DecompressFile(io.IOBase):
    """Streams the decompressed bytes of one archive member."""

    def __init__(self, zipf, zi):
        self.fp = zipf.fp
        # Offset in the archive file where the packed streams end.
        self.src_end = zipf.afterheader + zipf.header.main_streams.packinfo.packpositions[-1]
        self.out_remaining = zi.uncompressed
        self.crc32 = 0
        self.decompressor = zi.folder.get_decompressor(zi.compressed, zi.compressed is not None)

    def read(self, size):
        # Request no more than the remaining output, the library memory limit, or size.
        m = min(self.out_remaining, py7zr.properties.get_memory_limit(), size)
        tmp = self.decompressor.decompress(self.fp, m)
        self.out_remaining -= len(tmp)
        self.crc32 = py7zr.helpers.calculate_crc32(tmp, self.crc32)
        # Once the packed data is fully consumed, verify the folder CRC if one exists.
        if self.fp.tell() >= self.src_end:
            if self.decompressor.crc is not None and not self.decompressor.check_crc():
                raise py7zr.exceptions.CrcError(self.decompressor.crc, self.decompressor.digest, None)
        return tmp


def extract(ifn):
    with py7zr.SevenZipFile(ifn) as zipf:
        for zi in zipf.files:
            print(zi.filename)
            if not zi.is_directory and zi.uncompressed > 0:
                d = DecompressFile(zipf, zi)
                # Read in 1 MiB chunks and discard the data; only the CRC is kept.
                while d.read(1024 * 1024):
                    pass
                if d.crc32 != zi.crc32:
                    raise py7zr.exceptions.CrcError(d.crc32, zi.crc32, zi.filename)


extract("40G.7z")

Expected behavior

Memory consumption should stay around 5-10 GB, not 30 GB or more.
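To put a number on the memory use, one option is to read the peak resident set size of the process after extraction finishes. The following is a minimal sketch, assuming a Unix-like system where the standard-library `resource` module is available. The helper names `peak_rss_mib` and `run_and_report` are hypothetical and not part of py7zr.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(func, *args):
    """Call func(*args) and return (result, peak RSS in MiB afterwards)."""
    result = func(*args)
    return result, peak_rss_mib()
```

In the reproduction script, replacing the final call with `_, peak = run_and_report(extract, "40G.7z")` and printing `peak` would give a figure to compare against the expected 5-10 GB. Note that this reports the high-water mark for the whole process, not memory attributable to py7zr alone.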

Environment (please complete the following information):

  • OS: Debian trixie
  • Python 3.12.7
  • py7zr version: 0.22.0+dfsg-1