Max threads number when extracting non-solid files. #630

Open
bpinsard opened this issue Jan 14, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@bpinsard

Is your feature request related to a problem? Please describe.

When extracting 7z archives compressed in non-solid mode (one block per file) that contain a very large number of files (100k+), the number of threads launched in

py7zr/py7zr/py7zr.py

Lines 1314 to 1332 in 7aa9af8

for i in range(numfolders):
    if skip_notarget:
        if not any([self.target_filepath.get(f.id, None) for f in folders[i].files]):
            continue
    p = self.concurrent(
        target=self.extract_single,
        args=(
            filename,
            folders[i].files,
            path,
            self.src_start + positions[i],
            self.src_start + positions[i + 1],
            q,
            exc_q,
            skip_notarget,
        ),
    )
    p.start()
    concurrent_tasks.append(p)

can be prohibitive, even reaching system limits and crashing.

I also expect that launching that many threads is detrimental to performance.

Describe the solution you'd like
Would it be possible to have a sensible max threads setting?

@miurahr miurahr added the enhancement New feature or request label Jan 15, 2025
@miurahr
Owner

miurahr commented Jan 15, 2025

py7zr/py7zr/py7zr.py

Lines 1268 to 1271 in 7aa9af8

if mp:
    self.concurrent: Union[type[Thread], type[Process]] = Process
else:
    self.concurrent = Thread

We may be able to add some initialization for the class selected here.
One idea is to use concurrent.futures.ThreadPoolExecutor instead of Thread.

Caution: self.concurrent can also be Process when multiprocessing is enabled (mp is true).

Do you have any ideas?

@bpinsard
Author

Thanks for the fast reply!
Using ThreadPoolExecutor or ProcessPoolExecutor, depending on the mp parameter, should make it possible to limit the number of concurrent threads/processes.
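
A minimal sketch of that idea, using only the standard library. It is not py7zr code: `make_executor`, `extract_folder`, `extract_all` and the `max_workers` default of 4 are hypothetical names and values chosen for illustration, and `extract_folder` merely stands in for `extract_single`.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Optional


def make_executor(mp: bool = False, max_workers: Optional[int] = None) -> Executor:
    """Return a bounded pool: processes when mp is true, threads otherwise."""
    # max_workers=None lets concurrent.futures pick its own default size.
    pool_cls = ProcessPoolExecutor if mp else ThreadPoolExecutor
    return pool_cls(max_workers=max_workers)


def extract_folder(index: int) -> int:
    """Hypothetical stand-in for extract_single(); it only echoes its input."""
    return index


def extract_all(numfolders: int, mp: bool = False, max_workers: Optional[int] = 4) -> List[int]:
    """Submit one task per folder; at most max_workers of them run at once."""
    with make_executor(mp, max_workers) as pool:
        futures = [pool.submit(extract_folder, i) for i in range(numfolders)]
        # result() re-raises any exception that was raised inside a worker.
        return [f.result() for f in futures]


if __name__ == "__main__":
    # 1000 tasks are queued, but only 4 worker threads are ever created.
    print(len(extract_all(1000)))
```

With a pool, the tasks queue up and are picked up by a fixed set of workers, so 100k+ folders no longer translate into 100k+ threads. One caveat for the process case: everything passed to `submit` has to be picklable, so the queues handed to `extract_single` today may need a different mechanism there.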
