
Prevent Thread Oversubscription in Cluster Setting #4

Open · wants to merge 1 commit into base: dev

Conversation

gordonkoehn
Contributor

@gordonkoehn gordonkoehn commented Oct 15, 2024

Aim: Prevent thread oversubscription in cluster settings in the deconvolution step by explicit threading control.

The nested parallelism that was introduced leads to fantastic speedups, yet may lead to an oversubscription of threads in a cluster setting where the number of threads is rigorously enforced.

The nested parallelism is due to Python's multiprocessing combined with the nested scipy.optimize calls inside the deconvolution/regression. The internal scipy.optimize is known to grab threads in such a setting, which may lead to the job stalling on the cluster.
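To make the oversubscription arithmetic concrete, here is a minimal stdlib-only sketch; the worker and per-worker thread counts are illustrative assumptions, not measured values from Lollipop:

```python
# Why nested parallelism oversubscribes a fixed allocation: with N worker
# processes (Python multiprocessing) each letting BLAS spawn B threads inside
# scipy.optimize, the job demands N * B threads in total.
def total_threads(n_workers: int, blas_threads_per_worker: int) -> int:
    return n_workers * blas_threads_per_worker


cpus_per_task = 8   # e.g. sbatch --cpus-per-task=8
n_workers = 8       # one process per bootstrap
blas_threads = 4    # what OpenBLAS might grab per process by default (assumed)

demand = total_threads(n_workers, blas_threads)
print(demand)                   # 32 threads demanded on an 8-core allocation
assert demand > cpus_per_task   # oversubscribed: the scheduler thrashes

# Pinning BLAS to one thread per worker restores the budget:
assert total_threads(n_workers, 1) == cpus_per_task
```

The fix below therefore limits BLAS to one thread per process, so the process count alone determines total CPU usage.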

Objectives:

  • allow for common usage with --cpus-per-task
  • allow for usage on end-user devices
  • design user-friendly control of threads - NOT NEEDED ANYMORE (the default of threads=1 works well on both cluster and local machines)

@gordonkoehn gordonkoehn added the enhancement New feature or request label Oct 15, 2024
@gordonkoehn gordonkoehn self-assigned this Oct 15, 2024
@gordonkoehn
Contributor Author

8 threads are used by the subpackages.

With

    with controller.limit(limits=1, user_api='blas'):
        deconvolution(...)

around the deconvolution, Lollipop runs fast again.

Yet the number of threads is still way over 8.

Need to understand this better before choosing good command-line arguments.
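To see where those extra threads come from, threadpoolctl can enumerate every BLAS/OpenMP pool loaded in the process. A minimal sketch, assuming threadpoolctl is installed (the list may be empty until numpy/scipy have been imported and their BLAS libraries loaded):

```python
from threadpoolctl import threadpool_info

# Each entry describes one native thread pool: which API it belongs to
# ('blas' or 'openmp'), the implementation behind it, and its current size.
# This is how to pick the right `user_api` value to pass to the limiter.
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"])
```

Note that threads shown by `top` can also come from Python-level workers, which threadpoolctl neither lists nor limits.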

@gordonkoehn
Contributor Author

In the current state, we get an average time of 20-30 seconds per bootstrap over the date range on the test data, given I submitted the job with

    sbatch --mail-type=END --ntasks=1 --cpus-per-task=8 --mem-per-cpu=8000 ....

This is about the same runtime I get on my local machine, where threads=1 for all libraries such as numpy and scipy.

I've wrapped only the handful of lines that do the actual deconvolution in the threadpoolctl controller, to limit the threads of scipy.optimize in particular.

From the top readout, it looks like there are still numerous other threads spawned; these are probably due to the other uses of numpy throughout the script. From testing, this does not seem to be a problem, though, as these are all just quick, small operations.

[Screenshot of the top readout, 2024-10-16 13:58]

@gordonkoehn
Contributor Author

The thread control is currently configured for OpenBLAS, as that is what Euler uses for thread management.

When run locally on an end-user machine - e.g. my OSX machine - the control does nothing, so Lollipop runs fine.

@gordonkoehn
Contributor Author

gordonkoehn commented Oct 16, 2024

Should we allow for fine-grained thread control on the user side?

Potential Benefits:

  • speedup on the cluster by multithreading scipy

Negative Impact:

  • added complexity; rather hide it from the user

==== Testing runtime with threads=1 ====
for bootstraps=100

Ivan's runtime for this week's processing with cpus-per-task=32 and threads=8 was:
40 minutes / (8x parallel processing with Gordon's optimisation + SciPy doing some ~3x-5x OpenMP stuff under the hood)

Conclusion

On the test data, bootstraps=100 only takes at most 30 minutes with threads=1, cpus-per-task=8, and ntasks=1, so there is no reason to waste threads on Euler.

Let's stick with threads=1 and hide this complexity from the user.

@gordonkoehn gordonkoehn marked this pull request as ready for review October 16, 2024 12:19
In a cluster setting, thread oversubscription can lead to significant
performance degradation and resource contention when running the deconvolution
with scipy.optimize.  This commit addresses the issue by utilizing
the `threadpoolctl` library to limit the number of threads to 1.

This change ensures that each process uses only the allocated resources,
preventing contention and improving overall cluster stability.
@gordonkoehn
Contributor Author

@DrYak Let's merge?
