
Prevent Thread Oversubscription in Cluster Setting #4

Open · wants to merge 1 commit into base: dev

Conversation

gordonkoehn
Contributor

@gordonkoehn gordonkoehn commented Oct 15, 2024

Aim: Prevent thread oversubscription in cluster settings in the deconvolution step by explicit threading control.

The nested parallelism that was introduced leads to fantastic speedups, yet may lead to an oversubscription of threads in a cluster setting where the number of threads is rigorously enforced.

The nested parallelism is due to Python's multiprocessing combined with the nested scipy.optimize calls inside the deconvolution/regression. The internal scipy.optimize is known to grab threads in such a setting, which may lead to the job stalling on the cluster.
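To make the oversubscription arithmetic concrete, here is a minimal stdlib-only sketch; the worker and per-worker thread counts are illustrative assumptions, not measured values from Lollipop:

```python
# Why nested parallelism oversubscribes a fixed allocation: with N worker
# processes (Python multiprocessing) each letting BLAS spawn B threads inside
# scipy.optimize, the job demands N * B threads in total.
def total_threads(n_workers: int, blas_threads_per_worker: int) -> int:
    return n_workers * blas_threads_per_worker


cpus_per_task = 8   # e.g. sbatch --cpus-per-task=8
n_workers = 8       # one process per bootstrap
blas_threads = 4    # what OpenBLAS might grab per process by default (assumed)

demand = total_threads(n_workers, blas_threads)
print(demand)                   # 32 threads demanded on an 8-core allocation
assert demand > cpus_per_task   # oversubscribed: the scheduler thrashes

# Pinning BLAS to one thread per worker restores the budget:
assert total_threads(n_workers, 1) == cpus_per_task
```

The fix below therefore limits BLAS to one thread per process, so the process count alone determines total CPU usage.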

Objectives:

  • allow for common usage with --cpus-per-task
  • allow for usage on end-user devices
  • design user-friendly control of threads - NOT NEEDED ANYMORE (the default of threads=1 works well on both cluster and local machines)

@gordonkoehn gordonkoehn added the enhancement New feature or request label Oct 15, 2024
@gordonkoehn gordonkoehn self-assigned this Oct 15, 2024
@gordonkoehn
Contributor Author

8 threads are used by the subpackages.

With

    with controller.limit(limits=1, user_api='blas'):
        deconvolution(...)

around the deconvolution, Lollipop runs fast again.

Yet the number of threads is still way over 8.

Need to understand this better before choosing good command-line arguments.
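To see where those extra threads come from, threadpoolctl can enumerate every BLAS/OpenMP pool loaded in the process. A minimal sketch, assuming threadpoolctl is installed (the list may be empty until numpy/scipy have been imported and their BLAS libraries loaded):

```python
from threadpoolctl import threadpool_info

# Each entry describes one native thread pool: which API it belongs to
# ('blas' or 'openmp'), the implementation behind it, and its current size.
# This is how to pick the right `user_api` value to pass to the limiter.
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"])
```

Note that threads shown by `top` can also come from Python-level workers, which threadpoolctl neither lists nor limits.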

@gordonkoehn
Contributor Author

In the current state, we get an average time of 20-30 seconds per bootstrap over the date range on the test data, given I submitted the job with

    sbatch --mail-type=END --ntasks=1 --cpus-per-task=8 --mem-per-cpu=8000 ....

This is about the same runtime I get on my local machine, where threads=1 for all libraries such as numpy and scipy.

I've wrapped only the handful of lines that do the actual deconvolution in the threadpoolctl controller, to limit the threads of scipy.optimize in particular.

From the top readout, it looks like there are still numerous other threads spawned; these are probably due to the other uses of numpy throughout the script. From testing, this does not seem to be a problem, though, as these are all just quick, small operations.

[Screenshot of the top readout, 2024-10-16 13:58]

@gordonkoehn
Contributor Author

The thread control is currently configured for OpenBLAS, as that is what Euler uses for thread management.

When run locally on an end-user machine - e.g. my OSX machine - the control does nothing, so Lollipop runs fine.

@gordonkoehn
Contributor Author

gordonkoehn commented Oct 16, 2024

Should we allow for fine-grained thread control on the user side?

Potential Benefits:

  • speedup on the cluster by multithreading scipy

Negative Impact:

  • added complexity; rather hide it from the user

==== Testing runtime with threads=1 ====
for bootstraps=100

Ivan's runtime for this week's processing with cpus-per-task=32 and threads=8 was:
40 minutes / (8x parallel processing with Gordon's optimisation + SciPy doing some ~3x-5x OpenMP stuff under the hood)

Conclusion

On the test data, bootstraps=100 only takes at most 30 minutes with threads=1, cpus-per-task=8, and ntasks=1, so there is no reason to waste threads on Euler.

Let's stick with threads=1 and hide this complexity from the user.

@gordonkoehn gordonkoehn marked this pull request as ready for review October 16, 2024 12:19
In a cluster setting, thread oversubscription can lead to significant
performance degradation and resource contention when running the deconvolution
with scipy.optimize.  This commit addresses the issue by utilizing
the `threadpoolctl` library to limit the number of threads to 1.

This change ensures that each process uses only the allocated resources,
preventing contention and improving overall cluster stability.
@gordonkoehn
Contributor Author

@DrYak Let's merge?
