From 9e8741969a9b4920ac55431efd8de6c6561581a0 Mon Sep 17 00:00:00 2001
From: Matt Jones
Date: Sun, 24 Mar 2024 22:36:04 -0700
Subject: [PATCH] Add "Are we done yet?" section to par prog from last year.

---
 sections/parallel-programming.qmd | 78 +++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/sections/parallel-programming.qmd b/sections/parallel-programming.qmd
index f9569ea..1a8b70b 100644
--- a/sections/parallel-programming.qmd
+++ b/sections/parallel-programming.qmd
@@ -247,6 +247,22 @@ run_threaded: 4440.109491348267 ms
 
 This execution took about :tada: *4 seconds* :tada:, which is about 6.25x faster than serial. Congratulations, you wrote your first multi-threaded python program!
 
+But you may also have seen that the thread pool wasn't much faster at all. In that case, you are probably running into the Python Global Interpreter Lock (GIL), and would be better off using parallel processes. Let's do that with `ProcessPoolExecutor`.
+
+```{python}
+from concurrent.futures import ProcessPoolExecutor
+
+@timethis
+def run_processes(task_list):
+    # Each worker process has its own interpreter, so the GIL no
+    # longer serializes execution of the tasks
+    with ProcessPoolExecutor(max_workers=10) as executor:
+        return executor.map(task, task_list)
+
+results = run_processes(np.arange(10))
+[x for x in results]
+```
+
+For me, that took about 8 seconds, about half the time of the serial case, but results can vary tremendously depending on what else is running on the machine.
+
 ## Exercise: Parallel downloads
 
 In this exercise, we're going to parallelize a simple task that is often very time consuming -- downloading data. We'll compare the performance of simple downloads using first a serial loop, and then two parallel execution libraries: `concurrent.futures` and `parsl`. We'll see that parallel execution won't always speed up this task much, because downloading a lot of data is likely I/O-bound. But we should still be able to speed things up considerably, until we hit the limits of our disk arrays.
@@ -458,6 +474,68 @@ htex_local.executors[0].shutdown()
 parsl.clear()
 ```
 
+## Are we done yet?
+
+Parsl and other concurrency libraries generally provide both **blocking** and **non-blocking** methods for accessing the results of asynchronous method calls. By blocking, we are referring to methods that wait for the asynchronous operation to complete before returning results, which blocks execution of the rest of the program. By non-blocking, we are referring to methods that return the result if it is available, and return immediately if the async method hasn't completed yet, letting the rest of the program continue executing.
+
+In practice this means that we can either 1) wait for all async calls to complete and then process them using the blocking methods, or 2) query with a non-blocking method to see whether each async call is complete, and only then retrieve the result for that call. We illustrate both patterns below with parsl.
+
+```{python}
+# Required packages
+import parsl
+from parsl import python_app
+from parsl.config import Config
+from parsl.executors import HighThroughputExecutor
+from parsl.providers import LocalProvider
+
+# Configure the parsl executor
+activate_env = 'workon scomp'
+htex_local = Config(
+    executors=[
+        HighThroughputExecutor(
+            max_workers=5,
+            provider=LocalProvider(
+                worker_init=activate_env
+            )
+        )
+    ],
+)
+parsl.clear()
+parsl.load(htex_local)
+```
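+
+Before the main example, here is a minimal sketch of the blocking pattern (option 1 above). The `slow_double` app is a throwaway example added just for illustration and isn't part of the original exercise: calling `result()` on an `AppFuture` simply waits until that task has finished, so no readiness checks are needed.
+
+```{python}
+# Blocking pattern (option 1): result() waits for each task to complete,
+# so this loop finishes only once every submitted task has finished.
+@python_app
+def slow_double(x):
+    import time
+    time.sleep(1)
+    return 2 * x
+
+futures = [slow_double(x) for x in range(5)]
+print([f.result() for f in futures])  # each result() call blocks
+```
+
+The rest of this section demonstrates the non-blocking pattern (option 2). Define the task we want to run.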
+
+```{python}
+@python_app
+def do_stuff(x):
+    import time
+    time.sleep(1)
+    return x**2
+```
+
+And now execute the tasks, sleeping briefly partway through so that some of them will have completed and others will still be running when we check on them.
+
+```{python}
+import time
+
+# Submit all of the tasks; each call returns an AppFuture immediately
+all_futures = []
+for x in range(10):
+    future = do_stuff(x)
+    all_futures.append(future)
+    print(future)
+
+# Give some, but not all, of the tasks time to finish
+time.sleep(2)
+
+# Non-blocking check: done() returns immediately, so we only retrieve
+# results for the tasks that have already completed
+for future in all_futures:
+    print("Checking: ", future)
+    if future.done():
+        print("Do more with result: ", future.result())
+    else:
+        print("Sorry, come back later.")
+```
+
+Notice in particular that about half of the jobs are not done yet.
+
 ## When to parallelize
 
 It's not as simple as it may seem. While in theory each added processor would linearly increase the throughput of a computation, there is overhead that reduces that efficiency. For example, the code and, importantly, the data need to be copied to each additional CPU, and this takes time and bandwidth. Plus, new processes and/or threads need to be created by the operating system, which also takes time. This overhead reduces efficiency enough that realistic performance gains fall well short of the theoretical maximum, and usually do not scale linearly with added processing power. For example, if a computation is short, the overhead of setting up these additional resources may overwhelm any advantage of the added processors, and the computation could actually take longer!
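+
+A back-of-the-envelope way to see this tradeoff is Amdahl's law, which gives the ideal speedup when only a fraction `p` of a program can be parallelized across `n` workers. The sketch below is an illustration added here for intuition, not part of the original lesson, and the fixed per-worker `overhead` cost is a made-up number:
+
+```{python}
+# Amdahl's law: ideal speedup = 1 / ((1 - p) + p / n), where p is the
+# parallelizable fraction of the work and n is the number of workers.
+# Adding a made-up fixed setup cost per extra worker shows how overhead
+# can erase, and eventually reverse, the theoretical gains.
+def speedup(n, p=0.9, overhead=0.02):
+    ideal_time = (1 - p) + p / n        # run time under Amdahl's law
+    total_time = ideal_time + overhead * (n - 1)
+    return 1 / total_time
+
+for n in [1, 2, 4, 8, 16, 32]:
+    print(f"{n:2d} workers: {speedup(n):.2f}x faster")
+```
+
+With these invented numbers, the speedup peaks around 8 workers and then declines: exactly the pattern described above, where the cost of setting up additional resources eventually overwhelms their benefit.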