
Gas Cooling on GPU #185

Open · spencerw wants to merge 63 commits into master

Conversation

@spencerw (Member) commented Oct 7, 2024

A few of the cooling modules (boley, cosmo, metal, and h2) use a stiff ODE solver (StiffStep), which creates a significant bottleneck during updateuDot. This PR introduces CudaStiffStep, a GPU version of the solver. With the CUDA flag enabled, the ODE integration now happens in parallel for all particles on a given TreePiece. The parameter 'nGpuGasMinParts' can be used to direct TreePieces with small particle counts to do the integration on the CPU instead.
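
A minimal sketch of the dispatch logic. All function and variable names below are hypothetical except the parameter nGpuGasMinParts, and the derivative is a stand-in rather than the real clDerivs:

```cuda
#include <cuda_runtime.h>

#define NEQ 4  // hypothetical number of ODE variables per particle

// Stand-in for one stiff integration step; in the PR the real work is done
// by the device-compiled StiffStep.
__device__ __host__
void stiffStepOne(double *y, double dt) {
    for (int k = 0; k < NEQ; k++)
        y[k] += dt * (-y[k]);  // placeholder derivative, not the real clDerivs
}

// One thread integrates the cooling ODE for one particle.
__global__ void cudaStiffStepKernel(double *y, double dt, int nParts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nParts)
        stiffStepOne(&y[i * NEQ], dt);
}

// TreePieces below the threshold stay on the CPU, where kernel-launch
// overhead would dominate.
void integrateTreePiece(double *h_y, double *d_y, double dt,
                        int nParts, int nGpuGasMinParts) {
    if (nParts < nGpuGasMinParts) {
        for (int i = 0; i < nParts; i++)
            stiffStepOne(&h_y[i * NEQ], dt);
        return;
    }
    int threads = 128;
    int blocks = (nParts + threads - 1) / threads;
    cudaStiffStepKernel<<<blocks, threads>>>(d_y, dt, nParts);
    cudaDeviceSynchronize();
}
```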

This required making a few significant structural changes to the code:

  1. To minimize code duplication, __device__ __host__ specifiers have been added to many of the cooling subroutines. An empty .cu file has been added to each of the cooling modules, which allows the old C code to be compiled for the GPU. When the CUDA flag is enabled, these new .cu files are compiled separately for the host and device using the '-dc' flag and then linked together in a separate step at the end of the cuda.mk file. Also note that the clDerivs function for cosmo cooling makes use of RootFind, which required making some of the routines from stiff.c accessible from the device as well (see the first sketch after this list).

  2. The parallel nature of CudaStiffStep requires a separate clDerivsData and Stiff struct (along with space for the associated deep pointers) for each gas particle on both the host and device side. Originally, this data was allocated by the TreePieces from within the cooling subroutines. Allocation is now handled by the DataManager (allocCoolParticleBlock), which assigns blocks of pre-allocated host and device memory to the TreePieces (setCoolPtrs). If the gas particle count grows significantly, a larger block of memory is re-allocated (see the second sketch after this list).
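
To illustrate point 1, here is a minimal sketch of the shared-source mechanism, assuming a hypothetical guard macro and a placeholder rate formula; only the __device__ __host__ specifiers, the '-dc' compile step, and the cuda.mk device link come from this PR:

```cuda
#include <math.h>

// Under the CUDA build, cooling subroutines pick up both specifiers so one
// definition compiles for host and device alike.
#ifdef CUDA
#define CUDA_DH __device__ __host__
#else
#define CUDA_DH
#endif

// Hypothetical cooling subroutine callable both from CPU code and from
// CudaStiffStep on the GPU.
CUDA_DH double clCoolingRate(double T, double rho) {
    return rho * rho * 1.0e-23 * sqrt(T);  // placeholder rate, for shape only
}

// The per-module .cu wrapper presumably does little more than pull the old
// C source into the CUDA build, e.g. (mechanism assumed):
//   /* cooling_cosmo.cu */
//   #include "cooling_cosmo.c"
// These are compiled as relocatable device code (nvcc -dc) and device-linked
// in a separate step at the end of cuda.mk.
```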
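
And a sketch of the point-2 memory scheme; allocCoolParticleBlock and setCoolPtrs are names from this PR, but the struct layout and growth policy here are invented:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-particle integrator state: in the real code this would
// hold the Stiff struct, clDerivsData, and the associated deep pointers.
struct CoolParticleState {
    double y[4];
    double work[16];
};

struct CoolBlock {
    CoolParticleState *h_block;  // host-side block
    CoolParticleState *d_block;  // device-side block
    int capacity;
};

// DataManager side: (re)allocate when the gas particle count outgrows the
// current block.
void allocCoolParticleBlock(CoolBlock *blk, int nGasTotal) {
    if (nGasTotal <= blk->capacity) return;
    int newCap = 2 * nGasTotal;  // headroom so re-allocation stays rare
    if (blk->h_block) cudaFreeHost(blk->h_block);
    if (blk->d_block) cudaFree(blk->d_block);
    cudaMallocHost((void **)&blk->h_block, newCap * sizeof(CoolParticleState));
    cudaMalloc((void **)&blk->d_block, newCap * sizeof(CoolParticleState));
    blk->capacity = newCap;
}

// TreePiece side: receive a slice of the pre-allocated host and device blocks.
void setCoolPtrs(CoolBlock *blk, int offset,
                 CoolParticleState **h_ptr, CoolParticleState **d_ptr) {
    *h_ptr = blk->h_block + offset;
    *d_ptr = blk->d_block + offset;
}
```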

A test suite 'test_cooling' for the different cooling modules is also included.

@spencerw (Member Author) commented Oct 7, 2024

Note that 'grackle' and 'planet' cooling still need to be tested and updated. Although they don't make use of StiffStep, the changes to updateuDot and to the way memory is managed have created some incompatibilities.

It also looks like a few tweaks are needed to get this working without the CUDA or cooling flags.
