Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamicSmagorinsky is _very_ slow using default arguments and sometimes produces NaNs #4069

Open
tomchor opened this issue Jan 28, 2025 · 11 comments

Comments

@tomchor
Copy link
Collaborator

tomchor commented Jan 28, 2025

I've started trying to use the Lagrangian-averaged version of the Dynamic Smagorinsky in simulations with complex bathymetry and it is way slower than I had timed it in #3642. Back then iirc I timed the dynamic version as being about 4x slower than the constant version. In my current simulation it is about 12x times slower! Furthermore, it sometimes produces NaNs and I haven't been able to figure out why.

I haven't had a chance to investigate this much more for now, but I thought I'd post it here in case someone has experienced this and wants to chime in. I'm planning on writing a MWE sometime soon.

The good news is that results using the much simpler AMD are very close to those using dynamic Smag. At least for simulations where we AMD doesn't produce any grid-scale noise. I think this is some serious incentive to fix that issue on AMD.

@glwagner
Copy link
Member

Can you post an MWE that people might use for benchmarking?

@tomchor tomchor changed the title DynamicSmagorinsky is _very_ slow and sometimes produces NaNs DynamicSmagorinsky is _very_ slow using default arguments and sometimes produces NaNs Jan 31, 2025
@tomchor
Copy link
Collaborator Author

tomchor commented Jan 31, 2025

Okay, turns out that I forgot to set the schedule flag in the DynamicCoefficient. My bad. The default is to update it at every time step, which causes the severe slowdown of 10+ times. But if I set it to update every 5 time steps, the slowdown is around 4x, which is what I had measured. Virtually everyone updates the coefficient every 5-ish time steps and it's been shown to not degrade the results. Should we set the default value of schedule to iterationInterval(5)?

That said, the simulation still blows up after some time, which is something that needs to be investigated.

@glwagner
Copy link
Member

I think that opinionated defaults are a little weird. I do think its good to target convenience but we also usually use that reasoning to decide between two defaults with otherwise similar meanings, like AB2 vs RK3. I'm not sure how we could generically justify IterationalInterval(2), IterationalInterval(4), IterationalInterval(8) --- is one more optimal than the other?

A consideration that we often think takes precedence over convenience is self-documentation of scripts. For example, it may be important that one can find things like IterationInterval(4) in published scripts.

Side note that 5 is a weird number. Do we choose that because its 10/2, and 10 is a nice round number? Why not 4 or 8 (following a strategy of power-law-based trial-and-error: 1, 2, 4, 8, 16, etc).

@tomchor
Copy link
Collaborator Author

tomchor commented Jan 31, 2025

Yeah I agree with you that arbitrary numbers for defaults are in general not good. However, the closure is pretty much unusable with its current default. It's simply way too slow, even for small grids. So I think this is a case where an exception in warranted.

About the actual choice of interval, I'm good with anything that would give a reasonable performance. I mentioned 5 because Elie used it in his 2005 paper after testing it and verifying that results didn't change much. Since then everyone uses 5 because there's a reference to back that number up.

@glwagner
Copy link
Member

glwagner commented Feb 1, 2025

I suggest using IterationInterval(1) as default and putting an example in the docstring that illustrates setting it to 4 or 8 with a discussion about the trade-offs.

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 2, 2025

I still have the opinion that the priority is that the default options should produce a generally usable configuration, which at the moment (unless for perhaps very small grids) I'd argue is not the case with IterationInterval(1).

But I'll open a PR to include the changes you suggested because that's better than what's currently there now (which is no mention of this).

@glwagner
Copy link
Member

glwagner commented Feb 3, 2025

Hm ok. We should revisit defaults for NonhydrostaticModel and others too though in that case, right?

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 3, 2025

I don't know if the defaults for NonhydrostaticModel are bad. Honestly it always works pretty well out of the box for me! But if you think there are improvements to be made there, I'm happy to contribute to that discussion.

I just mentioned the specific case for DynamicCoefficient with IterationInterval(1) because it's an order of magnitude slower than other closures, rendering it pretty unusable by comparison most of the time.

@glwagner
Copy link
Member

glwagner commented Feb 3, 2025

Well, out of the box it's second-order advection with no tracers or closure or anything, so one always has to add something. But the idea we had is a very neutral default is a good choice if our objective is to make it so that user scripts are maximally descriptive.

@glwagner
Copy link
Member

glwagner commented Feb 3, 2025

I think it really boils down to whether you want people reading user scripts to know what was chosen for the stride. My opinion is yes, we do want people to know, because it likely affects the solution (most especially when pushing CFL, or in situations with very complex bathymetry). Revealing this choice to future users will increase the quality of science because it will allow people to confront this compromise rather than pushing it under the hood.

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 4, 2025

That's a good point. Although one could make that argument for other things that are currently general-purpose defaults, like the model's time-stepper...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants