
Default Hypre MPI comm size (num_procs) #1210

Open

iaae opened this issue Jan 21, 2025 · 4 comments

Comments

@iaae

iaae commented Jan 21, 2025

I tested the Hypre solver via the IJ linear algebra interface and am in the middle of fine-tuning it for a larger-scale test.

When Hypre is driven with an MPI world communicator of size 1, it still appears to use 4 processors/threads (on a 32-core dual-CPU machine). In practice, the number of processors should be controllable at runtime, and there are a few ways to do that from the PDE solver side (e.g. spawn a separate Hypre MPI run, create a new communicator, etc.).

Before I change the code to do that, is there a quick way to change the default 4-processor/thread behavior in Hypre? There does not seem to be an API for it.
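
For context, this is roughly what I mean by "create a new comm": a minimal sketch using the documented IJ calls, with the split rule and problem sizes made up purely for illustration (not my actual application code).

```c
/*
 * Sketch: restrict Hypre to a sub-communicator so that the comm size it sees
 * is whatever I choose. Split rule and sizes are hypothetical.
 */
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_utilities.h"
#include "HYPRE_IJ_mv.h"

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv);
   HYPRE_Init();   /* required by recent hypre versions */

   int world_rank;
   MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

   /* Only the first solver_procs ranks take part in the Hypre solve. */
   const int solver_procs = 4;
   int color = (world_rank < solver_procs) ? 0 : MPI_UNDEFINED;
   MPI_Comm solver_comm;
   MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &solver_comm);

   if (solver_comm != MPI_COMM_NULL)
   {
      int sub_rank, sub_size;
      MPI_Comm_rank(solver_comm, &sub_rank);
      MPI_Comm_size(solver_comm, &sub_size);

      /* Hypothetical 1D row partition of a global N x N system. */
      HYPRE_BigInt N = 1000;
      HYPRE_BigInt local_n = N / sub_size;
      HYPRE_BigInt ilower = sub_rank * local_n;
      HYPRE_BigInt iupper = (sub_rank == sub_size - 1) ? N - 1 : ilower + local_n - 1;

      /* Hypre only sees solver_comm, so its internal num_procs == sub_size. */
      HYPRE_IJMatrix A;
      HYPRE_IJMatrixCreate(solver_comm, ilower, iupper, ilower, iupper, &A);
      HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);
      HYPRE_IJMatrixInitialize(A);
      /* ... set coefficients as in ex5.c, then build vectors and BoomerAMG ... */
      HYPRE_IJMatrixAssemble(A);
      HYPRE_IJMatrixDestroy(A);

      MPI_Comm_free(&solver_comm);
   }

   HYPRE_Finalize();
   MPI_Finalize();
   return 0;
}
```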

Thanks!

@iaae
Author

iaae commented Jan 26, 2025

Because of unstable MPI_Comm_spawn() behavior on the test machine (dual Xeon, 16 cores each), I switched to a hybrid OpenMP+MPI configuration and tested the Hypre IJ linear algebra interface further. Even with OpenMP limited to one thread, 4 cores are actively running during the Hypre solve (the original question: how do I change that?).

Here is the interesting behavior (a bug?): when OpenMP is set to more than 8 threads, the result is quite bad (or wrong, as if errTol were set too high), using the default BoomerAMG method. With 8 threads or fewer, the result is correct.

Possibly, when the total number of OpenMP threads exceeds the available cores (32), some wait state or OpenMP calls do not work properly.
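
For reference, a minimal way to keep ranks × threads within the 32 cores would be something like the sketch below (standard MPI/OpenMP calls only, with the core count hard-coded for this machine; nothing Hypre-specific):

```c
/*
 * Sketch: cap the OpenMP threads per MPI rank so that ranks * threads never
 * exceeds the 32 physical cores.
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv);

   int rank, nranks;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &nranks);

   const int total_cores = 32;                 /* dual Xeon, 16 cores each */
   int threads_per_rank = total_cores / nranks;
   if (threads_per_rank < 1) threads_per_rank = 1;

   /* Overrides OMP_NUM_THREADS for all later parallel regions (incl. Hypre's). */
   omp_set_num_threads(threads_per_rank);

   if (rank == 0)
   {
      printf("ranks = %d, threads per rank = %d\n", nranks, omp_get_max_threads());
   }

   /* ... build the IJ system and run BoomerAMG here ... */

   MPI_Finalize();
   return 0;
}
```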

Can someone comment on this behavior? The goal is to control/change the actual MPI process count.

Thanks!

@victorapm
Contributor

when OpenMP is set to more than 8 threads, the result is quite bad (or wrong, as if errTol were set too high), using the default BoomerAMG method. With 8 threads or fewer, the result is correct.

@iaae could you provide instructions to reproduce this behavior you described?

@iaae
Author

iaae commented Jan 26, 2025

Thanks for the quick reply. Unfortunately, the Hypre code is being tested inside a somewhat larger code on a Windows platform, with several different solvers, for benchmarking/evaluation purposes... for now. I plan to try different MPI/OpenMP implementations (Intel, MS, etc.) to confirm the behavior further. If confirmed, I will try reproducing it with Ex5 (the only example that uses the IJ interface).

It would help if anyone could confirm or comment on the behavior of running Ex5 on Win11 using either MS-MPI or Intel MPI (I_MPI). I_MPI does not allow specifying a world size of more than 1 on a single-CPU Windows platform, but MS-MPI does. In both cases, with the default MPI comm size of 1, Hypre runs on 4 cores (on a 32-core machine), as can be observed on the CPU/core utilization plot.
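
For anyone trying this, a small diagnostic along the following lines (plain MPI/OpenMP calls, nothing Hypre-specific) should show what the runtime actually reports on each platform, to compare against the CPU plot:

```c
/*
 * Diagnostic sketch: print what the MPI/OpenMP runtime reports, to compare
 * against the 4 busy cores seen in the CPU utilization plot.
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv);

   int rank, size;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

   printf("rank %d of %d: max threads = %d, procs visible to OpenMP = %d\n",
          rank, size, omp_get_max_threads(), omp_get_num_procs());

   MPI_Finalize();
   return 0;
}
```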

@victorapm
Contributor

@iaae This behavior is likely related to the MPI or system environment, not Hypre itself. MS_MPI may be spawning threads or oversubscribing by default, while Intel I_MPI might handle threading differently. Try setting OMP_NUM_THREADS=1 explicitly before running to restrict threads. If possible, testing on Linux might confirm whether this is platform-specific.
