Local manager does not limit the number of running tasks #279

Open
RaphaelRobidas opened this issue Nov 26, 2023 · 0 comments

When running a flow (for example, this one), the manager configuration seems to be read correctly, but the hardware limits are completely disregarded.

I use the following manager.yml file:

qadapters:

- priority: 1
  queue:
      qtype: shell
      qname: localhost

  job:
      mpi_runner: "mpirun"
      mpi_runner_options: "--bind-to none"
      omp_env: {"OMP_NUM_THREADS": 1}

  limits:
      timelimit: 1:00:00
      max_cores: 1

  hardware:
      num_nodes: 1
      sockets_per_node: 1
      cores_per_socket: 1
      mem_per_node: 1 Gb
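To make the expectation explicit, here is a minimal sketch of the arithmetic I have in mind for this configuration. It is not AbiPy code: the dictionaries simply mirror the `hardware` and `limits` sections above, and `total_cores`, `allowed_concurrent_tasks` and `cores_per_task` are hypothetical names introduced only for illustration.

```python
# Values copied from the manager.yml above (plain dicts, no AbiPy involved).
hardware = {"num_nodes": 1, "sockets_per_node": 1, "cores_per_socket": 1}
limits = {"max_cores": 1}


def total_cores(hw):
    """Cores described by the hardware section: nodes x sockets x cores."""
    return hw["num_nodes"] * hw["sockets_per_node"] * hw["cores_per_socket"]


def allowed_concurrent_tasks(hw, lim, cores_per_task=1):
    """Hypothetical helper: how many tasks fit in the declared core budget.

    Assumes the budget is the smaller of the hardware total and max_cores,
    and that every task uses `cores_per_task` cores.
    """
    budget = min(total_cores(hw), lim["max_cores"])
    return budget // cores_per_task


print(total_cores(hardware))                       # 1
print(allowed_concurrent_tasks(hardware, limits))  # 1
```

Under these assumptions the configuration leaves room for exactly one single-core task at a time.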

When launching a flow with abirun.py FLOW_DIR scheduler, the initial printout contains the following, confirming that the specifications have been read correctly:

AbiPy Manager:
[Qadapter 0]
ShellAdapter:localhost
Hardware:
   num_nodes: 1, sockets_per_node: 1, cores_per_socket: 1, mem_per_node 1024,
{'OMP_NUM_THREADS': 1}
Qadapter selected: 0

However, more than one task runs at once, even though only one core should be available. This is very problematic for larger flows, where the number of tasks is much greater than the number of cores: the machine ends up crashing.
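For reference, the behaviour I would expect looks roughly like the sketch below. It is a standalone illustration using only the Python standard library, not how AbiPy's scheduler is implemented: `run_with_core_limit` is a hypothetical function, each task is assumed to use one core, and the dummy tasks just sleep briefly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_core_limit(tasks, max_cores):
    """Run callables with at most `max_cores` active at the same time.

    Returns the task results (in submission order) and the highest number
    of tasks that were observed running simultaneously.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def wrapped(task):
        # Record that a task started and update the observed peak.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            return task()
        finally:
            with lock:
                state["running"] -= 1

    # The pool size is the cap: no more than max_cores workers exist.
    with ThreadPoolExecutor(max_workers=max_cores) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, state["peak"]


# Five dummy single-core tasks; each sleeps briefly and returns its index.
tasks = [lambda i=i: (time.sleep(0.01), i)[1] for i in range(5)]
results, peak = run_with_core_limit(tasks, max_cores=1)
print(results, peak)
```

With `max_cores=1` the five tasks run one after another, so the observed peak never exceeds one, which is what I expected from the local manager with the limits above.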

Abipy version: git+https://github.com/abinit/abipy@59e3816168ea11ab90622531948d0dac4cd31df9
OpenMPI version: 4.1.6
OS: Debian Trixie
