Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parentless sequential xtrigger spawning #5738

Merged

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Sep 20, 2023

supersedes #5732

Spawning parentless xtrigger tasks out to the run ahead limit is unnecessary in the majority of cases, and creates a lot of UI clutter.

This PR makes sequential spawning on xtrigger satisfaction for parentless tasks optional, with non-sequential the default (no change). And the default for clock triggers.

Justification:

  • Naturally clock trigger tasks are sequential in time, so it makes sense to only spawn the next parentless clock-triggered task when the current clock trigger is satisfied.
  • Most workflows are sequential, and so workflows/downstream-workflows usually don't need to check xtriggers out to the runahead limit (i.e. workflow_state).
  • It makes little difference (split seconds) in "catch-up" mode, where all xtriggers will be satisfied on first check.
  • It puts less load on the Scheduler and UI, with ~1/runahead-limit times less xtriggers to check and tasks to show.
  • If the user wants non-sequential xtrigger checking, that xtrigger can be specified as such.

At NIWA we have a modular approach to running our workflows, both in operations (around ~50 workflows, most interconnected) and in research (commonly downstream of operational workflows). So wall_clock and workflow_state are by far the most common xtriggers. We do have some requirement for non-sequential workflow_state xtriggers (i.e. satellite checking, where data may or may not come in for a cycle), however, nearly all workflows and downstream workflows are sequential.. And many of these require very large runahead limits.

Take the following simple example;

[scheduling]
    initial cycle point = 20230920T0530Z
    runahead limit = P5
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s):PT7S
    [[graph]]
        PT1M = """
@clock_1 => a
a => b
@up_1 => c
"""

[runtime]
    [[root]]
        script = sleep 2
    [[a,b,c]]

Currently that produces:
image

With this default behavior changed;

[scheduling]
    initial cycle point = 20230920T0530Z
    runahead limit = P5
    sequential xtriggers = True
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s):PT7S
.
.
.

it is reduced to:
image

Real workflows commonly have way more tasks (branches length and/or parallel) and way larger runahead limits, so the efficiency and clutter-less gains are massive.

And if for some reason users want some xtriggers to be checked/run out into the future/RH-limit in a non-sequential manner, then they can override the default with an argument (i.e. sequential=True/False):

    sequential xtriggers = True
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s, sequential=False):PT7S
.
.
.

image
(if there's multiple xtriggers on a single task it upholds the sequential setting if any)

This sequential argument name is reserved, and not passed onto the xtrigger module/function.
It can be defined in the xtrigger module/function (as in workflow declaration), but must have a Boolean default:
image
(i.e. from modifying cylc.flow.xtriggers.workflow_state to include settings arg in function definition)
cylc.flow.xtriggers.wall_clock.wall_clock has sequential=True set

On xtrigger satisfaction the corresponding parentless sequential xtriggered task (PSXT) will spawn out to the RH limit, or next occurrence of a PSXT, or non-parentless task occurrence.

Note:
Tui sometimes shows no tasks before a corresponding xtrigger is satisfied. This problem is not reflected in the data-store or WUI.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.

@dwsutherland dwsutherland added efficiency For notable efficiency improvements sod-follow-up labels Sep 20, 2023
@dwsutherland dwsutherland added this to the cylc-8.3.0 milestone Sep 20, 2023
@dwsutherland dwsutherland self-assigned this Sep 20, 2023
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from d9fc5a5 to 209e621 Compare September 20, 2023 05:46
@oliver-sanders
Copy link
Member

Note, the Cylc 7 spawning behaviour can be achieved by adding submit dependencies between the parentless tasks e.g:

@up => a
a[-PT1M]:submitted => a

This forces the sequentiality of a without prohibiting multiple instances of a from running in parallel.

@dwsutherland
Copy link
Member Author

dwsutherland commented Sep 21, 2023

Note, the Cylc 7 spawning behaviour can be achieved by adding submit dependencies between the parentless tasks e.g:

@up => a
a[-PT1M]:submitted => a

This forces the sequentiality of a without prohibiting multiple instances of a from running in parallel.

IMO that introduces clutter and false dependencies into the workflow definition.. I would have to do that roughly thousands of times across all 50 workflows.. (a number of these workflows query all other workflows' tasks via workflow_state)
(by false dependency I mean; you desire xtriggers not to run all at once to RH and/or reduction of UI clutter (changing the spawning behavior), you don't want a to depend on previous a in any way)

Also some tasks might be in queues or tight job scheduler queues (and require submission retries)..
(it would also be in the visible in the graph window, and I kinda like the SoD Cylc8 minimalist behavior)

I have a change inbound, like what we messaged about in Element, where this will be solved without effecting the current behavior.. Using one line of workflow definition ..

@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch 2 times, most recently from 2f57d84 to ba46897 Compare September 21, 2023 07:40
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense but haven't thought through the details as yet. Because this alters the spawning logic it does require a good amount of thought to ensure that it doesn't interact with other features. Here are some things this change could potentially interact with off the top of my head:

  • Trigger - What happens if you manually trigger a task before the xtrigger is satisfied? If we're not careful, could this result in the task being removed from the workflow Cylc 7 style?
  • Shutdown - We would also need to make sure that the task pool cant become empty as a result of delayed spawning as this could interact with shutdown logic.
  • Remove - Could removing one instance of the task result in the task being removed from the workflow Cylc 7 style? Note cylc remove will work on active tasks soon.
  • Flows - Could the delayed spawning change the behaviour of flow merging in unexpected ways?

cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch 4 times, most recently from 0f71185 to 2251aef Compare September 22, 2023 00:55
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from 2251aef to 0c80713 Compare October 6, 2023 08:10
@dwsutherland
Copy link
Member Author

Restart/reload and remove are now handled..

Remove will cause the next occurrence to spawn, the argument being:

  • the non-sequential case would be spawned, so be consistent.
  • otherwise the workflow would shutdown if it's the only task (and we have a stop button for that).

@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch 2 times, most recently from 89957d6 to bf54c77 Compare October 11, 2023 05:06
@dwsutherland
Copy link
Member Author

I assume the docs are auto-generated from the config definition.. but I suppose I should build it with this branch to find out.

@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch 2 times, most recently from 7262f4c to 116f297 Compare October 31, 2023 21:15
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch 2 times, most recently from 68b1212 to 44fbc89 Compare November 20, 2023 23:13
@oliver-sanders oliver-sanders requested review from MetRonnie and removed request for hjoliver November 22, 2023 15:31
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a couple of functional tests, works nicely.

When a workflow is rapidly catching up, clock-triggered tasks get pushed through in batches of the runahead limit which is a slight quirk, but nothing we can't live with.

This logic needs more tests, particularly covering interactions with this logic, e.g. trigger, remove, reload, etc. These interactions are easy to get wrong and really easy to break by accident.

I've made a start at some integration tests and commented a couple of suggestions for new ones. Two of my tests currently fail and will need a look in:

# THIS FILE IS PART OF THE CYLC WORKFLOW ENGINE.
# Copyright (C) NIWA & British Crown (Met Office) & Contributors.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

"""Test interactions with sequential xtriggers."""

import pytest

from cylc.flow.cycling.iso8601 import ISO8601Point


@pytest.fixture()
def sequential(flow, scheduler):
    id_ = flow({
        'scheduler': {
            'allow implicit tasks': 'True',
            'cycle point format': 'CCYY',
        },
        'scheduling': {
            'runahead limit': 'P2',
            'initial cycle point': '2000',
            'graph': {
                'P1Y': '@wall_clock => foo',
            }
        }
    })

    sequential = scheduler(id_)

    def list_tasks():
        """List the task instance cycle points present in the pool."""
        nonlocal sequential
        return sorted(itask.tokens['cycle'] for itask in sequential.pool.get_all_tasks())

    sequential.list_tasks = list_tasks

    return sequential


async def test_remove(sequential, start):
    """It should spawn the next instance when a task is removed.

    Ensure that removing a task with a sequential xtrigger does not break the
    chain causing future instances to be removed from the workflow.
    """
    async with start(sequential):
        # the scheduler starts with one task in the pool
        assert sequential.list_tasks() == ['2000']

        # it sequentially spawns out to the runahead limit
        for year in range(2000, 2010):
            foo = sequential.pool.get_task(ISO8601Point(f'{year}'), 'foo')
            if foo.state(is_runahead=True):
                break
            sequential.xtrigger_mgr.call_xtriggers_async(foo)
            sequential.pool.spawn_parentless_sequential_xtriggers()
        assert sequential.list_tasks() == [
            '2000',
            '2001',
            '2002',
            '2003',
        ]

        # remove all tasks in the pool
        sequential.pool.remove_tasks(['*'])

        # the next cycle should be automatically spawned
        assert sequential.list_tasks() == ['2004']

        # TODO: This usually fails because each task that is removed spawns its
        # next instance, so depending on the order they are removed in, some
        # tasks get removed then re-spawned again.
        # NOTE: You won't spot this issue in a functional test because the
        # re-spawned tasks are detected as completed and automatically removed.
        # So ATM not dangerous, but potentially inefficient.


async def test_trigger(sequential, start):
    """It should spawn its next instance if triggered ahead of time.

    If you manually trigger a sequentially spawned task before its xtriggers
    have become satisfied, then the sequential spawning chain is broken.

    The task pool should defend against this to ensure that triggering a task
    doesn't cancel it's future instances.
    """
    async with start(sequential):
        assert sequential.list_tasks() == ['2000']

        foo = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
        sequential.pool.force_trigger_tasks([foo.identity], {1})
        foo.state_reset('succeeded')
        sequential.pool.spawn_on_output(foo, 'succeeded')

        assert sequential.list_tasks() == ['2001']

        # TODO: This test fails with an empty pool suggesting that triggering
        # could result in premature shutdown?


async def test_reload(sequential, start):
    """It should set the is_xtrigger_sequential flag on reload.

    TODO: test that changes to the sequential status in the config get picked
          up on reload
    """
    async with start(sequential):
        # the task should be marked as sequential
        pre_reload = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
        assert pre_reload.is_xtrigger_sequential is True

        # reload the workflow
        sequential.pool.reload_taskdefs(sequential.config)

        # the original task proxy should have been replaced
        post_reload = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
        assert id(pre_reload) != id(post_reload)

        # the new task should be marked as sequential
        assert post_reload.is_xtrigger_sequential is True


# TODO: test that a task is marked as sequential if any of its xtriggers are
# sequential (as opposed to all)?

# TODO: test setting the sequential argument in [scheduling][xtrigger] items
# changes the behaviour

# TODO: test the interaction between "sequential xtriggers default" and the
# sequential argument to [scheduling][xtrigger]
# * Should we be able to override the default by setting sequential=False?
# * Or should that result in a validation error?

cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from 44fbc89 to 12db81d Compare November 29, 2023 07:46
Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spotted 1 mistake in the conflict resolution but otherwise looks good to me

cylc/flow/task_pool.py Outdated Show resolved Hide resolved
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from e9ce3ff to 3da7348 Compare March 20, 2024 02:08
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part-way through... a few minor comments so far.

changes.d/5738.feat.md Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/config.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/xtrigger_mgr.py Outdated Show resolved Hide resolved
cylc/flow/xtrigger_mgr.py Outdated Show resolved Hide resolved
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from 3da7348 to f09ea62 Compare March 21, 2024 01:10
@dwsutherland dwsutherland force-pushed the parentless-sequential-xtrigger-spawning branch from f09ea62 to 63724c9 Compare March 21, 2024 01:17
@hjoliver
Copy link
Member

Testing ... looks good so far, nice change.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks @dwsutherland

@hjoliver
Copy link
Member

@MetRonnie - the only code changes after my review were a type annotation and config item name (which wasn't formally agreed on anyway), so no need for you to re-review.

@hjoliver hjoliver merged commit 82bd7cc into cylc:master Mar 21, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
config change Involves a change to global or workflow config efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants