Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dws: Fix NnfDataMovement logging #185

Merged
merged 1 commit into from
Jul 24, 2024
Merged

Conversation

bdevcich
Copy link
Collaborator

@bdevcich bdevcich commented Jul 22, 2024

Problem: No NnfDataMovement resources are found when dumping to the
journal. The current implementation is only looking at the default
namespace instead of all namespaces. Additionally, the output is valid
json.

  • Changed the api call to use list_cluster_custom_object to retrieve
    objects from all namespaces
  • Deleted DATAMOVEMENT_CRD since namespace cannot be used for
    list_cluster_custom_object
  • Moved NnfDatamovement retrieval to before the transition to
    TearDown. Otherwise these will start to be removed.

if LOGGER.isEnabledFor(logging.INFO):
try:
api_response = k8s_api.list_namespaced_custom_object(
*DATAMOVEMENT_CRD,
# TODO: There's a CRD() for this that includes namespace, but we want all namespaces
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameshcorbett: can we define DATAMOVEMENT_CRD without namespace to get what we want here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking me A) whether I'll be mad at you if you define DATAMOVEMENT_CRD without namespace or B) would it actually work with the kubernetes library? I'm not sure about B), I'm going to test it out but see also https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#list_cluster_custom_object , if you know kubernetes HTTP requests (I don't) I think the answer should be there. Have you tested this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not 100% sure I understand the TODO comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both, really :). We don't want to (and can't) use namespace here since we need all of them. I was curious if I could get away with not defining it in the CRD() tuple, but hadn't tried it.

After some playing around, it seems like you need to define namespace with CRD(), so I'll remove it and not use it here.

group="dataworkflowservices.github.io",
version="v1alpha2",
group="nnf.cray.hpe.com"
version="v1alpha1",
namespace="default",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. Can we just drop this so it's not limited to just 1 namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if we end up going the route of using list_cluster_custom_object then I would delete DATAMOVEMENT_CRD and the import of it in coral2_dws.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we end up keeping it, there's a comma missing after the group=... line.

Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Some small comments.

Flux uses black to format Python code. You can install it with pip install black if you're in a python virtual environment and then run black src/modules/coral2_dws.py. This is what I got assuming we delete DATAMOVEMENT_CRD:

diff --git a/src/modules/coral2_dws.py b/src/modules/coral2_dws.py
index ea1a0a7..a3b7a75 100755
--- a/src/modules/coral2_dws.py
+++ b/src/modules/coral2_dws.py
@@ -28,13 +28,7 @@ from flux.hostlist import Hostlist
 from flux.job.JobID import id_parse
 from flux.constants import FLUX_MSGTYPE_REQUEST
 from flux.future import Future
-from flux_k8s.crd import (
-    WORKFLOW_CRD,
-    RABBIT_CRD,
-    COMPUTE_CRD,
-    SERVER_CRD,
-    DATAMOVEMENT_CRD,
-)
+from flux_k8s.crd import WORKFLOW_CRD, RABBIT_CRD, COMPUTE_CRD, SERVER_CRD
 from flux_k8s.watch import Watchers, Watch
 from flux_k8s import directivebreakdown
 
@@ -162,7 +156,6 @@ def move_workflow_to_teardown(handle, winfo, k8s_api, workflow=None):
         )
     if LOGGER.isEnabledFor(logging.INFO):
         try:
-            # TODO: There's a CRD() for this that includes namespace, but we want all namespaces
             api_response = k8s_api.list_cluster_custom_object(
                 group="nnf.cray.hpe.com",
                 version="v1alpha1",
@@ -181,7 +174,9 @@ def move_workflow_to_teardown(handle, winfo, k8s_api, workflow=None):
         else:
             for crd in api_response["items"]:
                 LOGGER.info(
-                    "Found nnfdatamovement crd for workflow '%s': %s", winfo.name, json.dumps(crd)
+                    "Found nnfdatamovement crd for workflow '%s': %s",
+                    winfo.name,
+                    json.dumps(crd),
                 )
     try:
         workflow["metadata"]["finalizers"].remove(_FINALIZER)
diff --git a/src/python/flux_k8s/crd.py b/src/python/flux_k8s/crd.py
index 52cc234..21cb8b2 100644
--- a/src/python/flux_k8s/crd.py
+++ b/src/python/flux_k8s/crd.py
@@ -36,10 +36,3 @@ SERVER_CRD = CRD(
     namespace="default",
     plural="servers",
 )
-
-DATAMOVEMENT_CRD = CRD(
-    group="nnf.cray.hpe.com"
-    version="v1alpha1",
-    namespace="default",
-    plural="nnfdatamovements",
-)

group="dataworkflowservices.github.io",
version="v1alpha2",
group="nnf.cray.hpe.com"
version="v1alpha1",
namespace="default",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if we end up going the route of using list_cluster_custom_object then I would delete DATAMOVEMENT_CRD and the import of it in coral2_dws.py.

if LOGGER.isEnabledFor(logging.INFO):
try:
api_response = k8s_api.list_namespaced_custom_object(
*DATAMOVEMENT_CRD,
# TODO: There's a CRD() for this that includes namespace, but we want all namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking me A) whether I'll be mad at you if you define DATAMOVEMENT_CRD without namespace or B) would it actually work with the kubernetes library? I'm not sure about B), I'm going to test it out but see also https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#list_cluster_custom_object , if you know kubernetes HTTP requests (I don't) I think the answer should be there. Have you tested this?

@@ -181,7 +181,7 @@ def move_workflow_to_teardown(handle, winfo, k8s_api, workflow=None):
else:
for crd in api_response["items"]:
LOGGER.info(
"Found nnfdatamovement crd for workflow '%s': %s", winfo.name, crd
"Found nnfdatamovement crd for workflow '%s': %s", winfo.name, json.dumps(crd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a commit message body briefly explaining the motivation for the change? Just to satisfy our style requirements :) (see https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_1.html )

if LOGGER.isEnabledFor(logging.INFO):
try:
api_response = k8s_api.list_namespaced_custom_object(
*DATAMOVEMENT_CRD,
# TODO: There's a CRD() for this that includes namespace, but we want all namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not 100% sure I understand the TODO comment

group="dataworkflowservices.github.io",
version="v1alpha2",
group="nnf.cray.hpe.com"
version="v1alpha1",
namespace="default",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we end up keeping it, there's a comma missing after the group=... line.

Problem: No NnfDataMovement resources are found when dumping to the
journal. The current implementation is only looking at the default
namespace instead of all namespaces. Additionally, the output is valid
json.

- Changed the api call to use `list_cluster_custom_object` to retrieve
  objects from all namespaces
- Deleted `DATAMOVEMENT_CRD` since namespace cannot be used for
  `list_cluster_custom_object`
- Moved NnfDatamovement retrieval to before the transition to
  `TearDown`. Otherwise these will start to be removed.
@bdevcich bdevcich changed the title Fix NnfDataMovement Logging dws: Fix NnfDataMovement logging Jul 23, 2024
@bdevcich
Copy link
Collaborator Author

@jameshcorbett Let's try this again. I cleaned up my commits to follow the style requirements (and updated the PR title/description).

@bdevcich
Copy link
Collaborator Author

Also, I did run black on my changes, but it did not flatten the from flux_k8s.crd import WORKFLOW_CRD line (like it did for you) for some reason.

Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! OK for me to merge? Or you can set the merge-when-passing label if you have that ability, I'm not sure

@mergify mergify bot merged commit 5a9b686 into flux-framework:master Jul 24, 2024
8 checks passed
@bdevcich bdevcich deleted the nnf-dm branch July 24, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants