Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PoC] Catch ImagePullBackOff from receptor workunitand surface to job_explanation #15689

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions awx/main/tasks/receptor.py
Original file line number Diff line number Diff line change
Expand Up @@ -537,6 +537,12 @@
self.task.update_model(self.task.instance.pk, status='pending')
return

if 'ImagePullBackOff' in detail:
logger.warning(detail)
log_name = self.task.instance.log_format
logger.warning(f"Could not launch pod for {log_name}. ImagePullBackOff.")
self.task.runner_callback.delay_update(job_explanation=f'{detail}')

Check warning on line 544 in awx/main/tasks/receptor.py

View check run for this annotation

Codecov / codecov/patch

awx/main/tasks/receptor.py#L541-L544

Added lines #L541 - L544 were not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this hit the latter line L559, self.task.runner_callback.delay_update(result_traceback=f'Receptor detail:\n{detail}')? Because it had output that you didn't find helpful?

Instead of adding another condition, I would rather suggest the change

diff --git a/awx/main/tasks/receptor.py b/awx/main/tasks/receptor.py
index 576ad661d5..aba140434e 100644
--- a/awx/main/tasks/receptor.py
+++ b/awx/main/tasks/receptor.py
@@ -549,9 +549,9 @@ class AWXReceptorJob:
                         receptor_output = b"".join(lines).decode()
                     if receptor_output:
                         self.task.runner_callback.delay_update(result_traceback=f'Worker output:\n{receptor_output}')
-                    elif detail:
+                    if detail:
                         self.task.runner_callback.delay_update(result_traceback=f'Receptor detail:\n{detail}')
-                    else:
+                    if not (detail or receptor_output):
                         logger.warning(f'No result details or output from {self.task.instance.log_format}, status:\n{state_name}')
                 except Exception:
                     logger.exception(f'Work results error from job id={self.task.instance.id} work_unit={self.task.instance.work_unit_id}')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to get it in job_explaination instead of result_traceback. I'm sick of that JSON parse error. This way we start widdle down the occurrence of the JSON parse error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key change is setting job_explaination

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your position is that the receptor detail should go in job_explanation instead of result_traceback, then you should modify the existing line (using f'Receptor detail:\n{detail}'), instead adding a boutique condition for the narrow case you are looking at.


try:
receptor_output = ''
if state_name == 'Failed' and self.task.runner_callback.event_ct == 0:
Expand Down
Loading