Glidein metrics are calculated incorrectly #456

Open
rynge opened this issue Nov 12, 2024 · 0 comments
Labels
BUG For BUGS factory-mon for affected component frontend-mon for affected component Medium Medium priority osg OSG stakeholder

Comments

@rynge
Contributor

rynge commented Nov 12, 2024

Describe the bug

The summary metrics XML and the informational summary lines show the wrong numbers, probably because partitionable slots (pslots) are not handled correctly. Example:

```xml
  <result>
    <status>OK</status>
    <metric name="AutoShutdown" ts="2024-11-12T12:43:55-06:00" uri="local">True</metric>
    <metric name="CondorDuration" ts="2024-11-12T12:43:55-06:00" uri="local">50568</metric>
    <metric name="TotalJobsNr" ts="2024-11-12T12:43:55-06:00" uri="local">16</metric>
    <metric name="TotalJobsTime" ts="2024-11-12T12:43:55-06:00" uri="local">33915</metric>
    <metric name="goodZJobsNr" ts="2024-11-12T12:43:55-06:00" uri="local">15</metric>
    <metric name="goodZJobsTime" ts="2024-11-12T12:43:55-06:00" uri="local">33807</metric>
    <metric name="goodNZJobsNr" ts="2024-11-12T12:43:55-06:00" uri="local">1</metric>
    <metric name="goodNZJobsTime" ts="2024-11-12T12:43:55-06:00" uri="local">107</metric>
    <metric name="badSignalJobsNr" ts="2024-11-12T12:43:55-06:00" uri="local">0</metric>
    <metric name="badSignalJobsTime" ts="2024-11-12T12:43:55-06:00" uri="local">0</metric>
    <metric name="badOtherJobsNr" ts="2024-11-12T12:43:55-06:00" uri="local">0</metric>
    <metric name="badOtherJobsTime" ts="2024-11-12T12:43:55-06:00" uri="local">0</metric>
    <metric name="CondorKilled" ts="2024-11-12T12:43:55-06:00" uri="local">False</metric>
  </result>
```
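To make the mismatch easy to reproduce, here is a minimal sketch that reads the metrics XML and compares the totals against an independently obtained job count. It is only an illustration under stated assumptions: `parse_metrics`, `check_totals`, `CATEGORIES` and the `expected_jobs` parameter are hypothetical helpers, not part of GlideinWMS; timestamps and `uri` attributes are dropped from the copied XML; and `expected_jobs=307` is the `Terminated job` line count from the job log quoted further down.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <result> block above (ts/uri attributes omitted).
RESULT_XML = """
<result>
  <status>OK</status>
  <metric name="TotalJobsNr">16</metric>
  <metric name="TotalJobsTime">33915</metric>
  <metric name="goodZJobsNr">15</metric>
  <metric name="goodZJobsTime">33807</metric>
  <metric name="goodNZJobsNr">1</metric>
  <metric name="goodNZJobsTime">107</metric>
  <metric name="badSignalJobsNr">0</metric>
  <metric name="badSignalJobsTime">0</metric>
  <metric name="badOtherJobsNr">0</metric>
  <metric name="badOtherJobsTime">0</metric>
</result>
"""

# Hypothetical: the per-category prefixes used in the metric names.
CATEGORIES = ("goodZ", "goodNZ", "badSignal", "badOther")


def parse_metrics(xml_text: str) -> dict[str, str]:
    """Map each <metric name="..."> element to its stripped text value."""
    root = ET.fromstring(xml_text.strip())
    return {m.get("name"): (m.text or "").strip() for m in root.iter("metric")}


def check_totals(metrics: dict[str, str], expected_jobs: int) -> list[str]:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    total_nr = int(metrics["TotalJobsNr"])
    # The per-category job counts should add up to the total.
    cat_nr = sum(int(metrics[f"{c}JobsNr"]) for c in CATEGORIES)
    if cat_nr != total_nr:
        problems.append(f"category counts sum to {cat_nr}, TotalJobsNr is {total_nr}")
    # The total should match the count obtained from the job log.
    if total_nr != expected_jobs:
        problems.append(f"TotalJobsNr is {total_nr}, log shows {expected_jobs} terminated jobs")
    return problems


metrics = parse_metrics(RESULT_XML)
for problem in check_totals(metrics, expected_jobs=307):
    print(problem)
```

With the numbers from this glidein, the category counts are internally consistent (15 + 1 + 0 + 0 = 16), so the only reported problem is that `TotalJobsNr` disagrees with the log.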

and the corresponding summary lines:

```
Total jobs 16 utilization 33915
Total goodZ jobs 15  (99.6743%) utilization 33807 (99.6822%)
Total goodNZ jobs 1  (0.325733%) utilization 107 (0.317847%)
Total badSignal jobs 0  (0%) utilization 0 (0%)
Total badOther jobs 0  (0%) utilization 0 (0%)
```
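The job percentages in these lines are also inconsistent with the counts printed next to them: 15 of 16 jobs is 93.75%, not 99.6743%. The small arithmetic check below (the `pct` helper is hypothetical, formatting to 6 significant digits like the lines above) shows that the printed percentages instead match a denominator of 307, i.e. 306/307 and 1/307. This suggests, but does not prove, that the percentages were computed over all 307 jobs while the displayed counts were not; the 306 goodZ figure is an inference from that arithmetic, not a number reported by the tool.

```python
def pct(part: int, whole: int) -> str:
    """Percentage of part/whole, formatted with 6 significant digits."""
    return f"{100 * part / whole:g}"


# Percentages implied by the displayed total of 16 jobs (15 goodZ, 1 goodNZ).
print(pct(15, 16), pct(1, 16))    # 93.75 6.25

# Percentages if the denominator is the 307 jobs counted in the job log.
print(pct(306, 307), pct(1, 307))  # 99.6743 0.325733
```

Only the second pair reproduces the printed `99.6743%` and `0.325733%`.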

The total number of jobs run by this glidein was not 16 but 307:

```
$ grep 'Terminated job' /ospool/uc-shared/project/OSG-Staff/factory-logs/gfactory-1.osg-htc.org/entry_OSG_US_CHTC-Spark-CE1_pre/job.2619915.0.out | wc -l
307
```

We suspect this is due to incorrect handling of pslot logs.

Info (please complete the following information):

  • HTCondor version: 23.x
  • Priority: medium
  • Stakeholders: OSG
  • Components: factory monitoring, frontend monitoring
@github-actions github-actions bot added BUG For BUGS factory-mon for affected component frontend-mon for affected component Medium Medium priority osg OSG stakeholder labels Nov 12, 2024
Projects
None yet
Development

No branches or pull requests

1 participant