Long execution time for JobCreator #11335
Comments
Running the query from 'python3' using cProfile I got the following:
|
Thanks @germanfgv this seems to include highly valuable information also for the more generic issue we are currently investigating: #11330 |
Germán, based on your profile I would say that this is an issue with ORACLE since the
As such, the actual delay comes from the ORACLE execution. I suggest dumping the actual query and profiling it with an ORACLE DBA. |
Hi @vkuznet,
-- looking at the log timestamps shows the actual time it took for this query is 38 min
-- took less than a second. I doubt this has anything to do with Oracle here. |
@todor-ivanov, I doubt that Python's cProfile lies; it shows exactly where the time is spent. That said, it does not mean that the surrounding code, threads, conditions, etc. do not influence the ORACLE call. For instance, the server may be stuck with lots of unclosed pool connections, which adds latency to the ORACLE call, or the server may be so busy that context (thread) switching takes a long time, which again inflates the latency of the ORACLE call. Of course, I have no doubt that a pristine environment shows no sign of issues, but the same call within the framework may. If you look closely at the profile output you'll notice |
As we figured out with @germanfgv two days ago, as the agent got drained more and more, the WMBS queries went back to a normal state:
What we also noticed from the logs was that the huge delays for the component were mixed with some pretty quick polling cycles taking just the expected several seconds rather than hours. I'd say we put this issue on hold until the problem reveals itself again. |
Thanks @todor-ivanov. We will keep logs from the previous agent and keep an eye on the new one. |
I am closing the issue now, since there is no problem to investigate at the moment. We will reopen it in case the problem shows up again. FYI @amaltaro |
Impact of the bug
T0 Agent and likely any WMAgent
Describe the bug
We noticed that the JobCreator on our T0 production agent was taking increasingly long to complete a loop, up to 4h20m when handling around 2k subscriptions. During that time we receive more input files, which increases the amount of work for the next loop, so the loop duration keeps growing for as long as we keep taking data.
After checking the logs and debugging, I noticed that most of this time was spent executing the GetLocationsForJobs query. Each execution takes around 4.7 seconds, which is a lot for a relatively simple query. The same query run from sqlplus or sqlDeveloper takes around 30 ms.
As the CMS duty cycle increases, this problem can render our agent unusable
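For anyone who wants to repeat this kind of per-call measurement, here is a minimal sketch of profiling a single query execution with cProfile. It is not the WMCore code: `run_query`, `profile_query`, `make_demo_db`, the `job_locations` table and the site names are all hypothetical, and an in-memory sqlite3 database stands in for Oracle so the snippet runs on its own.

```python
import cProfile
import io
import pstats
import sqlite3


def run_query(conn, job_id):
    """Hypothetical stand-in for one GetLocationsForJobs execution."""
    cur = conn.execute(
        "SELECT location FROM job_locations WHERE job_id = ? ORDER BY location",
        (job_id,),
    )
    return [row[0] for row in cur.fetchall()]


def profile_query(conn, job_id, top=10):
    """Profile a single call; return (query result, profile report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run_query, conn, job_id)

    # Write the report into a string instead of stdout so it can be logged.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def make_demo_db():
    """Build a tiny in-memory table with made-up rows (assumption, demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_locations (job_id INTEGER, location TEXT)")
    conn.executemany(
        "INSERT INTO job_locations VALUES (?, ?)",
        [(1, "SITE_A"), (1, "SITE_B"), (2, "SITE_A")],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    locations, report = profile_query(conn, 1)
    print(locations)
    print(report)
```

Sorting by cumulative time puts the outermost call first and the driver-level `execute`/`fetchall` calls right below it, which makes it easy to see whether the time is spent inside the database driver or in the surrounding Python code.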
How to reproduce it
Simply run an agent with jobs to create.
Expected behavior
The GetLocationsForJobs query should take just a fraction of a second.
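As a rough back-of-the-envelope check of what the two timings mean for a full loop, the sketch below multiplies them out. It assumes one sequential GetLocationsForJobs execution per subscription, which is an assumption made here for illustration and not something confirmed in this issue; the function names are hypothetical.

```python
def loop_seconds(executions, seconds_per_execution):
    """Total time spent in the query if executions run one after another."""
    return executions * seconds_per_execution


def as_hms(seconds):
    """Format a duration in seconds as e.g. '2h36m40s'."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


# Numbers from the report: ~2k subscriptions, 4.7 s observed, ~30 ms expected.
# Assumption: exactly one query execution per subscription.
SUBSCRIPTIONS = 2000

if __name__ == "__main__":
    print("observed:", as_hms(loop_seconds(SUBSCRIPTIONS, 4.7)))
    print("expected:", as_hms(loop_seconds(SUBSCRIPTIONS, 0.030)))
```

Under that assumption the query alone would account for roughly 2h37m of a loop at 4.7 s per execution, against about one minute at 30 ms.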
Additional context and error message
Here you can see the delay from the polling of the jobSplitter to the creation of the jobGroup