High CPU usage on MongoDB due to slow request when using a huge Queue #4
Comments
Hi, that should be fixable with an appropriate index. If you add an index on the _r field, the query should be able to avoid a collection scan.
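(For reference, a sparse index on _r can be created from the mongo shell like this; the 'hostq' collection name is taken from the report below, and sparse is relevant because _r only exists on reserved tasks.)

db.getCollection('hostq').createIndex({ "_r": 1 }, { sparse: true })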
I already checked that and added an index on _r, but it's still very slow... Any other idea? I'm using MongoDB 3.6.
I was suggesting a compound index rather than a single-field one.
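(A compound index along those lines might look like the following; the second field, _p, is assumed here to be the queue's priority field, since the thread doesn't name it.)

db.getCollection('hostq').createIndex({ "_r": 1, "_p": 1 })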
Just tried the compound index, but same result: still 3 secs with 2.6 million docs.
OK, I'll have to run some experiments locally, but I'm slammed at work and then on vacation next week (which is why I'm slammed this week). So please be patient, but I will look into it. This should be possible without a collection scan.
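(One way to verify whether a collection scan is happening is to explain the problematic filter; this is a generic mongo-shell sketch against the 'hostq' collection from the report.)

db.getCollection('hostq').find({ "_r": { "$exists": false } }).explain("executionStats")
// In the output, a winningPlan stage of COLLSCAN means every document is read;
// IXSCAN (or COUNT_SCAN for counts) means the planner is using the index.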
Looking again at your report, nothing in MongoDBx::Queue itself is issuing an aggregate or a $group, so that query must be generated elsewhere. I also found SERVER-29444, which is about improving performance for this kind of query.
I've just rechecked, but I'm not running any query of my own on the collection used by MongoDBx::Queue, so it must be coming from MongoDBx::Queue itself, maybe?
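(If it's unclear which client is issuing the slow aggregate, the database profiler can capture slow operations together with their origin; a minimal sketch with an arbitrary 1-second threshold.)

db.setProfilingLevel(1, 1000)    // record operations slower than 1000 ms
db.system.profile.find({ "ns": /hostq/ }).sort({ "ts": -1 }).limit(5).pretty()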
Oh, I see now. That's the new underlying query generated by the driver when it counts documents.
I take it you're calling one of the queue's counting methods somewhere in your code?
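(For context, recent MongoDB drivers implement filtered counts with an aggregation pipeline rather than the old count command, which matches the $match/$group pipeline in the report; roughly, the shell equivalent is as below.)

// Roughly what a driver-issued filtered count turns into:
db.getCollection('hostq').aggregate([
    { "$match": { "_r": { "$exists": false } } },
    { "$group": { "_id": null, "n": { "$sum": 1 } } }
])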
True, I've changed that call.
There's a design problem in that the "_r" field is created and removed, and that makes it hard to index -- even a sparse index doesn't always get picked up properly by the query planner. The fix would be to change the internals to always create "_r" with some value like 0 meaning "not reserved", and then indexing would work correctly. The problem is that's a breaking change, so it's a major version bump and I'd need to provide some way for people to upgrade their old data, too. That's what I'm currently thinking through.
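(As a sketch of that idea, not the module's actual v2 implementation: always writing _r with a sentinel value lets an ordinary equality predicate use an index. The collection name, field layout, and value 0 are illustrative.)

// Every task is written with _r present; 0 means "not reserved".
db.getCollection('hostq_v2').insertOne({ "_p": Date.now() / 1000, "_r": 0, "task": { "host": "example" } })
db.getCollection('hostq_v2').createIndex({ "_r": 1 })
// Counting unreserved tasks is now an indexable equality match:
db.getCollection('hostq_v2').count({ "_r": 0 })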
I fully understand your problem, but it would be good to get better performance :)
Any progress?
A bit. Conceptually, I'm going to rewrite MongoDBx::Queue as a facade in front of two implementation classes. The default will be legacy, but users can opt into v2. Migration is then just a matter of reserving tasks from the old queue and inserting them into a new, v2 queue (a separate collection). That should avoid surprising users on update and give users control over migration -- whether to shut down all old work processing and migrate, or to keep the old queue running while migrating to a new one. I'm not getting a lot of time to hack on it around family obligations, but at least I now have a plan and will keep chipping away. I'm working on migrating legacy to the facade model, and when that passes testing, I'll work on v2. Worst case, I'm back from vacation next week and can bang something out then.
Here's a test tarball: https://www.dropbox.com/s/ypl3ia5v1ze92e2/MongoDBx-Queue-2.001-TRIAL.tar.gz?dl=1 I did some local testing (at 100k documents) with a query comparable to the slow one from your report. You'll need to migrate/copy your jobs from the v1 queue to another collection/database name for the v2 queue.
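(If you copy the jobs at the database level rather than through the module's API, a rough shell sketch could look like this; the target collection name and the _r: 0 convention are assumptions based on the design described above.)

db.getCollection('hostq').find({ "_r": { "$exists": false } }).forEach(function (task) {
    delete task._id;    // let the new queue assign its own _id
    task._r = 0;        // assumed v2 "not reserved" marker
    db.getCollection('hostq_v2').insertOne(task);
})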
Great, thanks! I'll do some tests this weekend and let you know.
Hi David,
I've discussed this with the query team. The COUNT_SCAN plan is the best the database is going to do -- it's walking the index and incrementing a counter without touching any underlying data, and you've just got a lot of things to count. I suspect the only way to make it faster is to scale up the hardware the database is running on.
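(To double-check that the count really is an index-only COUNT_SCAN rather than a document scan, an explain on the count works; the _r: 0 filter assumes the v2 layout discussed earlier.)

db.getCollection('hostq_v2').explain("executionStats").count({ "_r": 0 })
// A COUNT_SCAN stage with totalDocsExamined: 0 confirms only the index is walked.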
Hi David,
Great module, very useful for handling a very large queue, but it seems to have an issue with collections of more than a million documents.
Here is how I found the slow queries, i.e. those running > 3 secs:
db.currentOp({"secs_running": {$gte: 3}})
After some testing, it's the $group stage in the aggregate that is very slow:
db.getCollection('hostq').aggregate([
    { "$match": { "_r": { "$exists": false } } },
    // { "$group": { "n": { "$sum": 1 }, "_id": null } }
]) -> 0.037 secs

db.getCollection('hostq').aggregate([
    { "$match": { "_r": { "$exists": false } } },
    { "$group": { "n": { "$sum": 1 }, "_id": null } }
]) -> 13 secs
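(To see which plan the optimizer chooses for that pipeline, the aggregation itself can be explained; a sketch on the same collection.)

db.getCollection('hostq').explain().aggregate([
    { "$match": { "_r": { "$exists": false } } },
    { "$group": { "_id": null, "n": { "$sum": 1 } } }
])
// The $match stage's plan shows whether the pipeline starts with a COLLSCAN or an index scan.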
Any chance you can fix this and update the module?