-
Notifications
You must be signed in to change notification settings - Fork 1
MongoDB Indexes
MongoDB has better guidance than I do on indexing strategies. The big takeaway is that your indexes need to support the queries you run. When I query Nessus scan data I'm generally looking for only a few things:
I'm forever wanting to find the most recent scan of a certain IP address or hostname. The most efficient way of doing this is to create a compound index on
-
ip
and thescanned_on
date and -
hostname
and thescanned_on
date.
The order is very important here. Specifying the indexes as 1. ip
and 2. scanned_on
means that a query on ip
only (very common) uses the compound index but a query on scanned_on
only (less common) doesn't. To use an index on stand alone scanned_on
queries you would need to create a new index. I like to create compound indexes that reference dates of the form db.scans.ensureIndex({fieldA: 1, scanned_on: -1})
. This gives me an index that supports fast retrieval of recent scans. I benchmarked some of this data.
db.scans.ensureIndex( { ip: 1, scanned_on: -1 }, {name: "ips_with_date"} );
I normally wouldn't name an index, but did here for clarity
db.scans.ensureIndex( { hostname: 1, scanned_on: -1 }, {name: "hostname_with_date"} );
db.scans.ensureIndex( { "os_name": 1 } );
The commands below create multikey indexes. These will be larger than single key indexes, but will drastically improve read queries that are of the ilk: "Show me all systems that exhibit event element X".
db.scans.ensureIndex( {"events.cve": 1} );
db.scans.ensureIndex( {"events.severity": 1} );
Covered queries are badass. Effectively, if you create a query that only queries items in the index, MongoDB only uses the index.
In our example above we had a compound index on hostname
and scanned_on
. If we create a query that only uses those fields, MongoDB doesn't even have to retrieve the document.
mongos> db.scans.find({ "hostname": {$ne: null} }, {_id:0, hostname:1, scanned_on: 1}).explain();
{
"cursor" : "BtreeCursor hostname_with_date multi",
"isMultiKey" : false,
"n" : 1092,
"nscannedObjects" : 1092,
"nscanned" : 1093,
"nscannedObjectsAllPlans" : 1092,
"nscannedAllPlans" : 1093,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 6,
"indexBounds" : {
"hostname" : [
[
{
"$minElement" : 1
},
null
],
[
null,
{
"$maxElement" : 1
}
]
],
"scanned_on" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "devmongodb001",
"millis" : 6
}
The important part is "indexOnly": true,. Separately, if we did
db.scans.find({ "hostname": {$ne: null} }, {_id:0, hostname:1, scanned_on: 1, ip: 1}).explain();
we would get "indexOnly": false, because we're displaying the "ip" field, which isn't part of this index.