Skip to content
George Ehrhorn edited this page Jun 19, 2013 · 13 revisions

Description

MongoDB has better guidance than I do on indexing strategies. The big takeaway is that your indexes need to support the queries you run. When I query Nessus scan data I'm generally looking for only a few things:

A Note on Compound Indexes

I'm forever wanting to find the most recent scan of a certain IP address or hostname. The most efficient way of doing this is to create a compound index on

  1. ip and the scanned_on date and
  2. hostname and the scanned_on date.

The order is very important here. Specifying the indexes as 1. ip and 2. scanned_on means that a query on ip only (very common) uses the compound index but a query on scanned_on only (less common) doesn't. To use an index on stand alone scanned_on queries you would need to create a new index. I like to create compound indexes that reference dates of the form db.scans.ensureIndex({fieldA: 1, scanned_on: -1}). This gives me an index that supports fast retrieval of recent scans. I benchmarked some of this data.

Indexes

Index on IP address and date

db.scans.ensureIndex( { ip: 1, scanned_on: -1 }, {name: "ips_with_date"} );

I normally wouldn't name an index, but did here for clarity

Index on Hostname

db.scans.ensureIndex( { hostname: 1, scanned_on: -1 }, {name: "hostname_with_date"} );

Index on Operating System

db.scans.ensureIndex( { "os_name": 1 } );

Multikey Indexes

The commands below create multikey indexes. These will be larger than single key indexes, but will drastically improve read queries that are of the ilk: "Show me all systems that exhibit event element X".

Index on CVE

db.scans.ensureIndex( {"events.cve": 1} );

Index on Severity

db.scans.ensureIndex( {"events.severity": 1} );

Covered Queries

Covered queries are badass. Effectively, if you create a query that only queries items in the index, MongoDB only uses the index.

In our example above we had a compound index on hostname and scanned_on. If we create a query that only uses those fields, MongoDB doesn't even have to retrieve the document.

mongos> db.scans.find({ "hostname": {$ne: null} }, {_id:0, hostname:1, scanned_on: 1}).explain();
{
        "cursor" : "BtreeCursor hostname_with_date multi",
        "isMultiKey" : false,
        "n" : 1092,
        "nscannedObjects" : 1092,
        "nscanned" : 1093,
        "nscannedObjectsAllPlans" : 1092,
        "nscannedAllPlans" : 1093,
        "scanAndOrder" : false,
        "indexOnly" : true,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 6,
        "indexBounds" : {
                "hostname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                null
                        ],
                        [
                                null,
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "scanned_on" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ]
        },
        "server" : "devmongodb001",
        "millis" : 6
}

The important part is "indexOnly": true,. Separately, if we did

db.scans.find({ "hostname": {$ne: null} }, {_id:0, hostname:1, scanned_on: 1, ip: 1}).explain();

we would get "indexOnly": false, because we're displaying the "ip" field, which isn't part of this index.