$match in pipeline uses indexes, but first subscription is really slow. #19

Hernanm0g · 2019-08-31T07:15:24Z

Hi Rob.
Thans a lot for this amazing tool.
Im using it in a project and im wondering why a simple ReactiveAggregate lasts 9 seconds.

My "Evaluations" Collection has ~ 5000 docs. I want to publish some of them in a ClientCollection called EvaluationsList
I have setted an index for the Evaluation collection that works pretty well on normal Pub queries.

"key" : {
                        "company" : 1,
                        "period" : 1,
                        "type" : 1,
                        "active" : 1,
                        "worker" : 1
                },

And The following Publication lasts less than 1 sec

return Evaluations.find({
        company:"0716f03301bc2c4f779a8229",
        period: "2018-Q4",
        type:"evaluation",
        active:true
      }, 
      {skip:0, limit:5}
)

But when i use Reactive Aggregation it lasts more than 9 secs:

ReactiveAggregate(this, Evaluations,
        [
          {
            $match: {
              company:"0716f03301bc2c4f779a8229",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            }
          },
          {
            $skip:0
          },
          {
            $limit:5
          }
        ],{
          clientCollection: "evaluationsList",
        }
      );

I think that maybe ReactiveAggregate doesnt use Mongodb Indexes. ¿Can you confirm this?

Thanks and regards.

The text was updated successfully, but these errors were encountered:

robfallows · 2019-09-02T08:20:03Z

Reactive Aggregate uses the standard MongoDB aggregation method. It will use any indexes you have defined if it can. How have you defined your indexes?

Hernanm0g · 2019-09-02T16:59:42Z

Rob:
This is what i did on mongodb shell

db.evaluations.createIndex(
    company:1,
    period:1,
    type:1
    active:1,
    worker:1
)

And this is how it looks

>> db.evaluations.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "meteor.evaluations"
        },
        {
                "v" : 2,
                "key" : {
                        "company" : 1,
                        "period" : 1,

                        "type" : 1,
                        "active" : 1,
                        "worker" : 1
                },
                "name" : "Company > Period > Type > Active > Worker",
                "ns" : "meteor.evaluations"
        },
        {
                "v" : 2,
                "key" : {
                        "company" : 1,
                        "active" : 1
                },
                "name" : "Company > Active",
                "ns" : "meteor.evaluations"
        }
]

Hernanm0g · 2019-09-02T23:26:25Z

@robfallows is there any way to explain() ReactiveAggregate?

robfallows · 2019-09-03T09:01:11Z

Does this happen in development as well as production?

robfallows · 2019-09-03T15:37:13Z

@Hernanm0g : please see #20 and test to see if this fixes the problem. Thanks you.

Hernanm0g · 2019-09-04T02:35:27Z

Hi Rob. Havent tested the new "enable_debug" yet, but adding:

aggregationOptions: {
            allowDiskUse:true
          },

helped a lot.

I will test the new features and give you a feedback
Thanks

Hernanm0g · 2019-09-04T18:43:46Z

Hi @robfallows , thanks for the debugging option. I finally confirmed that my subscription is using the indexes i created, so thats not the problem.

debug: (res)=>{
           console.log(res.stages[0].$cursor.queryPlanner.winningPlan);
         }
       }

And outputs:

I20190904-13:20:53.133(-5)? { stage: 'FETCH',
I20190904-13:20:53.133(-5)?   filter: { '$or': [ [Object], [Object] ] },
I20190904-13:20:53.134(-5)?   inputStage:
I20190904-13:20:53.134(-5)?    { stage: 'IXSCAN',
I20190904-13:20:53.134(-5)?      keyPattern: { company: 1, period: 1, type: 1, worker: 1 },
I20190904-13:20:53.135(-5)?      indexName: 'company_1_period_1_type_1_worker_1',
I20190904-13:20:53.135(-5)?      isMultiKey: false,
I20190904-13:20:53.135(-5)?      multiKeyPaths: { company: [], period: [], type: [], worker: [] },
I20190904-13:20:53.135(-5)?      isUnique: false,
I20190904-13:20:53.135(-5)?      isSparse: false,
I20190904-13:20:53.135(-5)?      isPartial: false,
I20190904-13:20:53.136(-5)?      indexVersion: 2,
I20190904-13:20:53.137(-5)?      direction: 'forward',
I20190904-13:20:53.137(-5)?      indexBounds:
I20190904-13:20:53.137(-5)?       { company: [Array],
I20190904-13:20:53.138(-5)?         period: [Array],
I20190904-13:20:53.138(-5)?         type: [Array],
I20190904-13:20:53.138(-5)?         worker: [Array] } } }

which is great. But i found something weird in the timestamps, please check this out:

The first time the publication is made it lasts a lot (30 seconds), but when its done, the second time it lasts less than 1 second (which is what we expect in the first time too). Check the timestamp:

First pub: (30 secs)

I20190904-13:38:41.501(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:39:14.046(-5)? Reactive-Aggregate: publish: ready

Second pub: (1 sec)

I20190904-13:39:56.877(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:39:58.039(-5)? Reactive-Aggregate: publish: ready

Third piub and subsequent (1 sec)

I20190904-13:40:34.867(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:40:35.948(-5)? Reactive-Aggregate: publish: ready

Whenever i refresh or restart meteor, the first pub lasts a lot, and the subsequent are very fast. Im testing this on development, maybe in production it will fix as you suggested before.

Any thoughts?
Thanks again

Hernanm0g · 2019-09-04T20:13:07Z

@robfallows: confirmed, in production, the first publication lasts 30 secs as well.

Hernanm0g · 2019-09-04T22:12:17Z

@robfallows, check this out:

Meteor.methods({
   "evaluations.aux": async function(){
    if (Meteor.isClient) {
      return []
    }
    let evals = await Evaluations.rawCollection().aggregate(
      [
        {
          $limit:1
        }
      ]
    ).toArray()
    return evals
  }
})
// Execution time : 0.027 secs

But ReactiveAggregate:

Meteor.publish("evaluations.aux1", function (){
    try {
      ReactiveAggregate(this, Evaluations, [
          {
            $limit:1
          }
        ],
        {
          debug:true
        }
      )
    } catch (exception) {
      throw new Meteor.Error('500', exception);
    }
  })
// Execution Time : ~ 30 secs

I20190904-17:08:52.796(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-17:09:22.831(-5)? Reactive-Aggregate: publish: ready

Hernanm0g · 2019-09-04T23:45:17Z

@robfallows: I've finally found what was delaying the publication, and was the observers intialization. this might be helpful for any user thats having the same issue.

I've added a few time trackers (in seconds) to aggregations.js to find out what was causing the delay and got this:

I20190904-18:23:36.864(-5)? tic initialized
I20190904-18:23:36.865(-5)? Reactive-Aggregate: starting :0
I20190904-18:23:36.865(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:23:36.866(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-18:24:05.188(-5)? Reactive-Aggregate: All Observers initialized:28.323
I20190904-18:24:05.198(-5)? Reactive-Aggregate: Triggered Update :28.323
I20190904-18:24:05.199(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :28.324
I20190904-18:24:05.199(-5)? Reactive-Aggregate: Promise received :28.328
I20190904-18:24:05.200(-5)? Reactive-Aggregate: Docs added or changed :28.329
I20190904-18:24:05.200(-5)? Reactive-Aggregate: publish: ready

As you can see, the first time the observers are initialized, they take a long time to be setted. (28.323 seconds). So i decided to set noAutomaticObservers option to true, just for testing, and got this:

I20190904-18:27:55.665(-5)? tic initialized
I20190904-18:27:55.666(-5)? Reactive-Aggregate: starting :0
I20190904-18:27:55.666(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: All Observers initialized:0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: Triggered Update :0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :0
I20190904-18:27:55.668(-5)? Reactive-Aggregate: Promise received :0.003
I20190904-18:27:55.670(-5)? Reactive-Aggregate: Docs added or changed :0.004
I20190904-18:27:55.670(-5)? Reactive-Aggregate: publish: ready

So its really, REALLY important that, if a collection is medium or large size (> 1000 docs), to set the "noAutomaticObservers" option to true, and to set in the "observers" object the filtered query that reduces the observer size. So now my publish looks like this

ReactiveAggregate(this, Evaluations, [
          {
            $match: {
              company:"5EoRsdTW4KidjvpdG",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            }
          },
          {
            $skip:0
          },
          {
            $limit:5
          }
        ],
        {
          clientCollection: "evaluationsList",
          noAutomaticObserver:true,
          observers: [
            Evaluations.find({
              company:"5EoRsdTW4KidjvpdG",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            },
            {
              skip:0, limit:5
            })
          ],
          debug:true
        }

And the debug results are:

I20190904-18:43:35.258(-5)? tic initialized
I20190904-18:43:35.259(-5)? Reactive-Aggregate: starting :0
I20190904-18:43:35.259(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:43:35.260(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-18:43:35.302(-5)? Reactive-Aggregate: All Observers initialized:0.039
I20190904-18:43:35.303(-5)? Reactive-Aggregate: Triggered Update :0.039
I20190904-18:43:35.310(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :0.04
I20190904-18:43:35.319(-5)? Reactive-Aggregate: Promise received :0.061
I20190904-18:43:35.329(-5)? Reactive-Aggregate: Docs added or changed :0.072
I20190904-18:43:35.330(-5)? Reactive-Aggregate: publish: ready

Thanks a lot for the help. This achievement would be impossible without your fast response.
See you.

robfallows · 2019-09-05T08:51:24Z

@Hernanm0g : Awesome research and really interesting results.

I'm going to do a little more work on this to discover why the initial observer setup is so slow. I really don't think it should be.

Glad you have a solution you're happy with 🙂

jasongrishkoff · 2020-08-08T19:31:06Z

Hey, was playing around with implementing this on a large collection (1M+ documents) and encountered the same issue -- my app would crash because of heap allocation issues. So, I followed @Hernanm0g's tip and it solved my problem as well. Doesn't seem ideal, though. Did you did you ever make any discoveries @robfallows?

jasongrishkoff · 2020-08-09T15:04:37Z

Following up, one problem that emerges when using noAutomaticObserver:true and manually setting an observer is that the cursor doesn't update if a field not specified in the observers is changed.

For example:

var query = { user_id: this.userId }
var pipeline = [
  {$match:query},
  {$project:{user_id:1,updated:1}},
]
var options = {
  noAutomaticObserver:true,
  observers: [
    TestCollection.find(query)
  ],
}
ReactiveAggregate(this, TestCollection,pipeline,options)

If the updated field changes, the cursor doesn't update because updated is not specified in the observers.

For complex documents that has an array field such as history which gets updated with each user action, this lack of reactivity on the client side can cause issues.

Not sure the best solution there? Setting noAutomaticObserver:true is the only way to get past using this package on big collections, but... it reduces a big chunk of the reactivity that the normal Meteor.publish function provides.

Looks like #36 has discussion about this as well. Perhaps I need to create a new updated field on every document that has a timestamp reflecting the last time any field of that document was changed.

robfallows · 2020-08-10T10:14:05Z

For large collections it's crucial to ensure that good indexes are used, or there will be long delays and possibly other performance-related issues, especially where reactive changes are causing re-runs before the previous aggregation has completed. The MongoDB engine will then be running multiple aggregations at the same time, meaning likely excessive memory (heap and stack) problems.

I would suggest starting with the MongoDB shell:

Measure the performance of the pipeline query:
- How long does it take?
- How does that compare with the expected re-running of a reactive aggregation due to changing data?
- Could overlapping re-runs occur?
Examine the pipeline execution plan:
- Make sure you have high-quality indexes, especially for stages like $match, $sort and $lookup.

The aggregation pipeline is expensive and Meteor observers are expensive, so tuning the options in a reactive aggregation can really help:

Automatic observers (the legacy behaviour), where all changes to the specified collection are tracked is deprecated. It can be fully emulated, if needed, by setting an appropriate observers value (collection.find()). However, this is often unnecessary. Fields which do not take part in the aggregation do not need to be tracked...

If the updated field changes, the cursor doesn't update because updated is not specified in the observers.

That is the intended pattern.
noAutomaticObserver: true allows much finer control of what is allowed to cause an aggregation to re-run. For large or busy collections it's important to get the observers value(s) tuned appropriately.
Consider using debounceCount and debounceDelay to hold off re-running potentially overlapping aggregations.

robfallows mentioned this issue Sep 3, 2019

Enable debug #20

Merged

Hernanm0g changed the title ~~$match in pipeline uses indexes?~~ $match in pipeline uses indexes, but first subscription is really slow. Sep 4, 2019

Hernanm0g closed this as completed Sep 4, 2019

Hernanm0g mentioned this issue Sep 4, 2019

subscription really slow #13

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

$match in pipeline uses indexes, but first subscription is really slow. #19

$match in pipeline uses indexes, but first subscription is really slow. #19

Hernanm0g commented Aug 31, 2019 •

edited

Loading

robfallows commented Sep 2, 2019

Hernanm0g commented Sep 2, 2019 •

edited

Loading

Hernanm0g commented Sep 2, 2019

robfallows commented Sep 3, 2019

robfallows commented Sep 3, 2019

Hernanm0g commented Sep 4, 2019

Hernanm0g commented Sep 4, 2019 •

edited

Loading

Hernanm0g commented Sep 4, 2019 •

edited

Loading

Hernanm0g commented Sep 4, 2019

Hernanm0g commented Sep 4, 2019 •

edited

Loading

robfallows commented Sep 5, 2019

jasongrishkoff commented Aug 8, 2020

jasongrishkoff commented Aug 9, 2020 •

edited

Loading

robfallows commented Aug 10, 2020 •

edited

Loading

$match in pipeline uses indexes, but first subscription is really slow. #19

$match in pipeline uses indexes, but first subscription is really slow. #19

Comments

Hernanm0g commented Aug 31, 2019 • edited Loading

robfallows commented Sep 2, 2019

Hernanm0g commented Sep 2, 2019 • edited Loading

Hernanm0g commented Sep 2, 2019

robfallows commented Sep 3, 2019

robfallows commented Sep 3, 2019

Hernanm0g commented Sep 4, 2019

Hernanm0g commented Sep 4, 2019 • edited Loading

Hernanm0g commented Sep 4, 2019 • edited Loading

Hernanm0g commented Sep 4, 2019

Hernanm0g commented Sep 4, 2019 • edited Loading

robfallows commented Sep 5, 2019

jasongrishkoff commented Aug 8, 2020

jasongrishkoff commented Aug 9, 2020 • edited Loading

robfallows commented Aug 10, 2020 • edited Loading

Hernanm0g commented Aug 31, 2019 •

edited

Loading

Hernanm0g commented Sep 2, 2019 •

edited

Loading

Hernanm0g commented Sep 4, 2019 •

edited

Loading

Hernanm0g commented Sep 4, 2019 •

edited

Loading

Hernanm0g commented Sep 4, 2019 •

edited

Loading

jasongrishkoff commented Aug 9, 2020 •

edited

Loading

robfallows commented Aug 10, 2020 •

edited

Loading