Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$match in pipeline uses indexes, but first subscription is really slow. #19

Closed
Hernanm0g opened this issue Aug 31, 2019 · 14 comments · Fixed by #20
Closed

$match in pipeline uses indexes, but first subscription is really slow. #19

Hernanm0g opened this issue Aug 31, 2019 · 14 comments · Fixed by #20

Comments

@Hernanm0g
Copy link

Hernanm0g commented Aug 31, 2019

Hi Rob.
Thans a lot for this amazing tool.
Im using it in a project and im wondering why a simple ReactiveAggregate lasts 9 seconds.

My "Evaluations" Collection has ~ 5000 docs. I want to publish some of them in a ClientCollection called EvaluationsList
I have setted an index for the Evaluation collection that works pretty well on normal Pub queries.

"key" : {
                        "company" : 1,
                        "period" : 1,
                        "type" : 1,
                        "active" : 1,
                        "worker" : 1
                },

And The following Publication lasts less than 1 sec

return Evaluations.find({
        company:"0716f03301bc2c4f779a8229",
        period: "2018-Q4",
        type:"evaluation",
        active:true
      }, 
      {skip:0, limit:5}
)

But when i use Reactive Aggregation it lasts more than 9 secs:

ReactiveAggregate(this, Evaluations,
        [
          {
            $match: {
              company:"0716f03301bc2c4f779a8229",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            }
          },
          {
            $skip:0
          },
          {
            $limit:5
          }
        ],{
          clientCollection: "evaluationsList",
        }
      );

I think that maybe ReactiveAggregate doesnt use Mongodb Indexes. ¿Can you confirm this?

Thanks and regards.

@robfallows
Copy link
Owner

Reactive Aggregate uses the standard MongoDB aggregation method. It will use any indexes you have defined if it can. How have you defined your indexes?

@Hernanm0g
Copy link
Author

Hernanm0g commented Sep 2, 2019

Rob:
This is what i did on mongodb shell

db.evaluations.createIndex(
    company:1,
    period:1,
    type:1
    active:1,
    worker:1
)

And this is how it looks

>> db.evaluations.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "meteor.evaluations"
        },
        {
                "v" : 2,
                "key" : {
                        "company" : 1,
                        "period" : 1,

                        "type" : 1,
                        "active" : 1,
                        "worker" : 1
                },
                "name" : "Company > Period > Type > Active > Worker",
                "ns" : "meteor.evaluations"
        },
        {
                "v" : 2,
                "key" : {
                        "company" : 1,
                        "active" : 1
                },
                "name" : "Company > Active",
                "ns" : "meteor.evaluations"
        }
]

@Hernanm0g
Copy link
Author

@robfallows is there any way to explain() ReactiveAggregate?

@robfallows
Copy link
Owner

Does this happen in development as well as production?

@robfallows robfallows mentioned this issue Sep 3, 2019
@robfallows
Copy link
Owner

@Hernanm0g : please see #20 and test to see if this fixes the problem. Thanks you.

@Hernanm0g
Copy link
Author

Hi Rob. Havent tested the new "enable_debug" yet, but adding:

aggregationOptions: {
            allowDiskUse:true
          },

helped a lot.

I will test the new features and give you a feedback
Thanks

@Hernanm0g Hernanm0g changed the title $match in pipeline uses indexes? $match in pipeline uses indexes, but first subscription is really slow. Sep 4, 2019
@Hernanm0g
Copy link
Author

Hernanm0g commented Sep 4, 2019

Hi @robfallows , thanks for the debugging option. I finally confirmed that my subscription is using the indexes i created, so thats not the problem.

debug: (res)=>{
           console.log(res.stages[0].$cursor.queryPlanner.winningPlan);
         }
       }

And outputs:

I20190904-13:20:53.133(-5)? { stage: 'FETCH',
I20190904-13:20:53.133(-5)?   filter: { '$or': [ [Object], [Object] ] },
I20190904-13:20:53.134(-5)?   inputStage:
I20190904-13:20:53.134(-5)?    { stage: 'IXSCAN',
I20190904-13:20:53.134(-5)?      keyPattern: { company: 1, period: 1, type: 1, worker: 1 },
I20190904-13:20:53.135(-5)?      indexName: 'company_1_period_1_type_1_worker_1',
I20190904-13:20:53.135(-5)?      isMultiKey: false,
I20190904-13:20:53.135(-5)?      multiKeyPaths: { company: [], period: [], type: [], worker: [] },
I20190904-13:20:53.135(-5)?      isUnique: false,
I20190904-13:20:53.135(-5)?      isSparse: false,
I20190904-13:20:53.135(-5)?      isPartial: false,
I20190904-13:20:53.136(-5)?      indexVersion: 2,
I20190904-13:20:53.137(-5)?      direction: 'forward',
I20190904-13:20:53.137(-5)?      indexBounds:
I20190904-13:20:53.137(-5)?       { company: [Array],
I20190904-13:20:53.138(-5)?         period: [Array],
I20190904-13:20:53.138(-5)?         type: [Array],
I20190904-13:20:53.138(-5)?         worker: [Array] } } }

which is great. But i found something weird in the timestamps, please check this out:

The first time the publication is made it lasts a lot (30 seconds), but when its done, the second time it lasts less than 1 second (which is what we expect in the first time too). Check the timestamp:

First pub: (30 secs)

I20190904-13:38:41.501(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:39:14.046(-5)? Reactive-Aggregate: publish: ready

Second pub: (1 sec)

I20190904-13:39:56.877(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:39:58.039(-5)? Reactive-Aggregate: publish: ready

Third piub and subsequent (1 sec)

I20190904-13:40:34.867(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-13:40:35.948(-5)? Reactive-Aggregate: publish: ready

Whenever i refresh or restart meteor, the first pub lasts a lot, and the subsequent are very fast. Im testing this on development, maybe in production it will fix as you suggested before.

Any thoughts?
Thanks again

@Hernanm0g
Copy link
Author

Hernanm0g commented Sep 4, 2019

@robfallows: confirmed, in production, the first publication lasts 30 secs as well.

@Hernanm0g
Copy link
Author

@robfallows, check this out:

Meteor.methods({
   "evaluations.aux": async function(){
    if (Meteor.isClient) {
      return []
    }
    let evals = await Evaluations.rawCollection().aggregate(
      [
        {
          $limit:1
        }
      ]
    ).toArray()
    return evals
  }
})
// Execution time : 0.027 secs

But ReactiveAggregate:

Meteor.publish("evaluations.aux1", function (){
    try {
      ReactiveAggregate(this, Evaluations, [
          {
            $limit:1
          }
        ],
        {
          debug:true
        }
      )
    } catch (exception) {
      throw new Meteor.Error('500', exception);
    }
  })
// Execution Time : ~ 30 secs
I20190904-17:08:52.796(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-17:09:22.831(-5)? Reactive-Aggregate: publish: ready

@Hernanm0g
Copy link
Author

Hernanm0g commented Sep 4, 2019

@robfallows: I've finally found what was delaying the publication, and was the observers intialization. this might be helpful for any user thats having the same issue.

I've added a few time trackers (in seconds) to aggregations.js to find out what was causing the delay and got this:

I20190904-18:23:36.864(-5)? tic initialized
I20190904-18:23:36.865(-5)? Reactive-Aggregate: starting :0
I20190904-18:23:36.865(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:23:36.866(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-18:24:05.188(-5)? Reactive-Aggregate: All Observers initialized:28.323
I20190904-18:24:05.198(-5)? Reactive-Aggregate: Triggered Update :28.323
I20190904-18:24:05.199(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :28.324
I20190904-18:24:05.199(-5)? Reactive-Aggregate: Promise received :28.328
I20190904-18:24:05.200(-5)? Reactive-Aggregate: Docs added or changed :28.329
I20190904-18:24:05.200(-5)? Reactive-Aggregate: publish: ready

As you can see, the first time the observers are initialized, they take a long time to be setted. (28.323 seconds). So i decided to set noAutomaticObservers option to true, just for testing, and got this:

I20190904-18:27:55.665(-5)? tic initialized
I20190904-18:27:55.666(-5)? Reactive-Aggregate: starting :0
I20190904-18:27:55.666(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: All Observers initialized:0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: Triggered Update :0
I20190904-18:27:55.667(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :0
I20190904-18:27:55.668(-5)? Reactive-Aggregate: Promise received :0.003
I20190904-18:27:55.670(-5)? Reactive-Aggregate: Docs added or changed :0.004
I20190904-18:27:55.670(-5)? Reactive-Aggregate: publish: ready

So its really, REALLY important that, if a collection is medium or large size (> 1000 docs), to set the "noAutomaticObservers" option to true, and to set in the "observers" object the filtered query that reduces the observer size. So now my publish looks like this

ReactiveAggregate(this, Evaluations, [
          {
            $match: {
              company:"5EoRsdTW4KidjvpdG",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            }
          },
          {
            $skip:0
          },
          {
            $limit:5
          }
        ],
        {
          clientCollection: "evaluationsList",
          noAutomaticObserver:true,
          observers: [
            Evaluations.find({
              company:"5EoRsdTW4KidjvpdG",
              period: "2018-Q4",
              type:"evaluation",
              active:true
            },
            {
              skip:0, limit:5
            })
          ],
          debug:true
        }

And the debug results are:

I20190904-18:43:35.258(-5)? tic initialized
I20190904-18:43:35.259(-5)? Reactive-Aggregate: starting :0
I20190904-18:43:35.259(-5)? Reactive-Aggregate: Before Observers :0
I20190904-18:43:35.260(-5)? Reactive-Aggregate: collection: evaluations: initialise observer
I20190904-18:43:35.302(-5)? Reactive-Aggregate: All Observers initialized:0.039
I20190904-18:43:35.303(-5)? Reactive-Aggregate: Triggered Update :0.039
I20190904-18:43:35.310(-5)? Reactive-Aggregate: Sending MondoDB aggregate Promise :0.04
I20190904-18:43:35.319(-5)? Reactive-Aggregate: Promise received :0.061
I20190904-18:43:35.329(-5)? Reactive-Aggregate: Docs added or changed :0.072
I20190904-18:43:35.330(-5)? Reactive-Aggregate: publish: ready

Thanks a lot for the help. This achievement would be impossible without your fast response.
See you.

@robfallows
Copy link
Owner

@Hernanm0g : Awesome research and really interesting results.

I'm going to do a little more work on this to discover why the initial observer setup is so slow. I really don't think it should be.

Glad you have a solution you're happy with 🙂

@jasongrishkoff
Copy link

Hey, was playing around with implementing this on a large collection (1M+ documents) and encountered the same issue -- my app would crash because of heap allocation issues. So, I followed @Hernanm0g's tip and it solved my problem as well. Doesn't seem ideal, though. Did you did you ever make any discoveries @robfallows?

@jasongrishkoff
Copy link

jasongrishkoff commented Aug 9, 2020

Following up, one problem that emerges when using noAutomaticObserver:true and manually setting an observer is that the cursor doesn't update if a field not specified in the observers is changed.

For example:

var query = { user_id: this.userId }
var pipeline = [
  {$match:query},
  {$project:{user_id:1,updated:1}},
]
var options = {
  noAutomaticObserver:true,
  observers: [
    TestCollection.find(query)
  ],
}
ReactiveAggregate(this, TestCollection,pipeline,options)

If the updated field changes, the cursor doesn't update because updated is not specified in the observers.

For complex documents that has an array field such as history which gets updated with each user action, this lack of reactivity on the client side can cause issues.

Not sure the best solution there? Setting noAutomaticObserver:true is the only way to get past using this package on big collections, but... it reduces a big chunk of the reactivity that the normal Meteor.publish function provides.

Looks like #36 has discussion about this as well. Perhaps I need to create a new updated field on every document that has a timestamp reflecting the last time any field of that document was changed.

@robfallows
Copy link
Owner

robfallows commented Aug 10, 2020

For large collections it's crucial to ensure that good indexes are used, or there will be long delays and possibly other performance-related issues, especially where reactive changes are causing re-runs before the previous aggregation has completed. The MongoDB engine will then be running multiple aggregations at the same time, meaning likely excessive memory (heap and stack) problems.

I would suggest starting with the MongoDB shell:

  1. Measure the performance of the pipeline query:
    • How long does it take?
    • How does that compare with the expected re-running of a reactive aggregation due to changing data?
    • Could overlapping re-runs occur?
  2. Examine the pipeline execution plan:
    • Make sure you have high-quality indexes, especially for stages like $match, $sort and $lookup.

The aggregation pipeline is expensive and Meteor observers are expensive, so tuning the options in a reactive aggregation can really help:

  1. Automatic observers (the legacy behaviour), where all changes to the specified collection are tracked is deprecated. It can be fully emulated, if needed, by setting an appropriate observers value (collection.find()). However, this is often unnecessary. Fields which do not take part in the aggregation do not need to be tracked...

    If the updated field changes, the cursor doesn't update because updated is not specified in the observers.

    That is the intended pattern.

  2. noAutomaticObserver: true allows much finer control of what is allowed to cause an aggregation to re-run. For large or busy collections it's important to get the observers value(s) tuned appropriately.

  3. Consider using debounceCount and debounceDelay to hold off re-running potentially overlapping aggregations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants