Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] [Star Tree] [Search] Resolving Date histogram with metric aggregation using star-tree #16674

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sandeshkr419
Copy link
Contributor

@sandeshkr419 sandeshkr419 commented Nov 18, 2024

Description

This is the draft set of changes which I am utilizing to discuss solution approach here.

The primary challenge to resolve a date histogram with metric aggregation was to figure out how sub-aggregators will get resolve. When resolving a query by star-tree, we lose lthe need of ucene documents and don't utilize collect() calls which are used internally to delegate the collection of sub-aggregations to sub-collectors.

To mitigate this challenge, I have introduced a wrapper class - StarTreeBucketCollector to basically introduce a collectStarEntry(int starTreeEntry, long bucket) method. This method is then overridden in metric aggregator methods and invoked from parent aggregators (here DateHistogramAggregator).

The benefit of this strategy is that this is easily extensible by other bucket aggregators where metric aggregations will be nested. Also, other bucket related utilities are re-useable as it is, it saves the effort of having a separate set of utilities for star tree buckets as the old buckets are utilized here.

Want to take early feedback on this approach.

Note: Things are hard-coded for one example query shape right now

{
    "size": 0,
    "aggs": {
        "by_hour": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "month"
            }, "aggs": {
                "sum_status": {
                    "sum": {
                        "field": "status"
                    }
                }
            }
        }
    }
}

{
    "took": 15785,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1000,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "by_hour": {
            "buckets": [
                {
                    "key_as_string": "2024-01-01T00:00:00.000Z",
                    "key": 1704067200000,
                    "doc_count": 208,
                    "sum_status": {
                        "value": 43527.0
                    }
                },
                {
                    "key_as_string": "2024-02-01T00:00:00.000Z",
                    "key": 1706745600000,
                    "doc_count": 783,
                    "sum_status": {
                        "value": 164741.0
                    }
                },
                {
                    "key_as_string": "2024-03-01T00:00:00.000Z",
                    "key": 1709251200000,
                    "doc_count": 9,
                    "sum_status": {
                        "value": 1894.0
                    }
                }
            ]
        }
    }
}

Related Issues

Resolves (#16552)

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sandesh Kumar <[email protected]>
@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search:Aggregations labels Nov 18, 2024
Copy link
Contributor

❌ Gradle check result for d7fbc39: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -136,6 +140,35 @@ public void collect(int doc, long bucket) throws IOException {
public LeafBucketCollector getStarTreeCollector(LeafReaderContext ctx, LeafBucketCollector sub, CompositeIndexFieldInfo starTree)
throws IOException {
final CompensatedSum kahanSummation = new CompensatedSum(sums.get(0), 0);
if (parent != null && subAggregators.length == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this can't be the default collector regardless of parent collectors ?

when there is only metric aggregation, there will be a single bucket otherwise multiple buckets.


@Override
public void collect(int doc, long bucket) throws IOException {
sub.collect(doc, bucket);
Copy link
Contributor

@bharath-techie bharath-techie Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we throw UnSupportedException here - star tree collectors can strictly enforce this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


@Override
public void setScorer(Scorable s) throws IOException {
sub.setScorer(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even here we should be able to throw UnSupportedException since there is no scorer for star tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I am not using StarTreeLeafBucketCollectorBase at all. I am thinking of getting rid of it if if does not serves much utility.
Earlier I thought I might need to delegate operations to original collector if required.

long bucketOrd = bucketOrds.add(0, dimensionValue);
if (bucketOrd < 0) {
bucketOrd = -1 - bucketOrd;
collectStarTreeBucket((StarTreeBucketCollector) sub, metricValue, bucketOrd, bit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use collectExistingStarBucket / collectStarBucket paradigm here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can refactor to it if required. Both the utilities differ by grow(bucketOrd + 1); which I had used manually for now during POC but will abstract it in a better way.

@@ -164,7 +172,8 @@ private static StarTreeResult traverseStarTree(StarTreeValues starTreeValues, Ma

String childDimension = dimensionNames.get(dimensionId + 1);
StarTreeNode starNode = null;
if (globalRemainingPredicateColumns == null || !globalRemainingPredicateColumns.contains(childDimension)) {
if (globalRemainingPredicateColumns == null
Copy link
Contributor

@bharath-techie bharath-techie Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if( (globalRemainingPredicateColumns == null
                || !globalRemainingPredicateColumns.contains(childDimension)) && !remainingGroupByColumns.contains(childDimension) )

Predicate conditions should be grouped together

CompositeDataCubeFieldType compositeIndexFieldInfo,
AggregatorFactory aggregatorFactory
) {
if (aggregatorFactory instanceof DateHistogramAggregatorFactory && aggregatorFactory.getSubFactories().getFactories().length == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably have a recursive method which checks if parent aggregator is star tree supported and if children aggregators are supported and so on.

And we can make the supported aggregators configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Aggregations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants