-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] [Star Tree] [Search] Resolving Date histogram with metric aggregation using star-tree #16674
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sandesh Kumar <[email protected]>
❌ Gradle check result for d7fbc39: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@@ -136,6 +140,35 @@ public void collect(int doc, long bucket) throws IOException { | |||
public LeafBucketCollector getStarTreeCollector(LeafReaderContext ctx, LeafBucketCollector sub, CompositeIndexFieldInfo starTree) | |||
throws IOException { | |||
final CompensatedSum kahanSummation = new CompensatedSum(sums.get(0), 0); | |||
if (parent != null && subAggregators.length == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this can't be the default collector regardless of parent collectors ?
when there is only metric aggregation, there will be a single bucket otherwise multiple buckets.
|
||
@Override | ||
public void collect(int doc, long bucket) throws IOException { | ||
sub.collect(doc, bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we throw UnSupportedException
here - star tree collectors can strictly enforce this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
|
||
@Override | ||
public void setScorer(Scorable s) throws IOException { | ||
sub.setScorer(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even here we should be able to throw UnSupportedException
since there is no scorer for star tree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now I am not using StarTreeLeafBucketCollectorBase
at all. I am thinking of getting rid of it if if does not serves much utility.
Earlier I thought I might need to delegate operations to original collector if required.
long bucketOrd = bucketOrds.add(0, dimensionValue); | ||
if (bucketOrd < 0) { | ||
bucketOrd = -1 - bucketOrd; | ||
collectStarTreeBucket((StarTreeBucketCollector) sub, metricValue, bucketOrd, bit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use collectExistingStarBucket
/ collectStarBucket
paradigm here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can refactor to it if required. Both the utilities differ by grow(bucketOrd + 1);
which I had used manually for now during POC but will abstract it in a better way.
@@ -164,7 +172,8 @@ private static StarTreeResult traverseStarTree(StarTreeValues starTreeValues, Ma | |||
|
|||
String childDimension = dimensionNames.get(dimensionId + 1); | |||
StarTreeNode starNode = null; | |||
if (globalRemainingPredicateColumns == null || !globalRemainingPredicateColumns.contains(childDimension)) { | |||
if (globalRemainingPredicateColumns == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if( (globalRemainingPredicateColumns == null
|| !globalRemainingPredicateColumns.contains(childDimension)) && !remainingGroupByColumns.contains(childDimension) )
Predicate conditions should be grouped together
CompositeDataCubeFieldType compositeIndexFieldInfo, | ||
AggregatorFactory aggregatorFactory | ||
) { | ||
if (aggregatorFactory instanceof DateHistogramAggregatorFactory && aggregatorFactory.getSubFactories().getFactories().length == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably have a recursive method which checks if parent aggregator is star tree supported and if children aggregators are supported and so on.
And we can make the supported aggregators configurable.
Description
This is the draft set of changes which I am utilizing to discuss solution approach here.
The primary challenge to resolve a date histogram with metric aggregation was to figure out how sub-aggregators will get resolve. When resolving a query by star-tree, we lose lthe need of ucene documents and don't utilize
collect()
calls which are used internally to delegate the collection of sub-aggregations to sub-collectors.To mitigate this challenge, I have introduced a wrapper class -
StarTreeBucketCollector
to basically introduce acollectStarEntry(int starTreeEntry, long bucket)
method. This method is then overridden in metric aggregator methods and invoked from parent aggregators (here DateHistogramAggregator).The benefit of this strategy is that this is easily extensible by other bucket aggregators where metric aggregations will be nested. Also, other bucket related utilities are re-useable as it is, it saves the effort of having a separate set of utilities for star tree buckets as the old buckets are utilized here.
Want to take early feedback on this approach.
Note: Things are hard-coded for one example query shape right now
Related Issues
Resolves (#16552)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.