[Star Tree][Search][RFC] Parse aggregation request to resolve via star tree data structure #14871
Comments
Hi @sandeshkr419, we need an extensive design for the following features:
1. Auto detection of queries that can be solved via star tree at request / index / shard level
We need more details on how we can solve this for one query / aggregation and how this can be extended to other queries / aggregations easily (extensibility of the approach to newer queries / aggregations). Secondly, we need validations and decisions on whether a star tree index can be used for the given query and, if so, which star tree index to use.
2. Segment level decisions
Similar to 'IndexOrDocValuesQuery', we might need 'OriginalOrStarTreeQuery', as there are certain scenarios where the original query must be used. In this case, we can't make the decision right at the request parsing stage as proposed in this doc, because the aggregators to be used depend on the query used (star tree documents and segment documents differ, and the aggregators differ as well).
3. Response correctness and scenarios
We currently return a nested response to users when they query with nested aggregations, and the request has to pass through multiple aggregators.
We need a plan for how we solve the same in the star tree query flow using star tree index structures.
4. Star tree query algorithm
This can cover the actual algorithm used to traverse the star tree to solve queries, the limitations of the algorithm, etc. If possible, also cover the scorer and weight structure of the query that is used along with the algorithm. Some of the things to cover:
During indexing, dimensions are converted to long regardless of the original index field's data type (similar to how existing fields are stored in SortedNDV).
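To make the point about long-encoded dimensions concrete, here is a minimal sketch of an order-preserving double-to-long mapping. It uses the same bit trick as Lucene's `NumericUtils.doubleToSortableLong`; whether the star tree writer uses exactly this encoding for every field type is an assumption, and the class name `DimensionEncoding` is hypothetical. The practical consequence is that a filter value in the query has to go through the same conversion before it can be compared with dimension values in the tree.

```java
/** Illustrative only: not taken from the star tree implementation. */
class DimensionEncoding {

    /** Maps a double to a long so that numeric order is preserved under signed long comparison. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative values, flip the lower 63 bits so larger magnitudes sort lower.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /** Inverse of toSortableLong; the transformation is its own inverse on the bit pattern. */
    static double fromSortableLong(long encoded) {
        long bits = encoded ^ ((encoded >> 63) & 0x7fffffffffffffffL);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double[] values = {-3.0, -1.5, 0.0, 2.5};
        for (double v : values) {
            System.out.println(v + " -> " + toSortableLong(v));
        }
    }
}
```

A query-side filter such as `price >= 2.5` would therefore be matched against `toSortableLong(2.5)` rather than the raw double.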
Please raise issues for the above and any other broader issues that need to be solved, and let's close on the approaches for them.
In my POC approach, I was trying to parse the request at the coordinator/request level itself into an appropriate star tree query/aggregator pair. Since cluster state is available on all nodes, and keeping in mind the possible introduction of more than one star tree for an index, it makes sense to do the parsing at the shard level directly to avoid unnecessary transport traffic. As @bharath-techie suggested, I think we will have to introduce a new composite OriginalOrStarTreeQuery[StarTreeQuery(s), OriginalQuery] and OriginalOrStarTreeAggregator[StarTreeAggregator(s), OriginalAggregator]. That way we can decide at the segment level [1/ deleted docs, 2/ doc_count field, 3/ any other cost estimation] whether to execute the star tree query/aggregator or not. Next, I need to figure out the entry point at the shard level for the query/aggregation rewrite in this case.
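A rough sketch of what such a segment-level decision could look like, using the three factors listed above. Everything here is hypothetical: `SegmentPlanChooser`, `SegmentStats`, and the rule that deleted docs or a doc_count field force a fallback are assumptions for illustration, not existing OpenSearch classes or settled behavior.

```java
import java.util.List;

/** Hypothetical sketch: picks star tree or original execution per segment. */
class SegmentPlanChooser {

    enum Plan { STAR_TREE, ORIGINAL }

    /** Per-segment facts the decision is based on (illustrative fields only). */
    static final class SegmentStats {
        final boolean hasStarTree;
        final int deletedDocs;
        final boolean hasDocCountField;
        final long starTreeDocs;
        final long segmentDocs;

        SegmentStats(boolean hasStarTree, int deletedDocs, boolean hasDocCountField,
                     long starTreeDocs, long segmentDocs) {
            this.hasStarTree = hasStarTree;
            this.deletedDocs = deletedDocs;
            this.hasDocCountField = hasDocCountField;
            this.starTreeDocs = starTreeDocs;
            this.segmentDocs = segmentDocs;
        }
    }

    static Plan choose(SegmentStats s) {
        if (!s.hasStarTree) return Plan.ORIGINAL;
        // Assumption: pre-aggregated values cannot exclude deleted documents.
        if (s.deletedDocs > 0) return Plan.ORIGINAL;
        // Assumption: a doc_count field changes counting semantics, so fall back.
        if (s.hasDocCountField) return Plan.ORIGINAL;
        // Crude cost estimate: use the star tree only if it has fewer documents to visit.
        return s.starTreeDocs < s.segmentDocs ? Plan.STAR_TREE : Plan.ORIGINAL;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
            new SegmentStats(true, 0, false, 100, 10_000),
            new SegmentStats(true, 5, false, 100, 10_000));
        for (SegmentStats s : segments) {
            System.out.println(choose(s));
        }
    }
}
```

In a composite query or aggregator, a check like `choose` would run once per leaf (segment), so different segments of the same shard could take different paths.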
Revisiting my POC changes, I started with resolving a single level (no sub) metric aggregation with/without a numeric terms query. Example query:
I ditched using a separate aggregator class setup altogether and made use of the existing metric aggregators so that I can utilize them entirely. However, I still kept the usage of This made the flow of code much simpler. Here is a draft PR with the approach (raised against a private fork because the changes it depends on, #14809, are still in review): sandeshkr419#227
Merged #15289. Other query shapes have separate issues opened for further discussion.
With support for star-tree composite indices, we would like to resolve certain aggregation and search paths via the star tree itself. Thinking of 2 possibilities:
I'm in support of approach 2/, as this way we do not introduce new overhead for search users, who would otherwise have to reframe their queries for star tree. Also, as the feature is in development, the full search capabilities of star tree will be developed incrementally.
For 2/, we would want to keep the search request & search response intact. One such request/response to start building up the framework for star tree request execution will be an aggregation request with groupby/nested aggregation.
In a default search path execution, the query and aggregation path are independently executed. In the star tree code path, the query and aggregation will be tightly coupled and this requires decision making on setting up correct star-tree query & aggregation pair during request parsing itself.
I've been thinking of something similar to a POC I did here: creating a query/aggregation pair during request parsing itself.
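As a rough illustration of that pairing step, the sketch below checks whether a request's filter fields and metric can be answered by any configured star tree and, if so, picks one. All names (`StarTreeRequestResolver`, `StarTreeConfig`, `resolve`) and the "first match wins" selection are assumptions for illustration; the actual validation rules are part of what this RFC needs to settle.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: decides at parse time whether a star tree can serve a request. */
class StarTreeRequestResolver {

    /** Simplified star tree definition: its dimensions and the metrics it pre-aggregates. */
    static final class StarTreeConfig {
        final String name;
        final Set<String> dimensions;
        final Map<String, Set<String>> metrics; // metric field -> supported functions

        StarTreeConfig(String name, Set<String> dimensions, Map<String, Set<String>> metrics) {
            this.name = name;
            this.dimensions = dimensions;
            this.metrics = metrics;
        }

        boolean canAnswer(Set<String> filterFields, String metricField, String function) {
            // Every filtered field must be a dimension, and the metric must be pre-aggregated.
            return dimensions.containsAll(filterFields)
                && metrics.getOrDefault(metricField, Set.of()).contains(function);
        }
    }

    /** Returns the first star tree able to answer the request, or empty to use the original path. */
    static Optional<String> resolve(List<StarTreeConfig> trees, Set<String> filterFields,
                                    String metricField, String function) {
        return trees.stream()
            .filter(t -> t.canAnswer(filterFields, metricField, function))
            .map(t -> t.name)
            .findFirst();
    }

    public static void main(String[] args) {
        List<StarTreeConfig> trees = List.of(
            new StarTreeConfig("st1", Set.of("status", "port"),
                               Map.of("size", Set.of("sum", "max"))));
        System.out.println(resolve(trees, Set.of("status"), "size", "sum"));
        System.out.println(resolve(trees, Set.of("host"), "size", "sum"));
    }
}
```

An empty result means the search request and response stay on the default path unchanged, which matches the goal of keeping approach 2/ transparent to users.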
Sample Search Aggregation Request:
Sample Response Expected:
(this is non-star tree response):