WIP Validate lineage for all JexlNode rebuilding visitors #925

Closed
wants to merge 10 commits into from


lbschanno

WIP for #880 for verifying/fixing JexlNode rebuilding visitors to ensure that they preserve proper parentage.

I have fixed/verified parentage for the following visitors:

  • AllTermsIndexedVisitor
  • BooleanOptimizationVisitor
  • FacetCheck
  • DateIndexCleanupVisitor
  • ExpandMultinormalizedTerms
  • TreeFlatteningRebuildingVisitor
  • FixNegativeNumbersVisitor
  • GeowavePruningVisitor
  • QueryModelVisitor
  • RangeCoalescingVisitor
  • RegexFunctionVisitor

I am currently working on fixing parentage for the following visitors:

  • UniqueExpressionTermsVisitor
  • ExpandCompositeTerms

There are a number of rebuilding visitors that do not have any unit tests associated with them that I could find, nor was there enough documentation for me to be able to extrapolate test cases for them. I would appreciate any help with examples or desired test cases so that I can write tests for each of the following and verify parentage.

  • BoundedRangeDetectionVisitor
  • FunctionIndexQueryExpansionVisitor
  • FunctionNormalizationRebuildingVisitor
  • IsNotNullIntentVisitor
  • ParallelIndexExpansion
  • PruneLessSelectiveFieldsVisitor
  • PushdownLargeFieldedListsVisitor
  • PushdownMissingIndexRangeNodesVisitor
  • PushFunctionsIntoExceededValueRanges
  • RangeConjunctionRebuildingVisitor
  • RangeExpansionThresholdRebuildingVisitor

Multiple RebuildingVisitor implementations do not return a rebuilt query
that is considered valid according to JexlASTHelper.validateLineage().

Ensure that existing RebuildingVisitors are tested, and when necessary,
modified to return a JEXL tree with a valid lineage.

Fixes #880

Fix AllTermsIndexedVisitor

Verify DateIndexCleanupVisitor lineage

Verify ExpandMultinormalizedTerms lineage

Verify FixNegativeNumbersVisitor lineage

Remove unnecessary query print

Remove System.out.print

Verify FixUnindexedNumericTerms lineage

Verify QueryModelVisitor lineage

Fix RangeCoalescingVisitor

Verify RegexFunctionVisitor lineage
@lbschanno lbschanno changed the title WIP: Validate lineage for all JexlNode rebuilding visitors WIP Validate lineage for all JexlNode rebuilding visitors Sep 26, 2020
@lbschanno lbschanno marked this pull request as draft September 26, 2020 03:26

for (int i = 0; i < toAttach.jjtGetNumChildren(); i++) {
JexlNode node = copy(prunedNode);
JexlNode attach = (JexlNode) toAttach.jjtGetChild(i).jjtAccept(this, data);
attach.jjtSetParent(node);

I see lots of refactoring here but this is the only line that actually fixes a parenting problem for this class. Refactoring for the sake of refactoring may do more harm than good if anybody else is working on these visitors for any other reason. It also makes it very difficult for anybody to actually accept the entire package and merge. I think you may need to pull back a little from the refactoring efforts. Better yet a separate pull request per visitor that is only refactoring would be considerably more palatable.
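To illustrate the kind of parenting problem under discussion, here is a minimal, self-contained sketch. It does not use the DataWave or JEXL classes: `Node`, `addChildOnly`, `addChild`, and `isLineageValid` are hypothetical stand-ins, and the check is only an assumption about what a lineage validation such as `JexlASTHelper.validateLineage()` looks for (every child's parent pointer refers to the node that holds it).

```java
import java.util.ArrayList;
import java.util.List;

class LineageSketch {
    /** Hypothetical stand-in for a JexlNode: a label, a parent pointer, and children. */
    static final class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        /** Adds a child but leaves its parent pointer untouched (the bug pattern). */
        void addChildOnly(Node child) {
            children.add(child);
        }

        /** Adds a child and points it back at this node (the fix pattern). */
        void addChild(Node child) {
            children.add(child);
            child.parent = this;
        }
    }

    /** Returns true only if every child's parent pointer refers to the node holding it. */
    static boolean isLineageValid(Node node) {
        for (Node child : node.children) {
            if (child.parent != node || !isLineageValid(child)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node broken = new Node("AND");
        broken.addChildOnly(new Node("FOO == 'bar'"));
        System.out.println("child added without parent: " + isLineageValid(broken));

        Node fixed = new Node("AND");
        fixed.addChild(new Node("FOO == 'bar'"));
        System.out.println("child added with parent:    " + isLineageValid(fixed));
    }
}
```

In the snippets quoted in this review, adding a child and setting its parent are two separate calls (`jjtAddChild` followed by `jjtSetParent`), which is why a rebuilding visitor can attach a node and still leave the lineage inconsistent.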

import datawave.query.exceptions.DatawaveFatalQueryException;
import datawave.query.jexl.JexlASTHelper;
import datawave.query.jexl.JexlNodeFactory;
import datawave.query.jexl.JexlNodeFactory.ContainerType;
import datawave.query.model.QueryModel;

OK, I am stumped in this visitor as to where any lineage problems were fixed. Is this one purely refactoring?

andNode.jjtAddChild(newChild, i);
newChild.jjtSetParent(andNode);
}
}


@ivakegg ivakegg Oct 9, 2020


The changes in this visitor look good.

if (functionMetadata.name().equals("excludeRegex")) {
newParent = new ASTAndNode(ParserTreeConstants.JJTANDNODE);

switch (functionMetadata.name()) {

much cleaner, thanks

@ivakegg

ivakegg commented Oct 9, 2020

First I must explain what a (query property) "marker" node is. A portion of a query (a JEXL subtree) can be marked as having some attribute by attaching an assignment node to that subtree. So if I want to mark some node A with a marker N, I replace A with ((N = true) && (A)). These markers can be thought of as query execution hints that the planner puts in place to allow the execution to do certain things.
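The marker construction can be sketched in a few lines. This is only an illustration under stated assumptions: `Node`, `mark`, and `render` are hypothetical names, not the DataWave marker classes, and the tree is a simplified stand-in for a JEXL AST. The sketch replaces A with ((N = true) && (A)) and keeps the parent pointers consistent while doing so.

```java
import java.util.ArrayList;
import java.util.List;

class MarkerSketch {
    /** Hypothetical tree node: leaf text, or an operator such as "&&" for interior nodes. */
    static final class Node {
        final String text;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        /** Adds a child and sets its parent pointer; returns this node for chaining. */
        Node add(Node child) {
            children.add(child);
            child.parent = this;
            return this;
        }

        /** Renders a leaf as its text and an interior node as ((c1) op (c2) ...). */
        String render() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(' ').append(text).append(' ');
                }
                sb.append('(').append(children.get(i).render()).append(')');
            }
            return sb.append(')').toString();
        }
    }

    /** Replaces node a with ((markerName = true) && (a)) and returns the new AND node. */
    static Node mark(Node a, String markerName) {
        Node originalParent = a.parent;
        // Find a's slot before re-parenting it under the new AND node.
        int slot = originalParent == null ? -1 : originalParent.children.indexOf(a);

        Node and = new Node("&&");
        and.add(new Node(markerName + " = true")).add(a);

        if (originalParent != null) {
            originalParent.children.set(slot, and);
            and.parent = originalParent;
        }
        return and;
    }

    public static void main(String[] args) {
        Node marked = mark(new Node("A"), "N");
        System.out.println(marked.render());
    }
}
```

The step that matters for lineage is the last one in `mark`: the new AND node has to take over A's slot in the original parent and point back at that parent, otherwise the marked subtree is attached in one direction only.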

Second I must explain what "push down" means. To "push down" some portion of a query (a JEXL subtree) is to use the field index instead of the global index, or to push it down further to the query evaluation phase. Query execution basically starts with a global index lookup, then a field index lookup (if required), then query evaluation, meaning bouncing the boolean logic against all of the values for a document.

Here is some description of these visitors:

  • BoundedRangeDetectionVisitor
    This is supposed to detect whether there are bounded ranges, however it appears to only be looking for bounds. I am actually changing this visitor here: Change to mark bounded ranges in JEXL #926
  • FunctionIndexQueryExpansionVisitor
    This will expand all of the functions within the query to include something that we can run against the index. Basically every function node in the tree will be replaced with (function && (<some query>)). The index query created will be dependent on the query descriptor for the query in question.
  • FunctionNormalizationRebuildingVisitor
    This visitor will normalize the values within the arguments of a function node. The reason this is separated out is that the function descriptor needs to be used to determine which arguments are actually values and for which field do they apply. The field name is needed to lookup the data type and subsequently the normalization required. Normalization of values is required for index lookup.
  • IsNotNullIntentVisitor
    If an ER (regular expression) node is of the form FIELD =~ '*', then it is replaced with FIELD != null.
  • ParallelIndexExpansion
    This is used for expanding regex nodes into the set of discrete values. It is also used for unfielded term expansion (FixUnfieldedTermsVisitor) by extension. If a regex (or range) cannot be expanded into discrete values, then it is wrapped with an ExceededValueThresholdMarkerJexlNode marker.
  • PruneLessSelectiveFieldsVisitor
    Not used for now. You can ignore this one for now until we start populating the metadata with more extensive metrics at which time this visitor may be changed extensively.
  • PushdownLargeFieldedListsVisitor
    This will take a very large list of values OR'ed together for one field and turn it into an ExceededOrThresholdMarkerJexlNode construct. This essentially allows an "Ivarator" (DatawaveFieldIndexListIteratorJexl) to be used instead of creating a separate source for each term (see QueryIterator).
  • PushdownMissingIndexRangeNodesVisitor
    This will push down terms as "evaluation only" (meaning the index is not used) when there is an entry in the metadata table that denotes that the field is not indexed for some time range. This is an initial form of a solution for #825 (Handle the cases were a field is both indexed and not indexed within a time range).
  • PushFunctionsIntoExceededValueRanges
    If we have an ExceededValueThresholdMarkerJexlNode on a range which had been produced as a result of the FunctionIndexQueryExpansionVisitor above, then the function is pushed into the marker node construct to allow the "Ivarator" to evaluate the function when scanning the field index.
  • RangeConjunctionRebuildingVisitor
    This will expand ranges into discrete values as the ParallelIndexExpansion visitor does for regex nodes.
  • RangeExpansionThresholdRebuildingVisitor
    This one is not used and can be ignored/dropped.

@lbschanno

@ivakegg I have removed any refactoring not related to fixes. I will create separate PRs for that work as we discussed so that this PR is focused on fixes only.

@lbschanno

Closing this PR to break it up into individual PRs for each visitor.

@lbschanno lbschanno closed this Oct 20, 2020
@lbschanno lbschanno deleted the validate-lineage branch November 5, 2020 00:46