-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28551 Check the transactional table is recreated by its Id #5482
base: master
Are you sure you want to change the base?
Conversation
Quality Gate passedIssues Measures |
Please fill in the PR template to give some information about these changes. |
@@ -15789,12 +15789,22 @@ private ValidTxnWriteIdList getQueryValidTxnWriteIdList() throws SemanticExcepti | |||
return null; | |||
} | |||
|
|||
private Set<Long> getTransactionedTables() throws SemanticException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: should be Transactional
instead of Transactioned
, and also the method name is confusing because it returns table IDs, not tables. I think getTransactionalTableIDs
would be a more clear name.
if(!queryInfo.getInputs() | ||
.stream() | ||
.map(ReadEntity::getTable) | ||
.map(Table::getTTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think that would work with the MV
Table tbl = entity.getTable();
if (tbl.isMaterializedView() && tbl.getMVMetadata() != null) {
return tbl.getMVMetadata().getSourceTables().stream().map(SourceTable::getTable);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid code duplication, the same thing in done in SA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed solution looks good. I added a few comments that may improve performance and also cover a few more problematic cases.
@@ -88,11 +89,13 @@ public final class QueryResultsCache { | |||
public static class LookupInfo { | |||
private String queryText; | |||
private Supplier<ValidTxnWriteIdList> txnWriteIdListProvider; | |||
private Set<Long> txnTables; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to keep the current table ids as part of the lookup info. I would say that we need the ids from all tables not only from the transactional ones. The problem that we observed here seems to affect also non-transactional tables since recreating a table gives a new id.
Consider renaming the field to tableIds
to better indicate its content. Also the field can be made final.
if(!queryInfo.getInputs() | ||
.stream() | ||
.map(ReadEntity::getTable) | ||
.map(Table::getTTable) | ||
.map(org.apache.hadoop.hive.metastore.api.Table::getId) | ||
.collect(Collectors.toSet()).containsAll(lookupInfo.txnTables)) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a holistic check here incurs some overhead since we duplicate some work that is already done as part of the for loop just below. Since we are already iterating over the read entities it may be better to simplify and move the check inside the loop.
lookupInfo.tableIds.contains(tableUsed.getTTable().getId())
Moreover, if we detect that a certain cache entry contains a table ID that is not part of the lookupInfo we should determine if we should/can invalidate and remove that entry from the cache. In the current, approach we simply bail-out and leave the entry inside the cache.
private Set<Long> getTransactionedTables() throws SemanticException { | ||
return tablesFromReadEntities(inputs) | ||
.stream() | ||
.filter(AcidUtils::isTransactionalTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote previously, I have the impression that it makes sense to gather the ids from all kinds of tables (not only transactional ones).
private QueryResultsCache.LookupInfo createLookupInfoForQuery(ASTNode astNode) throws SemanticException { | ||
QueryResultsCache.LookupInfo lookupInfo = null; | ||
String queryString = getQueryStringForCache(astNode); | ||
if (queryString != null) { | ||
ValidTxnWriteIdList writeIdList = getQueryValidTxnWriteIdList(); | ||
lookupInfo = new QueryResultsCache.LookupInfo(queryString, () -> writeIdList); | ||
Set<Long> txnTables = getTransactionedTables(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the tables ids is almost a one-liner so you could possibly just inline the code.
Set<Long> tableIds =
tablesFromReadEntities(inputs).stream().map(Table::getTTable).map(t -> t.getId()).collect(Collectors.toSet());
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; | ||
|
||
set hive.query.results.cache.enabled=true; | ||
set hive.query.results.cache.nontransactional.tables.enabled=false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add also another test case (e.g., results_cache_invalidation4.q
) when hive.query.results.cache.nontransactional.tables.enabled=true
using non-transactional tables. It seems that if the table is dropped with the new logic we should be able to detect if the cache entry is valid or not.
|
||
CREATE TABLE author (fname STRING) STORED AS ORC TBLPROPERTIES('transactional'='true'); | ||
INSERT INTO author VALUES ('Alexander'); | ||
SELECT fname FROM author; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider adding a new line
https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?