Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28551 Check the transactional table is recreated by its Id #5482

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

czxm
Copy link

@czxm czxm commented Oct 3, 2024

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

Copy link

sonarcloud bot commented Oct 3, 2024

@difin
Copy link
Contributor

difin commented Oct 4, 2024

Please fill in the PR template to give some information about these changes.

@@ -15789,12 +15789,22 @@ private ValidTxnWriteIdList getQueryValidTxnWriteIdList() throws SemanticExcepti
return null;
}

private Set<Long> getTransactionedTables() throws SemanticException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: should be Transactional instead of Transactioned, and also the method name is confusing because it returns table IDs, not tables. I think getTransactionalTableIDs would be a more clear name.

if(!queryInfo.getInputs()
.stream()
.map(ReadEntity::getTable)
.map(Table::getTTable)
Copy link
Member

@deniskuzZ deniskuzZ Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that would work with the MV

  Table tbl = entity.getTable();
  if (tbl.isMaterializedView() && tbl.getMVMetadata() != null) {
    return tbl.getMVMetadata().getSourceTables().stream().map(SourceTable::getTable);
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid code duplication, the same thing in done in SA

Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed solution looks good. I added a few comments that may improve performance and also cover a few more problematic cases.

@@ -88,11 +89,13 @@ public final class QueryResultsCache {
public static class LookupInfo {
private String queryText;
private Supplier<ValidTxnWriteIdList> txnWriteIdListProvider;
private Set<Long> txnTables;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to keep the current table ids as part of the lookup info. I would say that we need the ids from all tables not only from the transactional ones. The problem that we observed here seems to affect also non-transactional tables since recreating a table gives a new id.

Consider renaming the field to tableIds to better indicate its content. Also the field can be made final.

Comment on lines +678 to +684
if(!queryInfo.getInputs()
.stream()
.map(ReadEntity::getTable)
.map(Table::getTTable)
.map(org.apache.hadoop.hive.metastore.api.Table::getId)
.collect(Collectors.toSet()).containsAll(lookupInfo.txnTables))
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a holistic check here incurs some overhead since we duplicate some work that is already done as part of the for loop just below. Since we are already iterating over the read entities it may be better to simplify and move the check inside the loop.

lookupInfo.tableIds.contains(tableUsed.getTTable().getId())

Moreover, if we detect that a certain cache entry contains a table ID that is not part of the lookupInfo we should determine if we should/can invalidate and remove that entry from the cache. In the current, approach we simply bail-out and leave the entry inside the cache.

private Set<Long> getTransactionedTables() throws SemanticException {
return tablesFromReadEntities(inputs)
.stream()
.filter(AcidUtils::isTransactionalTable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote previously, I have the impression that it makes sense to gather the ids from all kinds of tables (not only transactional ones).

private QueryResultsCache.LookupInfo createLookupInfoForQuery(ASTNode astNode) throws SemanticException {
QueryResultsCache.LookupInfo lookupInfo = null;
String queryString = getQueryStringForCache(astNode);
if (queryString != null) {
ValidTxnWriteIdList writeIdList = getQueryValidTxnWriteIdList();
lookupInfo = new QueryResultsCache.LookupInfo(queryString, () -> writeIdList);
Set<Long> txnTables = getTransactionedTables();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the tables ids is almost a one-liner so you could possibly just inline the code.

Set<Long> tableIds =
          tablesFromReadEntities(inputs).stream().map(Table::getTTable).map(t -> t.getId()).collect(Collectors.toSet());

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

set hive.query.results.cache.enabled=true;
set hive.query.results.cache.nontransactional.tables.enabled=false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add also another test case (e.g., results_cache_invalidation4.q) when hive.query.results.cache.nontransactional.tables.enabled=true using non-transactional tables. It seems that if the table is dropped with the new logic we should be able to detect if the cache entry is valid or not.


CREATE TABLE author (fname STRING) STORED AS ORC TBLPROPERTIES('transactional'='true');
INSERT INTO author VALUES ('Alexander');
SELECT fname FROM author;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants