HIVE-28551 Check the transactional table is recreated by its Id #5482

czxm · 2024-10-03T12:26:48Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

sonarcloud · 2024-10-03T13:59:37Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

difin · 2024-10-04T16:28:21Z

Please fill in the PR template to give some information about these changes.

difin · 2024-10-04T16:33:21Z

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

@@ -15789,12 +15789,22 @@ private ValidTxnWriteIdList getQueryValidTxnWriteIdList() throws SemanticExcepti
    return null;
  }

+  private Set<Long> getTransactionedTables() throws SemanticException {


Typo: it should be Transactional instead of Transactioned. The method name is also confusing because it returns table IDs, not tables. I think getTransactionalTableIDs would be a clearer name.

deniskuzZ · 2024-10-07T10:33:51Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+    if(!queryInfo.getInputs()
+            .stream()
+            .map(ReadEntity::getTable)
+            .map(Table::getTTable)


I don't think that would work with a materialized view (MV):

    Table tbl = entity.getTable();
    if (tbl.isMaterializedView() && tbl.getMVMetadata() != null) {
      return tbl.getMVMetadata().getSourceTables().stream().map(SourceTable::getTable);
    }

Avoid code duplication; the same thing is done in SA.
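To illustrate the point about materialized views, here is a minimal, self-contained sketch of a single shared helper that expands a materialized view into its source tables before collecting ids. `SimpleTable`, `expand`, and `tableIds` are hypothetical stand-ins invented for this example, not Hive classes or methods; the branch only mirrors the shape of the snippet above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SourceTableIdsSketch {

  /** Hypothetical stand-in for a table; this is NOT the Hive Table class. */
  static final class SimpleTable {
    final long id;
    final boolean materializedView;
    final List<SimpleTable> sourceTables;

    SimpleTable(long id, boolean materializedView, List<SimpleTable> sourceTables) {
      this.id = id;
      this.materializedView = materializedView;
      this.sourceTables = sourceTables;
    }

    static SimpleTable plain(long id) {
      return new SimpleTable(id, false, Collections.<SimpleTable>emptyList());
    }

    static SimpleTable view(long id, SimpleTable... sources) {
      return new SimpleTable(id, true, Arrays.asList(sources));
    }
  }

  /** A materialized view with known sources is replaced by those sources; any other table is kept. */
  static Stream<SimpleTable> expand(SimpleTable table) {
    if (table.materializedView && !table.sourceTables.isEmpty()) {
      return table.sourceTables.stream();
    }
    return Stream.of(table);
  }

  /** One helper that both call sites could share instead of duplicating the stream pipeline. */
  static Set<Long> tableIds(List<SimpleTable> inputs) {
    return inputs.stream()
        .flatMap(SourceTableIdsSketch::expand)
        .map(t -> t.id)
        .collect(Collectors.toSet());
  }
}
```

Keeping the expansion in one place means the cache lookup and the analyzer cannot drift apart in how they treat materialized views.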

zabetak

The proposed solution looks good. I added a few comments that may improve performance and also cover a few more problematic cases.

zabetak · 2024-10-07T12:03:36Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

@@ -88,11 +89,13 @@ public final class QueryResultsCache {
  public static class LookupInfo {
    private String queryText;
    private Supplier<ValidTxnWriteIdList> txnWriteIdListProvider;
+    private Set<Long> txnTables;


It makes sense to keep the current table ids as part of the lookup info. I would say that we need the ids of all tables, not only the transactional ones. The problem that we observed here also seems to affect non-transactional tables, since recreating a table gives it a new id.

Consider renaming the field to tableIds to better indicate its content. Also the field can be made final.
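A rough sketch of what the renamed, final field could look like follows. `LookupInfoSketch` is a simplified, hypothetical stand-in for `QueryResultsCache.LookupInfo` (the write-id supplier is left out), and the defensive copy is an assumption of this example rather than something requested in the review.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for QueryResultsCache.LookupInfo. */
final class LookupInfoSketch {
  private final String queryText;
  // Ids of ALL tables read by the query, transactional or not.
  private final Set<Long> tableIds;

  LookupInfoSketch(String queryText, Set<Long> tableIds) {
    this.queryText = queryText;
    // Copy so that later changes to the caller's set cannot alter the lookup key.
    this.tableIds = Collections.unmodifiableSet(new HashSet<>(tableIds));
  }

  String getQueryText() {
    return queryText;
  }

  Set<Long> getTableIds() {
    return tableIds;
  }
}
```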

zabetak · 2024-10-07T13:27:01Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+    if(!queryInfo.getInputs()
+            .stream()
+            .map(ReadEntity::getTable)
+            .map(Table::getTTable)
+            .map(org.apache.hadoop.hive.metastore.api.Table::getId)
+            .collect(Collectors.toSet()).containsAll(lookupInfo.txnTables))
+        return false;


Adding a holistic check here incurs some overhead since we duplicate some work that is already done as part of the for loop just below. Since we are already iterating over the read entities it may be better to simplify and move the check inside the loop.

lookupInfo.tableIds.contains(tableUsed.getTTable().getId())

Moreover, if we detect that a certain cache entry contains a table ID that is not part of the lookupInfo, we should determine whether we should/can invalidate and remove that entry from the cache. In the current approach, we simply bail out and leave the entry inside the cache.
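The two suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions: `Entry` and `lookup` are hypothetical names that do not come from `QueryResultsCache`, and removing the stale entry immediately is one possible policy, not a decision made in this review.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class CacheEntryCheckSketch {

  /** Hypothetical cache entry: the query text plus the ids of the tables it read. */
  static final class Entry {
    final String queryText;
    final List<Long> tableIdsUsed;

    Entry(String queryText, List<Long> tableIdsUsed) {
      this.queryText = queryText;
      this.tableIdsUsed = tableIdsUsed;
    }
  }

  /**
   * Returns the first candidate whose tables all still exist under the same id.
   * Candidates that reference an unknown id are removed from the list (assumed policy).
   */
  static Entry lookup(List<Entry> candidates, Set<Long> currentTableIds) {
    Iterator<Entry> it = candidates.iterator();
    while (it.hasNext()) {
      Entry entry = it.next();
      boolean stale = false;
      // Single pass over the tables used by the entry; the id check sits
      // next to whatever other per-table checks the loop already performs.
      for (Long id : entry.tableIdsUsed) {
        if (!currentTableIds.contains(id)) {
          stale = true; // the table was dropped or recreated under a new id
          break;
        }
      }
      if (stale) {
        it.remove(); // invalidate instead of leaving the entry in the cache
        continue;
      }
      return entry;
    }
    return null;
  }
}
```

Checking per table inside the existing loop avoids building a second set of ids for every candidate entry.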

zabetak · 2024-10-07T13:36:07Z

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

+  private Set<Long> getTransactionedTables() throws SemanticException {
+    return tablesFromReadEntities(inputs)
+            .stream()
+            .filter(AcidUtils::isTransactionalTable)


As I wrote previously, I have the impression that it makes sense to gather the ids from all kinds of tables (not only transactional ones).

zabetak · 2024-10-07T13:42:20Z

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

  private QueryResultsCache.LookupInfo createLookupInfoForQuery(ASTNode astNode) throws SemanticException {
    QueryResultsCache.LookupInfo lookupInfo = null;
    String queryString = getQueryStringForCache(astNode);
    if (queryString != null) {
      ValidTxnWriteIdList writeIdList = getQueryValidTxnWriteIdList();
-      lookupInfo = new QueryResultsCache.LookupInfo(queryString, () -> writeIdList);
+      Set<Long> txnTables = getTransactionedTables();


Getting the table ids is almost a one-liner, so you could possibly just inline the code.

Set<Long> tableIds = tablesFromReadEntities(inputs).stream().map(Table::getTTable).map(t -> t.getId()).collect(Collectors.toSet());

zabetak · 2024-10-07T13:46:19Z

ql/src/test/queries/clientpositive/results_cache_invalidation3.q

+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+
+set hive.query.results.cache.enabled=true;
+set hive.query.results.cache.nontransactional.tables.enabled=false;


It would be nice to also add another test case (e.g., results_cache_invalidation4.q) with hive.query.results.cache.nontransactional.tables.enabled=true, using non-transactional tables. With the new logic, it seems that we should be able to detect whether the cache entry is still valid after the table is dropped.

zabetak · 2024-10-07T13:47:39Z

ql/src/test/queries/clientpositive/results_cache_invalidation3.q

+
+CREATE TABLE author (fname STRING) STORED AS ORC TBLPROPERTIES('transactional'='true');
+INSERT INTO author VALUES ('Alexander');
+SELECT fname FROM author;


nit: Consider adding a newline at the end of the file.

https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

HIVE-28551 Check the transactional table is recreated by its Id

a831829

asf-ci-hive added the tests pending label Oct 3, 2024

asf-ci-hive added tests passed and removed tests pending labels Oct 3, 2024

difin reviewed Oct 4, 2024

deniskuzZ reviewed Oct 7, 2024

zabetak reviewed Oct 7, 2024
