[GLUTEN-7750][VL]: store unsafe batches data #7902

Zand100 · 2024-11-12T00:50:02Z

What changes were proposed in this pull request?

Adds a binary container that implements Spark's MemoryConsumer, to be used instead of Array[Array[Byte]].

(Fixes: #7750)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

github-actions · 2024-11-12T00:50:19Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2024-11-12T00:52:19Z

#7750

Zand100 · 2024-11-12T00:53:54Z

backends-velox/src/main/scala/org/apache/spark/sql/execution/UnsafeArray.scala

+    val recordLength = 2L * uaoSize + inputLength + 8L
+
+    UnsafeAlignedOffset.putSize(base, offset, inputLength + uaoSize)
+    offset += 2L * uaoSize


I'm not sure if I'm calculating the offset correctly.
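One way to sanity-check this kind of offset arithmetic is to model the record layout in isolation. The sketch below is not the PR's `UnsafeArray`. It is a hypothetical, heap-only stand-in that uses `java.nio.ByteBuffer` and assumes the simplest possible layout: a 4-byte size field followed by the payload, with no trailing pointer. The names `LengthPrefixedRecords`, `pack`, and `unpack` are made up for illustration.

```scala
import java.nio.ByteBuffer

// Hypothetical helper (not part of Gluten or Spark): packs byte arrays into
// one buffer as [4-byte length][payload] records and reads them back.
object LengthPrefixedRecords {
  // Stand-in for uaoSize in the diff; assumed to be 4 bytes here.
  private val SizeFieldBytes = 4

  // Bytes one record occupies: the size field plus the payload.
  def recordLength(payloadLength: Int): Int = SizeFieldBytes + payloadLength

  // Writes all payloads back to back and returns the filled buffer.
  def pack(payloads: Seq[Array[Byte]]): ByteBuffer = {
    val total = payloads.map(p => recordLength(p.length)).sum
    val buf = ByteBuffer.allocate(total)
    var offset = 0
    payloads.foreach { p =>
      buf.putInt(offset, p.length) // size field sits at the record start
      offset += SizeFieldBytes     // advance past exactly what was written
      var i = 0
      while (i < p.length) { buf.put(offset + i, p(i)); i += 1 }
      offset += p.length           // advance past the payload
    }
    buf
  }

  // Walks the buffer record by record using the same offset rules as pack.
  def unpack(buf: ByteBuffer): Seq[Array[Byte]] = {
    val out = Seq.newBuilder[Array[Byte]]
    var offset = 0
    while (offset < buf.capacity()) {
      val len = buf.getInt(offset)
      offset += SizeFieldBytes
      val p = new Array[Byte](len)
      var i = 0
      while (i < len) { p(i) = buf.get(offset + i); i += 1 }
      offset += len
      out += p
    }
    out.result()
  }
}
```

The invariant to check is that the offset advances by exactly the bytes written and that the per-record advances sum to `recordLength`. The diff above writes one size field but then advances by `2L * uaoSize` and budgets an extra `8L` per record. That arithmetic appears to mirror Spark's `BytesToBytesMap` record layout, which stores two size fields and a trailing 8-byte pointer. Whether the second size slot and the trailing 8 bytes are needed here depends on how the reader side walks the records.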

zhztheplayer · 2024-11-12T07:53:52Z

@Zand100 Is there any test case that can be added for this change?

Zand100 · 2024-11-13T02:17:38Z

backends-velox/src/main/scala/org/apache/spark/sql/execution/ColumnarBuildSideRelation.scala

+          val columnVector = new OffHeapColumnVector(batch.numElements(), DataTypes.BinaryType)
+          columnVector.putByteArray(batchId, batch.toByteArray, batch.getBaseOffset.toInt, batch.numElements)
+          val columnarBatch = new ColumnarBatch(Array(columnVector))


@zhztheplayer Is it alright to create the ColumnarBatch this way, using OffHeapColumnVector and constructing a new ColumnarBatch directly from that?

Zand100 · 2024-11-13T02:29:32Z

I'm planning to write unit and integration tests. I'm still getting familiar with the code. I'm guessing that for UnsafeArray, I'll write tests similar to https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java. Could you please point me to some other places to update tests? (I couldn't find the tests for ColumnarBuildSideRelation.)

store unsafe batches data

f69d06a

github-actions bot added the VELOX label Nov 12, 2024

Zand100 changed the title ~~store unsafe batches data~~ [GLUTEN-7750][VL]: store unsafe batches data Nov 12, 2024

Zand100 mentioned this pull request Nov 12, 2024

[VL] Move ColumnarBuildSideRelation's memory occupation to Spark off-heap #7750

Open

use UnsafeArrayData in deserialize; fix MemoryConsumer error

8d0ff18

Zand100 mentioned this pull request Nov 14, 2024

[GLUTEN-7750][VL]: store unsafe batches data #7944

Draft
