feat: implement StringColumn using StringViewArray #16610

Merged
merged 89 commits into from
Nov 8, 2024
89 commits
0c1473b
feat: implement StringColumn using StringViewArray
andylokandy Oct 15, 2024
2e2e5f6
fix
andylokandy Oct 15, 2024
af524c0
convert binaryview between arrow1 and arrow2
andylokandy Oct 22, 2024
01ffce9
Merge branch 'main' of https://github.com/datafuselabs/databend into …
andylokandy Oct 22, 2024
8028a37
fix
andylokandy Oct 22, 2024
feab44e
fix
andylokandy Oct 22, 2024
803ace4
fix
andylokandy Oct 22, 2024
60eb67c
fix
andylokandy Oct 23, 2024
e99be9e
Merge branch 'main' into dev1
andylokandy Oct 23, 2024
a0d159a
fix
andylokandy Oct 25, 2024
e15e1e5
Merge branch 'main' of https://github.com/datafuselabs/databend into …
andylokandy Oct 25, 2024
56cf9d8
Merge branch 'main' of https://github.com/datafuselabs/databend into …
andylokandy Oct 28, 2024
ac64bdc
fix some issue
andylokandy Oct 28, 2024
e6c5933
fix view slice bug
sundy-li Oct 29, 2024
0e85757
fix view slice bug
sundy-li Oct 29, 2024
81fba8a
Merge branch 'main' of https://github.com/datafuselabs/databend into …
andylokandy Oct 29, 2024
ba35eb8
fix
andylokandy Oct 29, 2024
9598aa0
support native read write
sundy-li Oct 29, 2024
8ccd6d5
fix
andylokandy Oct 29, 2024
6d63f7e
Merge branch 'dev1' of https://github.com/andylokandy/databend into dev1
andylokandy Oct 29, 2024
f533de5
fix
andylokandy Oct 29, 2024
eab81d4
fix tests
sundy-li Oct 30, 2024
88db184
add with_data_type
sundy-li Oct 30, 2024
8416f80
add with_data_type
sundy-li Oct 30, 2024
89c03d7
fix gen_random_uuid commit row
sundy-li Oct 30, 2024
f478c79
move record batch to block
sundy-li Oct 30, 2024
bb605b9
Merge branch 'main' into dev1
sundy-li Oct 30, 2024
d712fd4
remove unused dep
andylokandy Oct 30, 2024
b813d71
fix lint
andylokandy Oct 30, 2024
1d8b4da
fix commit row
sundy-li Oct 30, 2024
60ab196
fix commit row
sundy-li Oct 30, 2024
af79030
fix size
sundy-li Oct 30, 2024
9eda2e3
fix size
sundy-li Oct 30, 2024
b9c1773
Merge branch 'main' into dev1
sundy-li Oct 30, 2024
e116066
add NewBinaryColumnBuilder and NewStringColumnBulder
andylokandy Oct 30, 2024
714db05
fix incorrect serialize_size
sundy-li Nov 1, 2024
7a781da
fix incorrect serialize_size
sundy-li Nov 1, 2024
9276cea
lint
sundy-li Nov 1, 2024
39ec7d0
lint
sundy-li Nov 1, 2024
37f57bc
fix tests
sundy-li Nov 1, 2024
c1cdb6d
use binary state
sundy-li Nov 1, 2024
0c5ea41
Merge branch 'main' into dev1
sundy-li Nov 1, 2024
5e94781
use binary state
sundy-li Nov 1, 2024
6d8ecd3
update tests
sundy-li Nov 1, 2024
3a6396f
update tests
sundy-li Nov 1, 2024
aa194d6
update tests
sundy-li Nov 1, 2024
6ba6c7c
fix native view encoding
sundy-li Nov 2, 2024
43977ea
fix
andylokandy Nov 2, 2024
887df5e
[ci skip] updata kernel concat for view types
sundy-li Nov 2, 2024
b14f232
[ci skip]Merge branch 'main' into dev1
sundy-li Nov 2, 2024
0eabef5
[ci skip]Merge branch 'main' into dev1
sundy-li Nov 2, 2024
3456b4b
[ci skip] improve kernels for view types
sundy-li Nov 3, 2024
b9e22d8
[ci skip] only string type use string view type
sundy-li Nov 4, 2024
d8e5345
[ci skip] only string type use string view type
sundy-li Nov 4, 2024
89788d9
fix tests
sundy-li Nov 4, 2024
c2e1103
[ci skip] fix tests
sundy-li Nov 4, 2024
dcdf8b4
[ci skip] fix
sundy-li Nov 4, 2024
129e950
fix
andylokandy Nov 4, 2024
1f6c9ae
use NewStringColumnBuilder
andylokandy Nov 4, 2024
d381611
rename NewString -> String
sundy-li Nov 5, 2024
8757696
Merge branch 'main' into dev1
sundy-li Nov 5, 2024
4cb0277
fmt
sundy-li Nov 5, 2024
5f0dfe1
[ci skip] update tests
sundy-li Nov 5, 2024
37e549c
optimize take
sundy-li Nov 5, 2024
0411ea3
Merge branch 'main' into dev1
sundy-li Nov 5, 2024
24da802
add bench
sundy-li Nov 5, 2024
ad2366f
Merge branch 'dev1' of github.com:andylokandy/databend into dev1
sundy-li Nov 5, 2024
bcf29d2
fix tests
sundy-li Nov 5, 2024
4ec78de
[ci skip]Merge branch 'main' into dev1
sundy-li Nov 5, 2024
66b207d
update
sundy-li Nov 6, 2024
eda8e0f
improve compare
andylokandy Nov 6, 2024
00087ad
implement compare using string view prefix
andylokandy Nov 6, 2024
7de18c9
fix
andylokandy Nov 6, 2024
3303105
fix
sundy-li Nov 6, 2024
c2e3cb1
Merge branch 'main' into dev1
sundy-li Nov 6, 2024
e59e280
fix
sundy-li Nov 6, 2024
aad572a
fix-length
sundy-li Nov 6, 2024
6d040f8
disable spill
sundy-li Nov 6, 2024
c30b3c1
[ci skip] add put_and_commit
sundy-li Nov 6, 2024
5184580
[ci skip] update
sundy-li Nov 6, 2024
95e0b09
update test
sundy-li Nov 7, 2024
e8dc899
lint
sundy-li Nov 7, 2024
013a1c6
[ci skip] add maybe gc
sundy-li Nov 7, 2024
5ca5e83
fix endiness
andylokandy Nov 7, 2024
8b11de2
fix endiness
andylokandy Nov 7, 2024
d584953
fix
andylokandy Nov 7, 2024
3a73b39
update string compare
sundy-li Nov 7, 2024
7ae547c
Merge branch 'main' into dev1
sundy-li Nov 7, 2024
a9def92
update
sundy-li Nov 7, 2024
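
Several commit titles above mention comparing strings through the view prefix. The commits themselves are not shown on this page, so the following is only a minimal sketch of the general idea, not the PR's implementation: each view keeps the first four bytes of its value, and a comparison reads the full data only when the prefixes tie. `View`, `views_equal`, and `compare_views` are hypothetical names introduced for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified view: the byte length, the first four bytes of
/// the value (zero-padded when shorter), and the full value for the
/// fallback path.
pub struct View<'a> {
    pub len: u32,
    pub prefix: [u8; 4],
    pub data: &'a [u8],
}

impl<'a> View<'a> {
    pub fn new(s: &'a str) -> Self {
        let data = s.as_bytes();
        let mut prefix = [0u8; 4];
        let n = data.len().min(4);
        prefix[..n].copy_from_slice(&data[..n]);
        View { len: data.len() as u32, prefix, data }
    }
}

/// Equality: a differing length or prefix settles the answer before the
/// full data is read.
pub fn views_equal(a: &View, b: &View) -> bool {
    a.len == b.len && a.prefix == b.prefix && a.data == b.data
}

/// Ordering: reading the prefix as a big-endian integer makes integer order
/// agree with byte-wise lexicographic order, so only equal prefixes need
/// the full comparison.
pub fn compare_views(a: &View, b: &View) -> Ordering {
    let pa = u32::from_be_bytes(a.prefix);
    let pb = u32::from_be_bytes(b.prefix);
    match pa.cmp(&pb) {
        Ordering::Equal => a.data.cmp(b.data),
        other => other,
    }
}
```

With this sketch, `compare_views(&View::new("apple"), &View::new("banana"))` is decided by the prefixes alone, while two values that share their first four bytes fall through to the full byte comparison.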
35 changes: 16 additions & 19 deletions Cargo.lock


6 changes: 3 additions & 3 deletions Cargo.toml
@@ -219,9 +219,9 @@ arrow-ipc = { version = "53" }
 arrow-ord = { version = "53" }
 arrow-schema = { version = "53", features = ["serde"] }
 arrow-select = { version = "53" }
-arrow-udf-js = "0.5.0"
-arrow-udf-python = "0.4.0"
-arrow-udf-wasm = "0.4.0"
+arrow-udf-js = { git = "https://github.com/arrow-udf/arrow-udf", rev = "80b09d6" }
+arrow-udf-python = { git = "https://github.com/arrow-udf/arrow-udf", rev = "80b09d6" }
+arrow-udf-wasm = { git = "https://github.com/arrow-udf/arrow-udf", rev = "80b09d6" }
 async-backtrace = "0.2"
 async-channel = "1.7.1"
 async-compression = { git = "https://github.com/datafuse-extras/async-compression", rev = "dc81082", features = [
1 change: 0 additions & 1 deletion src/common/arrow/src/arrow/array/binview/ffi.rs
@@ -64,7 +64,6 @@ unsafe impl<T: ViewType + ?Sized> ToFfi for BinaryViewArrayGeneric<T> {
             validity,
             views: self.views.clone(),
             buffers: self.buffers.clone(),
-            raw_buffers: self.raw_buffers.clone(),
             phantom: Default::default(),
             total_bytes_len: AtomicU64::new(self.total_bytes_len.load(Ordering::Relaxed)),
             total_buffer_len: self.total_buffer_len,
76 changes: 76 additions & 0 deletions src/common/arrow/src/arrow/array/binview/from.rs
@@ -12,13 +12,89 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+use arrow_data::ArrayData;
+use arrow_data::ArrayDataBuilder;
+use arrow_schema::DataType;
+
+use crate::arrow::array::Arrow2Arrow;
+use crate::arrow::array::BinaryViewArray;
 use crate::arrow::array::BinaryViewArrayGeneric;
 use crate::arrow::array::MutableBinaryViewArray;
+use crate::arrow::array::Utf8ViewArray;
 use crate::arrow::array::ViewType;
+use crate::arrow::bitmap::Bitmap;
 
 impl<T: ViewType + ?Sized, P: AsRef<T>> FromIterator<Option<P>> for BinaryViewArrayGeneric<T> {
     #[inline]
     fn from_iter<I: IntoIterator<Item = Option<P>>>(iter: I) -> Self {
         MutableBinaryViewArray::<T>::from_iter(iter).into()
     }
 }
+
+impl Arrow2Arrow for BinaryViewArray {
+    fn to_data(&self) -> ArrayData {
+        let builder = ArrayDataBuilder::new(DataType::BinaryView)
+            .len(self.len())
+            .add_buffer(self.views.clone().into())
+            .add_buffers(
+                self.buffers
+                    .iter()
+                    .map(|x| x.clone().into())
+                    .collect::<Vec<_>>(),
+            )
+            .nulls(self.validity.clone().map(Into::into));
+        unsafe { builder.build_unchecked() }
+    }
+
+    fn from_data(data: &ArrayData) -> Self {
+        let views = crate::arrow::buffer::Buffer::from(data.buffers()[0].clone());
+        let buffers = data.buffers()[1..]
+            .iter()
+            .map(|x| crate::arrow::buffer::Buffer::from(x.clone()))
+            .collect();
+        let validity = data.nulls().map(|x| Bitmap::from_null_buffer(x.clone()));
+        unsafe {
+            Self::new_unchecked_unknown_md(
+                crate::arrow::datatypes::DataType::BinaryView,
+                views,
+                buffers,
+                validity,
+                None,
+            )
+        }
+    }
+}
+
+impl Arrow2Arrow for Utf8ViewArray {
+    fn to_data(&self) -> ArrayData {
+        let builder = ArrayDataBuilder::new(DataType::Utf8View)
+            .len(self.len())
+            .add_buffer(self.views.clone().into())
+            .add_buffers(
+                self.buffers
+                    .iter()
+                    .map(|x| x.clone().into())
+                    .collect::<Vec<_>>(),
+            )
+            .nulls(self.validity.clone().map(Into::into));
+        unsafe { builder.build_unchecked() }
+    }
+
+    fn from_data(data: &ArrayData) -> Self {
+        let views = crate::arrow::buffer::Buffer::from(data.buffers()[0].clone());
+        let buffers = data.buffers()[1..]
+            .iter()
+            .map(|x| crate::arrow::buffer::Buffer::from(x.clone()))
+            .collect();
+        let validity = data.nulls().map(|x| Bitmap::from_null_buffer(x.clone()));
+        unsafe {
+            Self::new_unchecked_unknown_md(
+                crate::arrow::datatypes::DataType::Utf8View,
+                views,
+                buffers,
+                validity,
+                None,
+            )
+        }
+    }
+}
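
The conversions above treat `buffers()[0]` as the views and `buffers()[1..]` as the data buffers. As a rough illustration of why that split exists, here is a standard-library-only model of the Arrow view layout: each row gets a 16-byte view, values of up to 12 bytes are stored inline, and longer values keep a 4-byte prefix plus a buffer index and offset. `ViewColumn`, `read_u32`, and the single-buffer append policy are assumptions made for this sketch; they are not Databend's or arrow-rs's types.

```rust
/// Values up to this many bytes are stored inline in the 16-byte view.
const MAX_INLINE: usize = 12;

/// Simplified, hypothetical model of a string-view column: one 16-byte view
/// per row plus shared data buffers for the long values.
#[derive(Default)]
pub struct ViewColumn {
    pub views: Vec<[u8; 16]>,
    pub buffers: Vec<Vec<u8>>,
}

/// Reads a little-endian u32 from the first four bytes of `b`.
fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

impl ViewColumn {
    pub fn push(&mut self, value: &str) {
        let bytes = value.as_bytes();
        let mut view = [0u8; 16];
        // Bytes 0..4: length of the value.
        view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
        if bytes.len() <= MAX_INLINE {
            // Short value: the data itself lives in bytes 4..16.
            view[4..4 + bytes.len()].copy_from_slice(bytes);
        } else {
            // Long value: append to the last data buffer (assumed policy).
            if self.buffers.is_empty() {
                self.buffers.push(Vec::new());
            }
            let buffer_idx = self.buffers.len() - 1;
            let offset = self.buffers[buffer_idx].len();
            self.buffers[buffer_idx].extend_from_slice(bytes);
            // Bytes 4..8: prefix, 8..12: buffer index, 12..16: offset.
            view[4..8].copy_from_slice(&bytes[..4]);
            view[8..12].copy_from_slice(&(buffer_idx as u32).to_le_bytes());
            view[12..16].copy_from_slice(&(offset as u32).to_le_bytes());
        }
        self.views.push(view);
    }

    pub fn get(&self, row: usize) -> &str {
        let view = &self.views[row];
        let len = read_u32(&view[0..4]) as usize;
        let bytes = if len <= MAX_INLINE {
            &view[4..4 + len]
        } else {
            let idx = read_u32(&view[8..12]) as usize;
            let off = read_u32(&view[12..16]) as usize;
            &self.buffers[idx][off..off + len]
        };
        std::str::from_utf8(bytes).expect("values were pushed as valid UTF-8")
    }
}
```

In this model, pushing a short string touches only `views`, which is why a column of short strings can carry no data buffers at all; a conversion that forwards the views and the data buffers unchanged never has to copy string bytes.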