
snapshot: delegate reading to arrow/parquet readers #590

Closed · wants to merge 1 commit

Conversation

@asubiotto asubiotto (Member) commented Nov 16, 2023

We were previously copying all snapshot part bytes into a slice that was then passed to arrow readers (parquet copies are left untouched, see #590 (comment)). This was an unnecessary indirection and resulted in extra allocations on startup. Benchmark results on recovery from snapshot only:

```
goos: darwin
goarch: arm64
pkg: github.com/polarsignals/frostdb
          │  benchmain   │              benchnew              │
          │    sec/op    │   sec/op     vs base               │
Replay-12   1680.6m ± 5%   816.0m ± 3%  -51.45% (p=0.002 n=6)

          │   benchmain   │              benchnew               │
          │     B/op      │     B/op      vs base               │
Replay-12   2382.9Mi ± 0%   633.0Mi ± 0%  -73.43% (p=0.002 n=6)

          │  benchmain   │              benchnew              │
          │  allocs/op   │  allocs/op   vs base               │
Replay-12   20.807M ± 0%   5.623M ± 0%  -72.98% (p=0.002 n=6)
```

@asubiotto asubiotto requested review from thorfour and brancz November 16, 2023 14:14
@asubiotto asubiotto (Member, Author) commented Nov 16, 2023

This doesn't quite work, because we close the file right after loading the snapshot, while parquet reads lazily (row group bytes are only read when they are scanned). Maybe we should keep the file open? The issue is that we might then have a bunch of hidden memory that we don't know about until we go to read these parquet row groups.

@asubiotto asubiotto (Member, Author) commented Nov 23, 2023

Modified the change to avoid the copy only for arrow parts, and the numbers look even better (the earlier run had a bug):

```
goos: darwin
goarch: arm64
pkg: github.com/polarsignals/frostdb
          │  benchmain   │              benchnew              │
          │    sec/op    │   sec/op     vs base               │
Replay-12   1680.6m ± 5%   816.0m ± 3%  -51.45% (p=0.002 n=6)

          │   benchmain   │              benchnew               │
          │     B/op      │     B/op      vs base               │
Replay-12   2382.9Mi ± 0%   633.0Mi ± 0%  -73.43% (p=0.002 n=6)

          │  benchmain   │              benchnew              │
          │  allocs/op   │  allocs/op   vs base               │
Replay-12   20.807M ± 0%   5.623M ± 0%  -72.98% (p=0.002 n=6)
```

We were previously copying all snapshot part bytes into a slice that was then
passed to arrow readers. This was an unnecessary indirection and resulted in
extra allocations on startup. Note that parquet bytes are still fully copied
since we cannot read those lazily (underlying file is closed after reading from
snapshot).
Benchmark results on recovery from snapshot only:
```
goos: darwin
goarch: arm64
pkg: github.com/polarsignals/frostdb
          │ benchmain  │             benchnew              │
          │   sec/op   │   sec/op    vs base               │
Replay-12   1.681 ± 5%   2.545 ± 4%  +51.41% (p=0.002 n=6)

          │  benchmain   │              benchnew              │
          │     B/op     │     B/op      vs base              │
Replay-12   2.327Gi ± 0%   2.278Gi ± 0%  -2.11% (p=0.002 n=6)

          │  benchmain  │             benchnew              │
          │  allocs/op  │  allocs/op   vs base              │
Replay-12   20.81M ± 0%   20.79M ± 0%  -0.10% (p=0.002 n=6)
```
@asubiotto asubiotto (Member, Author) commented:
Closing: after fixing the bug, the improvement is not noticeable enough to be worth it:

```
goos: darwin
goarch: arm64
pkg: github.com/polarsignals/frostdb
          │ benchmain  │             benchnew              │
          │   sec/op   │   sec/op    vs base               │
Replay-12   1.681 ± 5%   2.545 ± 4%  +51.41% (p=0.002 n=6)

          │  benchmain   │              benchnew              │
          │     B/op     │     B/op      vs base              │
Replay-12   2.327Gi ± 0%   2.278Gi ± 0%  -2.11% (p=0.002 n=6)

          │  benchmain  │             benchnew              │
          │  allocs/op  │  allocs/op   vs base              │
Replay-12   20.81M ± 0%   20.79M ± 0%  -0.10% (p=0.002 n=6)
```

@brancz brancz (Member) commented Nov 23, 2023

Did you forget to close?

@asubiotto asubiotto closed this Nov 23, 2023
@asubiotto asubiotto deleted the alfonso-alloc branch May 17, 2024 11:42