[BUG] Maximum allowed chunk size for gdeflate is 64KB #93

Open

technillogue opened this issue Jan 3, 2024 · 6 comments
@technillogue commented Jan 3, 2024

The blog post states

The following examples use the high throughput GDeflate compression format. [...] Larger chunk sizes typically lead to higher compression ratios at the expense of less parallelism exposed to the GPU. A good starting chunk size is 64 KB, but feel free to experiment with these values to explore the associated tradeoffs for your datasets.

In my case I'm using e.g. 40 managers to process >11 GB, so I already have enough parallelism and am trying to get a compression ratio better than 0.92.

Similarly, the changelog for 2.4.1 states

The Deflate batched decompression API can work on uncompressed data chunks larger than 64KB.

However, actually using 256KB (GdeflateManager nvcomp_manager{ 1 << 18, nvcompBatchedGdeflateDefaultOpts, stream, NoComputeNoVerify };) leads to this error:

[7] [2024-01-03 20:49] [critical] nvCOMP version 3.0.0; Linux x86-64; CUDA 11.8 build
[7] [2024-01-03 20:49] [error] In nvcompBatchedGdeflateCompressGetOutputSize(): Maximum allowed chunk size for gdeflate is 64KB
[7] [2024-01-03 20:49] [error] In nvcompBatchedGdeflateCompressGetTempSize(): Maximum allowed chunk size for gdeflate is 64KB
[7] [2024-01-03 20:49] [error] In nvcompBatchedGdeflateCompressAsync(): Maximum allowed chunk size for gdeflate is 64KB

Is >64KB only supported for deflate, not gdeflate?

Steps/Code to reproduce bug
GdeflateManager nvcomp_manager{ 1 << 18, nvcompBatchedGdeflateDefaultOpts, stream, NoComputeNoVerify };

Expected behavior
256KB uncomp_chunk_size should work.

Environment details (please complete the following information):

  • Environment location: Docker in GCP VM
  • Method of nvCOMP install: without extensions
technillogue added the "? - Needs Triage" and "bug" labels on Jan 3, 2024
@akshaysubr commented

@technillogue Yes, both Deflate and GDeflate currently don't support compression with chunk sizes >64KB. This was mainly an internal implementation decision to balance compression ratio, performance, and temporary memory requirements. You can, however, use GDeflate CPU compression through libnvcomp_gdeflate_cpu.so to compress chunks larger than 64KB and then decompress on the GPU using the nvcomp LLIF. See the example here for details on how to do this.
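For reference, a minimal sketch of the GPU decompression side via the batched LLIF, assuming the chunks were produced offline (e.g. by the CPU compressor, whose exact entry point in libnvcomp_gdeflate_cpu.so is elided here) and that all pointer/size arrays already live in device memory:

#include <nvcomp/gdeflate.h>
#include <cuda_runtime.h>

// Sketch: decompress a batch of GDeflate chunks on the GPU using the
// batched low-level interface. The compressed chunks are assumed to have
// been produced beforehand, e.g. by the CPU compressor, with an
// uncompressed chunk size that may exceed 64KB.
void decompress_batch(
    const void* const* device_comp_ptrs,  // device array of compressed chunk pointers
    const size_t* device_comp_bytes,      // device array of compressed chunk sizes
    const size_t* device_uncomp_bytes,    // device array of uncompressed chunk sizes
    void* const* device_uncomp_ptrs,      // device array of output buffers
    size_t batch_size,
    size_t max_uncomp_chunk_bytes,        // e.g. 1 << 18 for 256KB chunks
    cudaStream_t stream)
{
  // Query the scratch space needed for this batch.
  size_t temp_bytes = 0;
  nvcompBatchedGdeflateDecompressGetTempSize(
      batch_size, max_uncomp_chunk_bytes, &temp_bytes);

  void* device_temp = nullptr;
  cudaMalloc(&device_temp, temp_bytes);

  // Per-chunk actual output sizes and per-chunk status codes,
  // written by the decompression kernels.
  size_t* device_actual_bytes = nullptr;
  cudaMalloc(&device_actual_bytes, batch_size * sizeof(size_t));
  nvcompStatus_t* device_statuses = nullptr;
  cudaMalloc(&device_statuses, batch_size * sizeof(nvcompStatus_t));

  nvcompStatus_t status = nvcompBatchedGdeflateDecompressAsync(
      device_comp_ptrs, device_comp_bytes, device_uncomp_bytes,
      device_actual_bytes, batch_size, device_temp, temp_bytes,
      device_uncomp_ptrs, device_statuses, stream);
  // `status` reports launch-time errors; per-chunk results land in
  // device_statuses after the stream completes.

  cudaStreamSynchronize(stream);
  cudaFree(device_temp);
  cudaFree(device_actual_bytes);
  cudaFree(device_statuses);
}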

At a higher level, though, if you are trying to increase the compression ratio, you might actually see more benefit from changing the compression options rather than the chunk size. Especially if you are interested in a "compress once, decompress multiple times" scenario, you can pass a non-default const nvcompBatchedGdeflateOpts_t& format_opts option to the GdeflateManager constructor. See this snippet from nvcomp/gdeflate.h for more details on what you can specify:

/**
 * GDeflate compression options for the low-level API
 */
typedef struct
{
  /**
   * Compression algorithm to use. Permitted values are:
   * 0 : high-throughput, low compression ratio (default)
   * 1 : low-throughput, high compression ratio
   * 2 : highest-throughput, entropy-only compression (use for symmetric compression/decompression performance)
   */
  int algo;
} nvcompBatchedGdeflateOpts_t;

static const nvcompBatchedGdeflateOpts_t nvcompBatchedGdeflateDefaultOpts = {0};
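For example, mirroring the constructor call from the issue (a sketch; the chunk size stays at the 64KB limit while algo selects the high-compression-ratio mode):

// Sketch: same HLIF construction as in the bug report, but with algo 1
// for a better compression ratio in "compress once, decompress many" use.
nvcompBatchedGdeflateOpts_t format_opts{1};  // algo = 1
GdeflateManager nvcomp_manager{ 1 << 16, format_opts, stream, NoComputeNoVerify };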

@technillogue (Author) commented

Is there any chance of using CPU compression with the HLIF?

Previously, only algo 0 was supported for the HLIF, and in LLIF testing it seemed that algo 2 actually performed better than algo 1 for model weights, which benefit from entropy coding but not dictionary compression. Does algo 1 also increase the entropy coding settings?

@akshaysubr commented

Currently, we don't have a way of using CPU compression with the HLIF, unfortunately.

But if algo 2 is giving you the best results, I'm assuming CPU compression won't necessarily improve the compression ratio even with larger chunk sizes. The entropy coding step is the same in all algos, and the CPU compressor does not support pure entropy coding.
