[BUG] Cascaded Illegal Memory Access with High Compression Ratio #96

Open

NicolasDenoyelle opened this issue Feb 29, 2024 · 11 comments

@NicolasDenoyelle

Describe the bug

The nvcomp cascaded manager runs into an illegal memory access when reading the compressed output size after compressing highly compressible data.

Steps/Code to reproduce bug

Reproducer: nvcomp-reproducer.cpp.txt

Compile with:

g++ -g -Wall -Wextra -std=c++17 -I/usr/local/cuda/include -I<path_to_nvcomp.h> nvcomp-reproducer.cpp -o reproducer -lcudart -lnvcomp && ./reproducer

Expected behavior

The reproducer I attached compresses two buffers of the same size with the same cascaded manager.
One buffer is filled with random data and works fine.
The other buffer is filled with a sequence of {0,1,0,1,...} and triggers the error.
In the reproducer, I took care of making sure that buffer sizes are appropriate (a short sketch of these checks follows the list):

  • a multiple of the nvcompType_t being compressed
  • a multiple of the cascaded interface chunk sizes
  • at least the maximum compressed size given by nvcomp prior to compression.
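
For concreteness, here is a minimal sketch of those three checks; the helper name and the assertions are illustrative assumptions, not part of the attached reproducer, but the manager calls match the API used in it:

#include <cassert>
#include <cstddef>
#include <nvcomp/nvcompManager.hpp>

// Illustrative helper (not in the attached reproducer): verifies the three size
// constraints above and returns the compressed-buffer capacity that nvcomp
// reports before compression.
static size_t CheckedCompressedCapacity(nvcomp::nvcompManagerBase& manager,
                                        size_t input_size, size_t chunk_size) {
  assert(input_size % sizeof(short) == 0);  // multiple of the nvcompType_t (NVCOMP_TYPE_SHORT)
  assert(input_size % chunk_size == 0);     // multiple of the cascaded interface chunk size
  // The compressed buffer must be at least this large:
  return manager.configure_compression(input_size).max_compressed_buffer_size;
}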

Environment details:

  • Bare-metal
  • Cuda Driver Version: 535.154.05
  • CUDA Version: 12.2
  • NVIDIA GeForce RTX 3050
  • nvcomp Version: 3.0.4 (no extensions)
@NicolasDenoyelle added the "? - Needs Triage" and "bug: Something isn't working" labels on Feb 29, 2024
@ndickson-nvidia
Contributor

Unfortunately, it seems like I can't access the nvcomp-reproducer.cpp.txt file you linked to anymore. I think I looked at it briefly a week or so ago, so it was probably there at one point. Could you post it again? I think I might have found something, though I don't know whether it would cause this or not, without the repro code.

@NicolasDenoyelle
Author

Below is the code.
I re-ran it just now with version 3.0.6 instead of 3.0.4, and the bug looks fixed.

#include <cuda_runtime.h>
#include <nvcomp/shared_types.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <nvcomp.hpp>
#include <nvcomp/cascaded.hpp>
#include <nvcomp/nvcompManager.hpp>
#include <vector>

static void AbortWithError(const char* file, int line, const char* call,
                           const char* err) {
  std::cerr << file << ":" << line << " " << call << ": " << err << std::endl;
  abort();
}

#define ASSERT_CUDA_SUCCESS(call)                                                \
  do {                                                                           \
    const auto cudaResult = call;                                                \
    if (cudaResult != cudaSuccess) {                                             \
      AbortWithError(__FILE__, __LINE__, #call, cudaGetErrorString(cudaResult)); \
    }                                                                            \
  } while (0)

class DeviceBuffer {
 public:
  ~DeviceBuffer() {
    if (data_) {
      ASSERT_CUDA_SUCCESS(cudaFree(data_));
    }
  }
  DeviceBuffer() : data_(nullptr), size_(0), capacity_(0) {}
  DeviceBuffer(size_t size) : size_(size), capacity_(size) {
    ASSERT_CUDA_SUCCESS(cudaMalloc(&data_, size));
  }
  DeviceBuffer(const std::vector<char>& host_buffer) : DeviceBuffer(host_buffer.size()) {
    ASSERT_CUDA_SUCCESS(
        cudaMemcpy(data_, host_buffer.data(), size_, cudaMemcpyHostToDevice));
  }
  DeviceBuffer(const DeviceBuffer&) = delete;
  DeviceBuffer(DeviceBuffer&& other) : DeviceBuffer() { *this = std::move(other); }
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;
  DeviceBuffer& operator=(DeviceBuffer&& other) {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
    std::swap(capacity_, other.capacity_);
    return *this;
  }

  void resize(std::size_t size) {
    if (size <= capacity_) {
      size_ = size;
    } else {
      void* data;
      ASSERT_CUDA_SUCCESS(cudaMalloc(&data, size));
      ASSERT_CUDA_SUCCESS(cudaMemcpy(data, data_, size_, cudaMemcpyDeviceToDevice));
      ASSERT_CUDA_SUCCESS(cudaFree(data_));
      data_ = data;
      size_ = size;
      capacity_ = size;
    }
  }
  [[nodiscard]] size_t size() const { return size_; }
  [[nodiscard]] uint8_t* data() { return reinterpret_cast<uint8_t*>(data_); }
  [[nodiscard]] const uint8_t* data() const {
    return reinterpret_cast<const uint8_t*>(data_);
  }

  bool operator==(const DeviceBuffer& other) const {
    if (other.size_ != size_) {
      return false;
    }

    std::vector<char> lhs(size_, 0);
    std::vector<char> rhs(other.size_, 0);
    ASSERT_CUDA_SUCCESS(cudaMemcpy(lhs.data(), data_, size(), cudaMemcpyDeviceToHost));
    ASSERT_CUDA_SUCCESS(
        cudaMemcpy(rhs.data(), other.data_, other.size(), cudaMemcpyDeviceToHost));
    return lhs == rhs;
  }

 private:
  void* data_;
  size_t size_;
  size_t capacity_;
};

static void CompressionRoundTrip(nvcomp::nvcompManagerBase& manager,
                                 const DeviceBuffer& input,
                                 const size_t decomp_chunk_size) {
  auto decompressed = DeviceBuffer(input.size());
  std::cerr << "Compressing " << input.size() << " Bytes." << std::endl;

  auto compression_cfg = std::make_unique<nvcomp::CompressionConfig>(
      manager.configure_compression(input.size()));
  assert(*compression_cfg->get_status() == nvcompSuccess);

  DeviceBuffer compressed(
      std::max(decomp_chunk_size, compression_cfg->max_compressed_buffer_size));
  manager.compress(input.data(), compressed.data(), *compression_cfg);
  cudaStreamSynchronize(cudaStreamDefault);

  auto compressed_size = manager.get_compressed_output_size(compressed.data());
  assert(*compression_cfg->get_status() == nvcompSuccess);
  compressed.resize(compressed_size);

  std::cerr << "Compressed into " << compressed_size << " Bytes." << std::endl;

  auto decompression_cfg = std::make_unique<nvcomp::DecompressionConfig>(
      manager.configure_decompression(compressed.data()));
  assert(*decompression_cfg->get_status() == nvcompSuccess);
  manager.decompress(decompressed.data(), compressed.data(), *decompression_cfg);
  assert(decompressed == input);
}

static std::vector<char> MakeRegularInputData(const size_t size) {
  std::vector<char> data(size, 0);
  for (size_t i = 0; i < size; i++) {
    data[i] = i % 2;
  }
  return data;
}

static std::vector<char> MakeRandomInputData(const size_t size) {
  std::vector<char> data(size, 0);
  for (size_t i = 0; i < size; i++) {
    data[i] = rand() % 2;
  }
  return data;
}

int main() {
  constexpr size_t kSize = 1 << 20;
  constexpr size_t kDecompChunkSize = 4096;
  constexpr size_t kCompChunkSize = 4096;
  // nvcompBatchedCascadedOpts_t fields: chunk_size, type, num_RLEs, num_deltas, use_bp.
  constexpr nvcompBatchedCascadedOpts_t opts = {kDecompChunkSize, NVCOMP_TYPE_SHORT,
                                                2, 2, 0};
  nvcomp::CascadedManager manager(kCompChunkSize, opts,
                                  static_cast<cudaStream_t>(cudaStreamDefault), 0,
                                  NoComputeNoVerify);

  CompressionRoundTrip(manager, DeviceBuffer(MakeRandomInputData(kSize)),
                       kDecompChunkSize); // Success
  CompressionRoundTrip(manager, DeviceBuffer(MakeRegularInputData(kSize)),
                       kDecompChunkSize); // Error
  return 0;
}

@ndickson-nvidia
Contributor

It looks like this isn't fixed in the latest nvcomp, and it might just be a coincidence that it happens to no longer crash for you. The bug seems to occur when there's a chunk that consists entirely of 1 repeated value and there's at least 1 RLE pass and 2 delta passes. The RLE is run first, reducing it to a single value and a run length, and outputting the run length. The first delta just outputs the one value, because it has no following value to subtract it from. On the second delta pass, there are no values left, and there's a chance that it may crash. I'll try to fix the handling of the case where it runs out of values to process.
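
To make that failure mode concrete, here is a toy host-side illustration of the element counts after each pass on a chunk of one repeated value. It only mimics the cascaded scheme conceptually (the anchor value of a delta pass is modeled as stored separately); it is not nvcomp's kernel code.

#include <cstdio>
#include <utility>
#include <vector>

// Toy run-length encoding: collapses runs into (value, run length) pairs.
static std::vector<std::pair<int, int>> Rle(const std::vector<int>& in) {
  std::vector<std::pair<int, int>> out;
  for (int v : in) {
    if (!out.empty() && out.back().first == v) {
      ++out.back().second;
    } else {
      out.push_back({v, 1});
    }
  }
  return out;
}

// Toy delta pass: the first element is kept as a separately stored anchor
// ("outputs the one value"), and n-1 differences are emitted for the next pass,
// so a 1-element input leaves zero values to process.
static std::vector<int> Delta(const std::vector<int>& in) {
  std::vector<int> out;
  for (size_t i = 1; i < in.size(); ++i) out.push_back(in[i] - in[i - 1]);
  return out;
}

int main() {
  std::vector<int> chunk(2048, 1);        // a chunk consisting of 1 repeated value
  auto rle = Rle(chunk);                  // 1 (value, run length) pair
  std::vector<int> values = {rle[0].first};
  auto d1 = Delta(values);                // first delta pass: anchor kept, 0 differences
  auto d2 = Delta(d1);                    // second delta pass: nothing left to process
  std::printf("values=%zu, after delta 1=%zu, after delta 2=%zu\n",
              values.size(), d1.size(), d2.size());
  // Prints: values=1, after delta 1=0, after delta 2=0
  // The empty second pass is the case whose handling needs the fix.
}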

@NicolasDenoyelle
Author

Ok, thanks for looking into this.
I am reopening the issue so you can close it when you think the bug is addressed.
I hope you can trigger the crash on your side. 🤞


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@NicolasDenoyelle
Author

@ndickson-nvidia I am attaching a new reproducer here: nvcomp-reproducer-cascaded.cpp.txt
I was able to trigger the same issue again with a buffer filled with 0s and other nvcomp cascaded options.

Expected behaviour

Upon execution of the reproducer you should see the following error output:

Compressing 8192 Bytes.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Encountered Cuda Error: 700: 'an illegal memory access was encountered'.
Aborted (core dumped)

The same error occurs with the following cascaded options:

{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 1, .num_deltas = 2, .use_bp = 0}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 1, .num_deltas = 2, .use_bp = 1}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 2, .num_deltas = 2, .use_bp = 0}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 2, .num_deltas = 2, .use_bp = 1}
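
As a sketch of how these four option sets can be exercised, the snippet below is a drop-in replacement for main() in the first reproducer posted above; it reuses that reproducer's DeviceBuffer and CompressionRoundTrip() helpers, and the 8192-byte all-zero input is an assumption matching the "Compressing 8192 Bytes." line quoted above.

// Sketch only: iterate the failing option sets with an all-zero input,
// building a fresh cascaded manager per option set.
int main() {
  const nvcompBatchedCascadedOpts_t failing_opts[] = {
      {4096, NVCOMP_TYPE_INT, 1, 2, 0},
      {4096, NVCOMP_TYPE_INT, 1, 2, 1},
      {4096, NVCOMP_TYPE_INT, 2, 2, 0},
      {4096, NVCOMP_TYPE_INT, 2, 2, 1},
  };
  for (const nvcompBatchedCascadedOpts_t& opts : failing_opts) {
    nvcomp::CascadedManager manager(opts.chunk_size, opts,
                                    static_cast<cudaStream_t>(cudaStreamDefault), 0,
                                    NoComputeNoVerify);
    CompressionRoundTrip(manager, DeviceBuffer(std::vector<char>(8192, 0)),
                         opts.chunk_size);  // fails with CUDA error 700
  }
  return 0;
}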

Environment details:

  • Bare-metal
  • Cuda Driver Version: 535.171.04
  • CUDA Version: 12.2
  • NVIDIA GeForce RTX 3050
  • nvcomp Version: 3.0.6 (no extensions)


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@ndickson-nvidia
Contributor

As far as I know, I've fixed this bug for the next release. Thanks for reporting it!

@NicolasDenoyelle
Author

Great!
I will close this as soon as I get a chance to try the next release.


github-actions bot commented Jul 4, 2024

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.


github-actions bot commented Oct 2, 2024

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
