[BUG] Cascaded Illegal Memory Access with High Compression Ratio #96

Open

NicolasDenoyelle opened this issue Feb 29, 2024 · 11 comments

@NicolasDenoyelle

Describe the bug

The nvcomp cascaded manager runs into an illegal memory access when reading the compressed output size after compressing highly compressible data.

Steps/Code to reproduce bug

Reproducer: nvcomp-reproducer.cpp.txt

Compile with:

g++ -g -Wall -Wextra -std=c++17 -I/usr/local/cuda/include -I<path_to_nvcomp.h> nvcomp-reproducer.cpp -o reproducer -lcudart -lnvcomp && ./reproducer

Expected behavior

The reproducer I attached compresses two buffers of the same size with the same cascaded manager.
One buffer is filled with random data and works fine.
The other buffer is filled with a sequence of {0,1,0,1,...} and triggers the error.
In the reproducer, I took care of making sure that buffer sizes are appropriate (a short sketch of these checks follows the list):

  • a multiple of the nvcompType_t being compressed
  • a multiple of the cascaded interface chunk sizes
  • at least the maximum compressed size given by nvcomp prior to compression.
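
For concreteness, here is a minimal sketch of those three checks; the helper name and the assertions are illustrative assumptions, not part of the attached reproducer, but the manager calls match the API used in it:

#include <cassert>
#include <cstddef>
#include <nvcomp/nvcompManager.hpp>

// Illustrative helper (not in the attached reproducer): verifies the three size
// constraints above and returns the compressed-buffer capacity that nvcomp
// reports before compression.
static size_t CheckedCompressedCapacity(nvcomp::nvcompManagerBase& manager,
                                        size_t input_size, size_t chunk_size) {
  assert(input_size % sizeof(short) == 0);  // multiple of the nvcompType_t (NVCOMP_TYPE_SHORT)
  assert(input_size % chunk_size == 0);     // multiple of the cascaded interface chunk size
  // The compressed buffer must be at least this large:
  return manager.configure_compression(input_size).max_compressed_buffer_size;
}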

Environment details:

  • Bare-metal
  • Cuda Driver Version: 535.154.05
  • CUDA Version: 12.2
  • NVIDIA GeForce RTX 3050
  • nvcomp Version: 3.0.4 (no extensions)
@NicolasDenoyelle added the "? - Needs Triage" and "bug: Something isn't working" labels on Feb 29, 2024
@ndickson-nvidia
Contributor

Unfortunately, it seems like I can't access the nvcomp-reproducer.cpp.txt file you linked to anymore. I think I looked at it briefly a week or so ago, so it was probably there at one point. Could you post it again? I think I might have found something, though I don't know whether it would cause this or not, without the repro code.

@NicolasDenoyelle
Author

Below is the code.
I re-ran it just now with version 3.0.6 instead of 3.0.4, and the bug looks fixed.

#include <cuda_runtime.h>
#include <nvcomp/shared_types.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <nvcomp.hpp>
#include <nvcomp/cascaded.hpp>
#include <nvcomp/nvcompManager.hpp>
#include <vector>

static void AbortWithError(const char* file, int line, const char* call,
                           const char* err) {
  std::cerr << file << ":" << line << " " << call << ": " << err << std::endl;
  abort();
}

#define ASSERT_CUDA_SUCCESS(call)                                                \
  do {                                                                           \
    const auto cudaResult = call;                                                \
    if (cudaResult != cudaSuccess) {                                             \
      AbortWithError(__FILE__, __LINE__, #call, cudaGetErrorString(cudaResult)); \
    }                                                                            \
  } while (0)

class DeviceBuffer {
 public:
  ~DeviceBuffer() {
    if (data_) {
      ASSERT_CUDA_SUCCESS(cudaFree(data_));
    }
  }
  DeviceBuffer() : data_(nullptr), size_(0), capacity_(0) {}
  DeviceBuffer(size_t size) : size_(size), capacity_(size) {
    ASSERT_CUDA_SUCCESS(cudaMalloc(&data_, size));
  }
  DeviceBuffer(const std::vector<char>& host_buffer) : DeviceBuffer(host_buffer.size()) {
    ASSERT_CUDA_SUCCESS(
        cudaMemcpy(data_, host_buffer.data(), size_, cudaMemcpyHostToDevice));
  }
  DeviceBuffer(const DeviceBuffer&) = delete;
  DeviceBuffer(DeviceBuffer&& other) : DeviceBuffer() { *this = std::move(other); }
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;
  DeviceBuffer& operator=(DeviceBuffer&& other) {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
    std::swap(capacity_, other.capacity_);
    return *this;
  }

  void resize(std::size_t size) {
    if (size <= capacity_) {
      size_ = size;
    } else {
      void* data;
      ASSERT_CUDA_SUCCESS(cudaMalloc(&data, size));
      ASSERT_CUDA_SUCCESS(cudaMemcpy(data, data_, size_, cudaMemcpyDeviceToDevice));
      ASSERT_CUDA_SUCCESS(cudaFree(data_));
      data_ = data;
      size_ = size;
      capacity_ = size;
    }
  }
  [[nodiscard]] size_t size() const { return size_; }
  [[nodiscard]] uint8_t* data() { return reinterpret_cast<uint8_t*>(data_); }
  [[nodiscard]] const uint8_t* data() const {
    return reinterpret_cast<const uint8_t*>(data_);
  }

  bool operator==(const DeviceBuffer& other) const {
    if (other.size_ != size_) {
      return false;
    }

    std::vector<char> lhs(size_, 0);
    std::vector<char> rhs(other.size_, 0);
    ASSERT_CUDA_SUCCESS(cudaMemcpy(lhs.data(), data_, size(), cudaMemcpyDeviceToHost));
    ASSERT_CUDA_SUCCESS(
        cudaMemcpy(rhs.data(), other.data_, other.size(), cudaMemcpyDeviceToHost));
    return lhs == rhs;
  }

 private:
  void* data_;
  size_t size_;
  size_t capacity_;
};

static void CompressionRoundTrip(nvcomp::nvcompManagerBase& manager,
                                 const DeviceBuffer& input,
                                 const size_t decomp_chunk_size) {
  auto decompressed = DeviceBuffer(input.size());
  std::cerr << "Compressing " << input.size() << " Bytes." << std::endl;

  auto compression_cfg = std::make_unique<nvcomp::CompressionConfig>(
      manager.configure_compression(input.size()));
  assert(*compression_cfg->get_status() == nvcompSuccess);

  DeviceBuffer compressed(
      std::max(decomp_chunk_size, compression_cfg->max_compressed_buffer_size));
  manager.compress(input.data(), compressed.data(), *compression_cfg);
  cudaStreamSynchronize(cudaStreamDefault);

  auto compressed_size = manager.get_compressed_output_size(compressed.data());
  assert(*compression_cfg->get_status() == nvcompSuccess);
  compressed.resize(compressed_size);

  std::cerr << "Compressed into " << compressed_size << " Bytes." << std::endl;

  auto decompression_cfg = std::make_unique<nvcomp::DecompressionConfig>(
      manager.configure_decompression(compressed.data()));
  assert(*decompression_cfg->get_status() == nvcompSuccess);
  manager.decompress(decompressed.data(), compressed.data(), *decompression_cfg);
  assert(decompressed == input);
}

static std::vector<char> MakeRegularInputData(const size_t size) {
  std::vector<char> data(size, 0);
  for (size_t i = 0; i < size; i++) {
    data[i] = i % 2;
  }
  return data;
}

static std::vector<char> MakeRandomInputData(const size_t size) {
  std::vector<char> data(size, 0);
  for (size_t i = 0; i < size; i++) {
    data[i] = rand() % 2;
  }
  return data;
}

int main() {
  constexpr size_t kSize = 1 << 20;
  constexpr size_t kDecompChunkSize = 4096;
  constexpr size_t kCompChunkSize = 4096;
  // nvcompBatchedCascadedOpts_t fields: chunk_size, type, num_RLEs, num_deltas, use_bp.
  constexpr nvcompBatchedCascadedOpts_t opts = {kDecompChunkSize, NVCOMP_TYPE_SHORT,
                                                2, 2, 0};
  nvcomp::CascadedManager manager(kCompChunkSize, opts,
                                  static_cast<cudaStream_t>(cudaStreamDefault), 0,
                                  NoComputeNoVerify);

  CompressionRoundTrip(manager, DeviceBuffer(MakeRandomInputData(kSize)),
                       kDecompChunkSize); // Success
  CompressionRoundTrip(manager, DeviceBuffer(MakeRegularInputData(kSize)),
                       kDecompChunkSize); // Error
  return 0;
}

@ndickson-nvidia
Contributor

It looks like this isn't fixed in the latest nvcomp, and it might just be a coincidence that it happens to no longer crash for you. The bug seems to occur when there's a chunk that consists entirely of 1 repeated value and there's at least 1 RLE pass and 2 delta passes. The RLE is run first, reducing it to a single value and a run length, and outputting the run length. The first delta just outputs the one value, because it has no following value to subtract it from. On the second delta pass, there are no values left, and there's a chance that it may crash. I'll try to fix the handling of the case where it runs out of values to process.
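
To make that failure mode concrete, here is a toy host-side illustration of the element counts after each pass on a chunk of one repeated value. It only mimics the cascaded scheme conceptually (the anchor value of a delta pass is modeled as stored separately); it is not nvcomp's kernel code.

#include <cstdio>
#include <utility>
#include <vector>

// Toy run-length encoding: collapses runs into (value, run length) pairs.
static std::vector<std::pair<int, int>> Rle(const std::vector<int>& in) {
  std::vector<std::pair<int, int>> out;
  for (int v : in) {
    if (!out.empty() && out.back().first == v) {
      ++out.back().second;
    } else {
      out.push_back({v, 1});
    }
  }
  return out;
}

// Toy delta pass: the first element is kept as a separately stored anchor
// ("outputs the one value"), and n-1 differences are emitted for the next pass,
// so a 1-element input leaves zero values to process.
static std::vector<int> Delta(const std::vector<int>& in) {
  std::vector<int> out;
  for (size_t i = 1; i < in.size(); ++i) out.push_back(in[i] - in[i - 1]);
  return out;
}

int main() {
  std::vector<int> chunk(2048, 1);        // a chunk consisting of 1 repeated value
  auto rle = Rle(chunk);                  // 1 (value, run length) pair
  std::vector<int> values = {rle[0].first};
  auto d1 = Delta(values);                // first delta pass: anchor kept, 0 differences
  auto d2 = Delta(d1);                    // second delta pass: nothing left to process
  std::printf("values=%zu, after delta 1=%zu, after delta 2=%zu\n",
              values.size(), d1.size(), d2.size());
  // Prints: values=1, after delta 1=0, after delta 2=0
  // The empty second pass is the case whose handling needs the fix.
}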

@NicolasDenoyelle
Author

Ok, thanks for looking into this.
I am reopening the issue so you can close it when you think the bug is addressed.
I hope you can trigger the crash on your side. 🤞


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@NicolasDenoyelle
Author

@ndickson-nvidia I am attaching a new reproducer here: nvcomp-reproducer-cascaded.cpp.txt
I was able to trigger the same issue again with a buffer filled with 0s and other nvcomp cascaded options.

Expected behaviour

Upon execution of the reproducer you should see the following error output:

Compressing 8192 Bytes.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Encountered Cuda Error: 700: 'an illegal memory access was encountered'.
Aborted (core dumped)

The same error occurs with the following cascaded options:

{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 1, .num_deltas = 2, .use_bp = 0}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 1, .num_deltas = 2, .use_bp = 1}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 2, .num_deltas = 2, .use_bp = 0}
{.chunk_size = 4096, .type = NVCOMP_TYPE_INT, .num_RLEs = 2, .num_deltas = 2, .use_bp = 1}
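
As a sketch of how these four option sets can be exercised, the snippet below is a drop-in replacement for main() in the first reproducer posted above; it reuses that reproducer's DeviceBuffer and CompressionRoundTrip() helpers, and the 8192-byte all-zero input is an assumption matching the "Compressing 8192 Bytes." line quoted above.

// Sketch only: iterate the failing option sets with an all-zero input,
// building a fresh cascaded manager per option set.
int main() {
  const nvcompBatchedCascadedOpts_t failing_opts[] = {
      {4096, NVCOMP_TYPE_INT, 1, 2, 0},
      {4096, NVCOMP_TYPE_INT, 1, 2, 1},
      {4096, NVCOMP_TYPE_INT, 2, 2, 0},
      {4096, NVCOMP_TYPE_INT, 2, 2, 1},
  };
  for (const nvcompBatchedCascadedOpts_t& opts : failing_opts) {
    nvcomp::CascadedManager manager(opts.chunk_size, opts,
                                    static_cast<cudaStream_t>(cudaStreamDefault), 0,
                                    NoComputeNoVerify);
    CompressionRoundTrip(manager, DeviceBuffer(std::vector<char>(8192, 0)),
                         opts.chunk_size);  // fails with CUDA error 700
  }
  return 0;
}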

Environment details:

  • Bare-metal
  • Cuda Driver Version: 535.171.04
  • CUDA Version: 12.2
  • NVIDIA GeForce RTX 3050
  • nvcomp Version: 3.0.6 (no extensions)


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@ndickson-nvidia
Contributor

As far as I know, I've fixed this bug for the next release. Thanks for reporting it!

@NicolasDenoyelle
Author

Great!
I will close this as soon as I get a chance to try the next release.


github-actions bot commented Jul 4, 2024

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.


github-actions bot commented Oct 2, 2024

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
