Check potential bugs in compressed_files.py in lasse-py when b=1 and 3 bits #7

Open
aldebaro opened this issue Apr 9, 2023 · 2 comments

Comments

@aldebaro
Owner

aldebaro commented Apr 9, 2023

I tested it and it worked in most cases. In the code below, it works for n = 2, 4, 5, 6 and 7, but not for n = 1 and 3:

import numpy as np
from numpy.random import randint

# compact_bytes and decompact_bytes come from compressed_files.py in lasse-py

n = 7  # Minimum number of bits to represent the numbers, n < 8
x = randint(low=0, high=2**n, size=100, dtype=np.uint8)
compressed = compact_bytes(x, n)
filename = "compressed.bin"
compressed.tofile(filename)
uncompressed = decompact_bytes(compressed, n)
filename = "uncompressed.bin"
uncompressed.tofile(filename)
uncompressed2 = np.fromfile(filename, dtype=np.uint8, count=-1)

@claudio966
Contributor

claudio966 commented Apr 10, 2023

I debugged this code, and the problem seems to be in the logic that works out how many numbers were in the original list.

def decompact_bytes(input_array, num_bits):
    if num_bits >= 8:
        raise ValueError("This function is meant to work with less than 8 bits!")

    output_arr_len = (
        len(input_array)
        * 8  # Figure out how many numbers were on the original array, which should be the uncompressed output
    ) // num_bits

A quick example:
With num_bits = 1, the compressed array has length 13. Multiplying that length by 8 and dividing by 1 gives 104, which is greater than 100, the length of the generated input array. So the output array size won't be equal to the original.

(len(input_array) * 8) // 1 = 104 > 100

Naturally, the same problem occurs when num_bits equals 3, but there the output length is smaller than the original.

In the other cases, the result is either exactly 100 or close enough that the integer division brings it to 100.
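To make the arithmetic above easy to check, here is a minimal sketch of the length formula on its own. It assumes that the compressed array is padded up to a whole number of bytes, which matches the 13 bytes reported for num_bits = 1 but is otherwise a guess, since compact_bytes is not shown here. The helper name estimated_length is hypothetical.

```python
def estimated_length(n_values, num_bits):
    """Length that the formula in decompact_bytes would recover.

    Assumption: the packed data is padded up to a whole number of bytes.
    """
    n_bytes = (n_values * num_bits + 7) // 8  # ceil(total bits / 8)
    return (n_bytes * 8) // num_bits          # same formula as in decompact_bytes


if __name__ == "__main__":
    for num_bits in range(1, 8):
        print(num_bits, estimated_length(100, num_bits))
```

Under that padding assumption the estimate differs from 100 only for num_bits = 1 and num_bits = 3, the two failing cases. The direction of the error for num_bits = 3 depends on how compact_bytes actually handles the last partial byte.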

@EduardoGFilho

@claudio966 That's right, I have run into the same thing. I could reproduce the error for arrays with length N*lcm - 1, where lcm is the least common multiple of num_bits and 8. I could "fix" this problem by pre-calculating the error in that formula, saving it in the first byte of the array, and using that value to correct the formula's output. Since this takes up more space I haven't opened a pull request yet, but I couldn't think of another way to correct the error consistently.
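The header-byte idea described above could look roughly like the sketch below. This is not the lasse-py implementation: pack_with_header and unpack_with_header are hypothetical names, and the bit layout (most significant bit first, zero padding in the last byte) is an assumption made so the example is self-contained.

```python
import numpy as np


def pack_with_header(values, num_bits):
    """Pack values (< 2**num_bits) and prepend one byte with the formula's error."""
    values = np.asarray(values, dtype=np.uint8)
    # Keep only the num_bits least significant bits of each value, MSB first.
    bits = np.unpackbits(values[:, None], axis=1)[:, 8 - num_bits:]
    packed = np.packbits(bits.ravel())  # last byte is zero-padded
    # How many extra values the length formula would report (0 to 7).
    surplus = (len(packed) * 8) // num_bits - len(values)
    return np.concatenate(([np.uint8(surplus)], packed)).astype(np.uint8)


def unpack_with_header(data, num_bits):
    """Invert pack_with_header, using the header byte to correct the length."""
    surplus, packed = int(data[0]), data[1:]
    count = (len(packed) * 8) // num_bits - surplus
    bits = np.unpackbits(packed)[: count * num_bits].reshape(count, num_bits)
    # Left-pad each value back to 8 bits before repacking into bytes.
    padded = np.zeros((count, 8), dtype=np.uint8)
    padded[:, 8 - num_bits:] = bits
    return np.packbits(padded, axis=1).ravel()
```

The cost is the one extra byte per array mentioned above; in exchange, the decoder no longer has to guess how many padding bits were added.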
