core dump when opening gz file #354

Closed

marilmanen opened this issue Jun 13, 2024 · 7 comments

@marilmanen (Contributor)

I'm using the following command to see what's inside a gz file:

  nedit-ng -do 'replace_range(0,0,shell_command("gzip -cd test1.gz", ""))'

If the uncompressed file is 608 MB with 19M lines, everything works fine, but with a file size of 725 MB with 25M lines I get:

terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Abort (core dumped)

There is no issue with the bigger file if it is first uncompressed to disk and I then open that file with nedit-ng.
I have also tested the old NEdit editor, and it has no issues.

@eteran (Owner)

eteran commented Jun 13, 2024

That's very interesting. Does it matter which .gz file you try to open? Or is it one in particular? (If so, any chance you can somehow send me the offending file?)

I'll take a look. Often, a std::bad_alloc on a 64-bit system means that something was given a negative size somewhere :-/
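
For illustration, a minimal standalone sketch of that failure mode (the negative size here is contrived for the example, not taken from nedit-ng's code):

#include <cstddef>
#include <iostream>
#include <new>

int main() {

    int size = -1; // stand-in for a size computation that went negative

    try {
        // Converted to std::size_t, -1 becomes SIZE_MAX (~1.8e19 on 64-bit),
        // an allocation that cannot possibly succeed, so `new` throws
        // std::bad_alloc (or std::bad_array_new_length, which derives from it).
        char *p = new char[static_cast<std::size_t>(size)];
        delete[] p;
    } catch (const std::bad_alloc &e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
}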

@marilmanen (Contributor, Author)

I modified the content of the SPEF file (replaced all word characters with x) and it had no impact, so it looks like the only thing that matters is the size of the file. I also created a dummy file by duplicating the following sequence until the uncompressed file was >728 MB, and with it I get the crash. After a couple of iterations, it looks like the limit for the crash is very close to 715 MB.

Here is the content that I used:

*xxx

*xxxxx *xxxxxxx x.xxxxxxxxxx //xxxxxx x.xxx xxxxxx x.xxxxxxxxxx xx

*xxxx
*x *xxxxxx:xx x *x x *x xx.xxx xxx.xxx
*x *xxxxxx:x x *x x.xxxxxxxxx *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx

*xxx
x *xxxxxx:xx xx-xx
x *xxxxxx:x x.xxxxxxxxxx
x *xxxxxxx:x xx-xx
x *xxxxxxx:x x.xxxxxxxxxx
x *xxxxxxx:x x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:xx x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:xx x.xxxxxx-xx
xx *xxxxxx:x *xxxxxx:x x.xxxxx-xx
xx *xxxxxx:x *xxxxxx:x x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:x x.xxxxxx-xx


@eteran (Owner)

eteran commented Jun 19, 2024

So, I made a script like this:

#!/bin/bash

IN=test.txt
OUT=test
FILE=test.gz
MINSIZE=728000000
COUNT=0

truncate -s 0 "$OUT"
while true; do
    echo "$COUNT"

    # append the source file 81920 times
    cat $(yes "$IN" | head -n 81920) >> "$OUT"

    gzip -c -f "$OUT" > "$FILE"

    SIZE=$(wc -c <"$FILE")
    if [ "$SIZE" -ge "$MINSIZE" ]; then
        echo "size is over $MINSIZE bytes"
        exit 0
    fi

    COUNT=$((COUNT+1))
done

to try to test it, and I have a couple of questions:

  1. Is this generally what you meant?
  2. How big is the source file? Because a repeated pattern like that compresses particularly well: I've gotten the uncompressed file over 1 GB and it only results in a 3 MB .gz file!

@eteran (Owner)

eteran commented Jun 20, 2024

OK, never mind that last comment. I misread it and thought that the .gz file needed to be a certain size, not the source file. I've replicated the issue and will see if I can fix it ASAP :-)

@eteran (Owner)

eteran commented Jun 20, 2024

This is an interesting situation. It may not be obvious at first, but this is actually running into a circumstance where we are hitting the memory limit of what can be held in a QString.

I was able to reproduce this with a very trivial Qt application that looks like this:

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QtDebug>

int main() {

    QFile file(QLatin1String("test.txt"));
    if (file.open(QIODevice::ReadOnly)) {
        // reading ~1.85 GB into a QByteArray succeeds...
        QByteArray bytes = file.readAll();
        // ...but converting it to UTF-16 needs twice that storage,
        // which exceeds what a QString can hold, and it throws std::bad_alloc
        QString text = QString::fromLocal8Bit(bytes.data(), bytes.size());
        qDebug() << text;
    }
}

with a test.txt that is 1854668800 bytes big. Fundamentally, QString is limited to ~2 GB of storage, and the number of characters is at best half of that because it uses UTF-16 (it can be less than half due to combining characters and similar). I know you triggered it with a smaller file, but I think it's essentially the same issue, because some QString operations require even more space temporarily.
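
Spelling out the arithmetic with the figures above (the exact limit is an assumption based on Qt 5's signed 32-bit container sizes):

#include <cstdio>

int main() {

    const long long inputBytes   = 1854668800LL;   // size of test.txt, plain ASCII
    const long long utf16Bytes   = inputBytes * 2; // one 2-byte QChar per ASCII byte
    const long long qstringLimit = 2147483647LL;   // ~2 GB: Qt 5 sizes are signed 32-bit ints

    std::printf("UTF-16 storage needed: %lld bytes\n", utf16Bytes);   // 3709337600
    std::printf("QString storage limit: %lld bytes\n", qstringLimit); // 2147483647
}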

So, back to nedit-ng. Fortunately, we don't actually use QString for file data that often, but we do currently use it for capturing the stdout and stderr of subprocesses. In this case, I read the results of the command you ran (in this case, gzip) into a byte array, and then, because it could be UTF-8, I decode it into a QString (this is where it blows up), and finally, if all goes well, I convert it to a character buffer as needed.

I'll have to refactor the code to use a different approach since QString has this limitation. I'll update when I have it worked out.
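
One way to sidestep the limit, sketched here assuming Qt 5's QTextDecoder API (this is not necessarily what the eventual fix does), is to decode the bytes in chunks, so that no single QString ever has to hold the whole result:

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QTextCodec>
#include <QtDebug>
#include <memory>

int main() {

    QFile file(QLatin1String("test.txt"));
    if (file.open(QIODevice::ReadOnly)) {
        QByteArray bytes = file.readAll();

        // QTextDecoder carries state between calls, so a multi-byte UTF-8
        // sequence split at a chunk boundary still decodes correctly
        QTextCodec *codec = QTextCodec::codecForName("UTF-8");
        std::unique_ptr<QTextDecoder> decoder(codec->makeDecoder());

        const int ChunkSize = 1 << 20; // 1 MiB of input per step
        qint64 totalChars   = 0;

        for (int offset = 0; offset < bytes.size(); offset += ChunkSize) {
            const int len      = qMin(ChunkSize, bytes.size() - offset);
            const QString part = decoder->toUnicode(bytes.constData() + offset, len);
            totalChars += part.size(); // a real consumer would append each piece to the editor's own buffer
        }

        qDebug() << "decoded" << totalChars << "characters";
    }
}

The point of the chunking is that each decoded piece can be handed off and discarded, keeping the peak size of any one QString small regardless of how large the subprocess output is.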

eteran added a commit that referenced this issue Jun 24, 2024

…sing a QString as much as possible.

this is necessary because, at least in Qt5, there is a hard upper limit to the length of a QString
addresses issue #354
@eteran (Owner)

eteran commented Jun 24, 2024

@marilmanen I believe that this PR should fix the issue. If it does, please let me know and I'll merge it into master.
Thanks!

#355

@marilmanen (Contributor, Author)

I tested with a couple of large files and saw no issues, so it looks like you have fixed it.
Great!

eteran added a commit that referenced this issue Jun 25, 2024

…sing a QString as much as possible. (#355)

this is necessary because, at least in Qt5, there is a hard upper limit to the length of a QString
addresses issue #354

eteran closed this as completed Jun 25, 2024