core dump when opening gz file #354

Closed

marilmanen opened this issue Jun 13, 2024 · 7 comments

@marilmanen (Contributor)

I'm using the following command to see what's inside a gz file:

  nedit-ng -do 'replace_range(0,0,shell_command("gzip -cd test1.gz", ""))'

If the uncompressed file is 608 MB with 19M lines, everything works fine, but with a file size of 725 MB with 25M lines I get:

terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Abort (core dumped)

There is no issue with the bigger file if it is first uncompressed to disk and I then open that file with nedit-ng.
I have also tested the old NEdit editor, and it has no issues.

@eteran (Owner)

eteran commented Jun 13, 2024

That's very interesting. Does it matter which .gz file you try to open? Or is it one in particular? (If so, any chance you can somehow send me the offending file?)

I'll take a look. Often, a std::bad_alloc on a 64-bit system means that something was given a negative size somewhere :-/
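
For illustration, a minimal standalone sketch of that failure mode (the negative size here is contrived for the example, not taken from nedit-ng's code):

#include <cstddef>
#include <iostream>
#include <new>

int main() {

    int size = -1; // stand-in for a size computation that went negative

    try {
        // Converted to std::size_t, -1 becomes SIZE_MAX (~1.8e19 on 64-bit),
        // an allocation that cannot possibly succeed, so `new` throws
        // std::bad_alloc (or std::bad_array_new_length, which derives from it).
        char *p = new char[static_cast<std::size_t>(size)];
        delete[] p;
    } catch (const std::bad_alloc &e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
}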

@marilmanen (Contributor, Author)

I modified the content of the SPEF file (replaced all word characters with x) and it had no impact, so it looks like the only thing that matters is the size of the file. I also created a dummy file by duplicating the following sequence until the uncompressed file was >728 MB, and with it I get the crash. After a couple of iterations, it looks like the limit for the crash is very close to 715 MB.

Here is the content that I used:

*xxx

*xxxxx *xxxxxxx x.xxxxxxxxxx //xxxxxx x.xxx xxxxxx x.xxxxxxxxxx xx

*xxxx
*x *xxxxxx:xx x *x x *x xx.xxx xxx.xxx
*x *xxxxxx:x x *x x.xxxxxxxxx *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx
*x *xxxxxxx:x *x xx.xxx xxx.xxx

*xxx
x *xxxxxx:xx xx-xx
x *xxxxxx:x x.xxxxxxxxxx
x *xxxxxxx:x xx-xx
x *xxxxxxx:x x.xxxxxxxxxx
x *xxxxxxx:x x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:xx x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:xx x.xxxxxx-xx
xx *xxxxxx:x *xxxxxx:x x.xxxxx-xx
xx *xxxxxx:x *xxxxxx:x x.xxxxxxx-xx
xx *xxxxxx:x *xxxxxxx:x x.xxxxxx-xx


@eteran (Owner)

eteran commented Jun 19, 2024

So, I made a script like this:

#!/bin/bash

IN=test.txt
OUT=test
FILE=test.gz
MINSIZE=728000000
COUNT=0

truncate -s 0 "$OUT"
while true; do
    echo "$COUNT"

    # append the source file 81920 times
    cat $(yes "$IN" | head -n 81920) >> "$OUT"

    gzip -c -f "$OUT" > "$FILE"

    SIZE=$(wc -c <"$FILE")
    if [ "$SIZE" -ge "$MINSIZE" ]; then
        echo "size is over $MINSIZE bytes"
        exit 0
    fi

    COUNT=$((COUNT+1))
done

to try to test it, and I have a couple of questions:

  1. Is this generally what you meant?
  2. How big is the source file? Because a repeated pattern like that compresses particularly well: I've gotten the uncompressed file over 1 GB and it only results in a 3 MB .gz file!

@eteran (Owner)

eteran commented Jun 20, 2024

OK, never mind that last comment. I misread it and thought that the .gz file needed to be a certain size, not the source file. I've replicated the issue and will see if I can fix it ASAP :-)

@eteran (Owner)

eteran commented Jun 20, 2024

This is an interesting situation. It may not be obvious at first, but this is actually running into a circumstance where we are hitting the memory limit of what can be held in a QString.

I was able to reproduce this with a very trivial Qt application that looks like this:

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QtDebug>

int main() {

    QFile file(QLatin1String("test.txt"));
    if (file.open(QIODevice::ReadOnly)) {
        // reading ~1.85 GB into a QByteArray succeeds...
        QByteArray bytes = file.readAll();
        // ...but converting it to UTF-16 needs twice that storage,
        // which exceeds what a QString can hold, and it throws std::bad_alloc
        QString text = QString::fromLocal8Bit(bytes.data(), bytes.size());
        qDebug() << text;
    }
}

with a test.txt that is 1854668800 bytes big. Fundamentally, QString is limited to ~2 GB of storage, and the number of characters is at best half of that because it uses UTF-16 (it can be less than half due to combining characters and similar). I know you triggered it with a smaller file, but I think it's essentially the same issue, because some QString operations require even more space temporarily.
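
Spelling out the arithmetic with the figures above (the exact limit is an assumption based on Qt 5's signed 32-bit container sizes):

#include <cstdio>

int main() {

    const long long inputBytes   = 1854668800LL;   // size of test.txt, plain ASCII
    const long long utf16Bytes   = inputBytes * 2; // one 2-byte QChar per ASCII byte
    const long long qstringLimit = 2147483647LL;   // ~2 GB: Qt 5 sizes are signed 32-bit ints

    std::printf("UTF-16 storage needed: %lld bytes\n", utf16Bytes);   // 3709337600
    std::printf("QString storage limit: %lld bytes\n", qstringLimit); // 2147483647
}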

So, back to nedit-ng. Fortunately, we don't actually use QString for file data that often, but we do currently use it for capturing the stdout and stderr of subprocesses. In this case, I read the results of the command you ran (in this case, gzip) into a byte array, and then, because it could be UTF-8, I decode it into a QString (this is where it blows up), and finally, if all goes well, I convert it to a character buffer as needed.

I'll have to refactor the code to use a different approach since QString has this limitation. I'll update when I have it worked out.
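
One way to sidestep the limit, sketched here assuming Qt 5's QTextDecoder API (this is not necessarily what the eventual fix does), is to decode the bytes in chunks, so that no single QString ever has to hold the whole result:

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QTextCodec>
#include <QtDebug>
#include <memory>

int main() {

    QFile file(QLatin1String("test.txt"));
    if (file.open(QIODevice::ReadOnly)) {
        QByteArray bytes = file.readAll();

        // QTextDecoder carries state between calls, so a multi-byte UTF-8
        // sequence split at a chunk boundary still decodes correctly
        QTextCodec *codec = QTextCodec::codecForName("UTF-8");
        std::unique_ptr<QTextDecoder> decoder(codec->makeDecoder());

        const int ChunkSize = 1 << 20; // 1 MiB of input per step
        qint64 totalChars   = 0;

        for (int offset = 0; offset < bytes.size(); offset += ChunkSize) {
            const int len      = qMin(ChunkSize, bytes.size() - offset);
            const QString part = decoder->toUnicode(bytes.constData() + offset, len);
            totalChars += part.size(); // a real consumer would append each piece to the editor's own buffer
        }

        qDebug() << "decoded" << totalChars << "characters";
    }
}

The point of the chunking is that each decoded piece can be handed off and discarded, keeping the peak size of any one QString small regardless of how large the subprocess output is.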

eteran added a commit that referenced this issue Jun 24, 2024

…sing a QString as much as possible.

this is necessary because, at least in Qt5, there is a hard upper limit to the length of a QString
addresses issue #354
@eteran (Owner)

eteran commented Jun 24, 2024

@marilmanen I believe that this PR should fix the issue. If it does, please let me know and I'll merge it into master.
Thanks!

#355

@marilmanen (Contributor, Author)

I tested with a couple of large files and saw no issues, so it looks like you have fixed it.
Great!

eteran added a commit that referenced this issue Jun 25, 2024

…sing a QString as much as possible. (#355)

this is necessary because, at least in Qt5, there is a hard upper limit to the length of a QString
addresses issue #354

eteran closed this as completed Jun 25, 2024