Intermittent segfaults on Ubuntu 14.04 #14

Open
ipmb opened this issue Feb 13, 2015 · 12 comments

Comments

@ipmb

ipmb commented Feb 13, 2015

Not sure if this is the right place to raise this, but I've been unable to get the opencv_engine running reliably: intermittent segfaults in the cv module take down the whole thumbor process. It isn't tied to a specific file, because retrying the same request works fine. I noticed this during load testing, so it may be exacerbated by high load.

It took me a while to find the issue. There was nothing relevant in the thumbor logs, but I noticed 502s and dropped/prematurely closed connections in Nginx. Here's the relevant snippet from syslog:

Feb 12 21:48:30 ip-172-17-0-246 kernel: [  775.849915] init: thumbor_worker (1) main process (20820) killed by SEGV signal
Feb 12 21:48:30 ip-172-17-0-246 kernel: [  775.849930] init: thumbor_worker (1) main process ended, respawning
Feb 12 21:48:33 ip-172-17-0-246 kernel: [  779.087115] thumbor[20891]: segfault at 8 ip 00007faf3f436609 sp 00007fff1054b4e0 error 4 in cv2.so[7faf3f3ff000+14e000]

I tried with both the python-opencv package and building 2.4.10 from source, but saw the same results. Relevant Python packages:

opencv-engine==1.0.0
thumbor==4.8.6
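The kernel segfault line in the syslog snippet above can be unpacked a little further. The following is a minimal sketch, assuming the usual x86 Linux message layout (hex fault address, instruction pointer, stack pointer, hex page-fault error code, then the module's mapping base and size); `parse_segfault` is a hypothetical helper, not part of thumbor or opencv-engine.

```python
import re

# Layout of the x86 Linux "segfault at ..." kernel message (all numbers are hex).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<module>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract the faulting address, module offset and error bits from a kernel segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    error = int(m.group("error"), 16)
    return {
        "module": m.group("module"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the crashing instruction relative to where the module is mapped.
        "offset_in_module": ip - base,
        # x86 page-fault error code bits: 1 = page was present, 2 = write, 4 = user mode.
        "page_present": bool(error & 1),
        "write_access": bool(error & 2),
        "user_mode": bool(error & 4),
    }


line = (
    "thumbor[20891]: segfault at 8 ip 00007faf3f436609 sp 00007fff1054b4e0 "
    "error 4 in cv2.so[7faf3f3ff000+14e000]"
)
info = parse_segfault(line)
print(info["module"], hex(info["offset_in_module"]), hex(info["fault_address"]))
```

For this line the decoded fields describe a user-mode read of an unmapped page at address 8, which is consistent with (though not proof of) a null pointer being dereferenced at a small offset inside cv2.so. The module offset is only useful together with the exact cv2.so binary that crashed, for example when feeding it to a symbolizer such as `addr2line`.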
@scorphus
Member

Hello, @ipmb, sorry for taking so long to reply. Is this still an issue? Can you give us a bit more information, such as which operations and/or filters are involved? Is there also any aspect of your configuration that you think could help shed light on this?

@ipmb
Author

ipmb commented Apr 14, 2015

Yeah, I had to stop using it because of the issue and I'm no longer working on the project, but I'd love to figure it out because I'm sure I'll use thumbor again in the future. The configuration was totally stock iirc. I was using the default optimizers and filters and pulling source images from S3 over HTTP. My guess is that it would be easy to replicate with the packages I mentioned on Ubuntu 14.04.

@guilhermef
Member

We used to have this problem in our facial detector.
OpenCV doesn't cope well with broken images,
and we had a few of those here.
So in our facial detector we had to pass the image through Pillow before sending it to OpenCV: thumbor/remotecv@56c70ed
Before that, it would always segfault.

Here's an example of a broken image:
http://s.glbimg.com/es/ge/f/original/2012/04/11/careca_lesao.jpg
It doesn't look broken, but it's actually missing its last bytes.
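As a rough illustration of what "missing the last bytes" means for a JPEG, here is a minimal standard-library sketch. It is not what the remotecv commit does (that one goes through Pillow); it only checks for the JPEG start-of-image and end-of-image markers, and `looks_like_complete_jpeg` is a hypothetical name.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker, first two bytes of a JPEG stream
JPEG_EOI = b"\xff\xd9"  # end-of-image marker, normally the last two bytes


def looks_like_complete_jpeg(data):
    """Heuristic only: True if the buffer starts with SOI and ends with EOI."""
    return (
        len(data) >= 4
        and data.startswith(JPEG_SOI)
        and data.endswith(JPEG_EOI)
    )


complete = JPEG_SOI + b"fake scan data" + JPEG_EOI
truncated = complete[:-5]  # simulate a download that lost its tail
print(looks_like_complete_jpeg(complete), looks_like_complete_jpeg(truncated))
```

This is deliberately strict: some valid files carry extra bytes after the end-of-image marker and would be rejected, and a buffer that passes can still be corrupt in the middle. Fully decoding the image with a tolerant library before handing it to OpenCV, as described above, is the more thorough check.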

@ipmb
Author

ipmb commented Apr 14, 2015

These weren't broken images. I could repeat the same request and it would work fine. The problem seemed related to load/concurrency.

@guilhermef
Member

If the images weren't broken, Tornado could be finishing the request midway and feeding a broken image to the engine.
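One way to test that hypothesis would be to compare the number of bytes received with the `Content-Length` the origin declared before the buffer reaches the engine. The sketch below is standard-library only and makes assumptions: `body_matches_content_length` is a hypothetical helper, `headers` is any plain mapping of header names to values, and the body is assumed not to have been transparently decompressed (in which case the lengths would legitimately differ).

```python
def body_matches_content_length(headers, body):
    """Compare len(body) with the declared Content-Length.

    Returns True or False when a usable Content-Length is present,
    and None when there is nothing to compare against.
    """
    declared = None
    for name, value in headers.items():
        if name.lower() == "content-length":  # header names are case-insensitive
            declared = value
    if declared is None:
        return None  # e.g. chunked transfer encoding
    try:
        expected = int(declared)
    except (TypeError, ValueError):
        return None  # malformed header, cannot decide
    return len(body) == expected


print(body_matches_content_length({"Content-Length": "4"}, b"abcd"))
print(body_matches_content_length({"content-length": "10"}, b"abcd"))
```

A `False` result would point at a short read (timeout, broken pipe) rather than a bad source file, which would fit a request that succeeds on retry.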

@guilhermef
Member

We might be handling a broken pipe or a timeout the wrong way.

@guilhermef
Member

I think these problems are related: thumbor/thumbor#208

@ipmb
Author

ipmb commented Apr 29, 2015

A small percentage (0.15%) of our requests result in the 599 timeout error, and we're also using S3 to store the source images, so that would make sense.

@guilhermef
Member

@ipmb Would you mind testing with this new version? https://github.com/thumbor/thumbor/releases/tag/5.0.0rc2

@ipmb
Author

ipmb commented Apr 29, 2015

Unfortunately I'm not working on the project anymore, but that might change in the future. I'll post results when/if that happens.

@mpdude

mpdude commented Oct 19, 2017

I might be affected as well, using Ubuntu 14.04 and a pip install of Thumbor 5.2.1.

Syslog:

Oct 19 14:29:55 orecchiette kern.info kernel:[48479811.505031] thumbor[29974]: segfault at 8 ip 00007fcbf9aed609 sp 00007ffd51383880 error 4 in cv2.so[7fcbf9ab6000+14e000]
Oct 19 14:29:56 orecchiette user.warning kernel:[48479811.930534] init: thumbor-worker (8888) main process (29972) terminated with status 139
Oct 19 14:29:56 orecchiette user.warning kernel:[48479811.930560] init: thumbor-worker (8888) main process ended, respawning
Oct 19 14:29:56 orecchiette user.warning kernel:[48479811.984663] init: thumbor-worker (8888) main process (31472) terminated with status 2
Oct 19 14:29:56 orecchiette user.warning kernel:[48479811.984687] init: thumbor-worker (8888) main process ended, respawning
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.004162] init: thumbor-worker (8888) main process (31476) terminated with status 2
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.004187] init: thumbor-worker (8888) main process ended, respawning
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.063532] init: thumbor-worker (8888) main process (31480) terminated with status 2
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.063555] init: thumbor-worker (8888) main process ended, respawning
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.134177] init: thumbor-worker (8888) main process (31484) terminated with status 2
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.134200] init: thumbor-worker (8888) main process ended, respawning
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.175374] init: thumbor-worker (8888) main process (31489) terminated with status 2
Oct 19 14:29:56 orecchiette user.warning kernel:[48479812.175397] init: thumbor-worker (8888) respawning too fast, stopped

Are the exit codes 139 and 2 something designed into Thumbor? Or might they come from elsewhere? IIRC, errno = 2 is ENOENT?

Why does thumbor exit in the first place, and why does it have trouble starting up following that? If I restart it a few moments later, everything is fine.

What can I do to help here?
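On the exit codes asked about above: by common shell convention, a status above 128 means the process was killed by signal number `status - 128`, so 139 corresponds to signal 11, which is SIGSEGV on Linux and matches the kernel segfault line. An exit status is not an errno value, so status 2 does not by itself mean ENOENT; it is an ordinary exit code whose meaning depends on the program that returned it. A minimal sketch of that decoding, with `describe_exit_status` as a hypothetical helper:

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status using the shell's 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with code %d" % status


for status in (139, 2):
    print(status, "->", describe_exit_status(status))
```

Whether thumbor itself or a wrapper around it produced the status 2 on the respawn attempts cannot be read from the syslog lines alone; the worker's own stderr output from those attempts would be the place to look.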

@christianjgreen
Collaborator

@mpdude My PR upgrades OpenCV to OpenCV 3. Several other developers and I are currently running it in production. Check it out! #27
