SKS Segmentation Fault During FastBuild #65

Open
ygrek opened this issue Jan 24, 2019 · 14 comments
Labels
bug Something isn't working question Further information is requested

Comments

@ygrek
Member

ygrek commented Jan 24, 2019

Original report by Anonymous.


During the initial fastbuild (or build) of the SKS data, the program immediately segfaults. The segfault appears to come from caml_dbenv_create, based on the attached GDB dump. The system is running CentOS 7 with Berkeley DB 4.7.25 and OCaml 4.05.0. I was trying to import the initial dump, and the log shows:
2019-01-24 11:27:47 Opening log
2019-01-24 11:27:47 Running SKS 1.1.6+
2019-01-24 11:27:47 Opening KeyDB database

I do have an empty KDB directory left behind as well.

@ygrek
Member Author

ygrek commented Jan 24, 2019

Original comment by Jason Schwarz (Bitbucket: jjschwarz, GitHub: jjschwarz).


I built the server from the current source in the repository, with the OCaml 4.05 patches applied.

@ygrek
Member Author

ygrek commented Jan 24, 2019

Original comment by Jason Schwarz (Bitbucket: jjschwarz, GitHub: jjschwarz).


Changed tactics and used the prebuilt binary from CentOS instead

@ygrek added the major and bug labels on May 7, 2020
@ygrek
Member Author

ygrek commented May 9, 2020

I don't see how OCaml can be at fault here.
Please see if the issue reproduces with:

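#include <db.h>

/* Minimal reproducer: create a Berkeley DB environment handle and nothing else. */
int main(void)
{
  int err;
  int flags = 0;
  DB_ENV *dbenv;
  /* db_env_create returns 0 on success and a non-zero error code on failure. */
  err = db_env_create(&dbenv, flags);
  return err;
}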

compile and run:

gcc -o q q.c -ldb
./q
echo $?
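
When reading the output of echo $?, it may help to know that shells conventionally report a process killed by signal N as exit status 128+N, so a segfault (SIGSEGV, signal 11) shows up as 139. Below is a minimal sketch of a helper that decodes the status. The function name describe_status is hypothetical and not part of SKS or Berkeley DB, and a program can in principle exit with a status above 128 on its own, so treat the result as a hint.

```shell
# Hypothetical helper: describe an exit status passed as $1.
describe_status() {
    ds_status=$1
    if [ "$ds_status" -eq 0 ]; then
        echo "ok"
    elif [ "$ds_status" -gt 128 ]; then
        # By convention, death by signal N is reported as 128+N.
        echo "killed by signal $((ds_status - 128))"
    else
        # A plain non-zero exit, e.g. an error code returned from main.
        echo "exited with status $ds_status"
    fi
}
```

For example, running ./q followed by describe_status $? would show whether the reproducer crashed or merely returned an error code from db_env_create.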

@ygrek added the question label and removed the major label on May 9, 2020
@MarcelWaldvogel

Maybe a confirmation of the problem (if not, take it as a potential workaround):

I tried setting up a new keyserver using UGNS/sks-docker and the current (2021-02-08) cyberbits.eu dump. sks build *.pgp consistently segfaulted after some 10 files. Continuing with sks merge *.pgp and sks pbuild --cache 1024 --diskcache 1024 seems to have resulted in a working system.

@skavadias

skavadias commented May 4, 2021

On Ubuntu 18.04, with 64 GB of memory, sks build dump/*.pgp, without any other options, runs the furthest (loads 30+ files), before giving Fatal error: exception Stack overflow. With -n (tried values 5, 10, 100, 200) and -cache (tried values 100, 1024, 8192) there is always a segmentation fault, after loading some number of files. sks fastbuild -n 10 -cache 100 gives Fatal error: exception Stack overflow.
In my case, sks merge dump/*.pgp crashes as well, with Fatal error: exception Sys_error("1: No such file or directory").

These occur on the version provided by Ubuntu's distribution, but I have installed and built the code from here and sks build dump/*.pgp -n 10 -cache 100 still gives segmentation fault.

To overcome the Stack overflow/segmentation fault it appears ulimit -s 65536 does the trick (16384 is not enough -- I have not tried 32768). I would suggest adding a test and invocation of ulimit -s ... in file sks_build.sh.
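
A guard of the kind suggested above could look roughly like the following sketch. It is an assumption about what such a check might be, not the actual contents of sks_build.sh. The function name ensure_stack_kb is hypothetical, and ulimit -s is a common shell extension (bash, dash, ksh) rather than something POSIX requires. Only the soft limit is raised, so the hard limit is left untouched.

```shell
# Hypothetical helper: make sure the soft stack limit is at least $1 KiB.
ensure_stack_kb() {
    es_want=$1
    es_cur=$(ulimit -S -s)
    # "unlimited" already satisfies any request.
    if [ "$es_cur" = "unlimited" ]; then
        return 0
    fi
    if [ "$es_cur" -lt "$es_want" ]; then
        # Raising the soft limit fails when the hard limit is lower.
        if ! ulimit -S -s "$es_want" 2>/dev/null; then
            echo "warning: cannot raise stack limit from $es_cur to $es_want KiB" >&2
            return 1
        fi
    fi
    return 0
}

# 65536 KiB is the value reported to work in this thread.
ensure_stack_kb 65536 || echo "continuing with stack limit $(ulimit -S -s)" >&2
```

The limit applies to the current shell and its children, so the call has to come before sks build is started from the same script.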

Anyway, in my case, sks build dump/*.pgp now crashes --much later-- the same way as sks merge dump/*.pgp, with Fatal error: exception Sys_error("1: No such file or directory"). I don't know if anyone else has this, seemingly, new error, or this is something specific to my system.
[ Solved this as well: somewhere I found an sksconf setting of disable_mailsync: 1. That is not valid; the setting must be written as just disable_mailsync: with no value. The former produces this totally unexpected Fatal error: exception Sys_error("1: No such file or directory"). ]

@ygrek changed the title from "SKS Segnentation Fault During FastBuild" to "SKS Segmentation Fault During FastBuild" on May 6, 2021
@ygrek
Member Author

ygrek commented May 6, 2021

@skavadias what version of sks? can you try with git master.
ref #79

@skavadias

skavadias commented May 6, 2021

I am on Ubuntu 18.04, using SKS version 1.1.6. I may have been misled by the behaviour of the which command and never actually run the version I built from source.

I suppose, you are interested in the need for a call to ulimit -s ...? If so, I will try it.
My other problem does not matter (I found the disable_mailsync: 1 suggestion on the web, so it was possibly not meant for my version of SKS).

@ygrek
Member Author

ygrek commented May 6, 2021

> I suppose, you are interested in the need for a call to ulimit -s ...?

yes, it is supposed to be fixed in git master and there should be no need to increase stack with it.

@skavadias

skavadias commented May 6, 2021

Nope!
Did a fresh git clone https://github.com/SKS-Keyserver/sks-keyserver.git right now.

ulimit -s gives 8192. Running the installed (after make install) sks program like so

/full-path-here/sks build dump/*.pgp -basedir /full-path-here/sks-basedir -stdoutlog -n 100 -cache 1024

gives

2021-05-07 01:25:46 Running SKS 1.1.6+
2021-05-07 01:25:46 Opening KeyDB database
Loading keys...Fatal error: exception Stack overflow

Running with -n 10 I get

/full-path-here/sks build dump/*.pgp -basedir /full-path-here/sks-basedir -stdoutlog -n 10 -cache 100
2021-05-07 01:43:35 Running SKS 1.1.6+
2021-05-07 01:43:35 Opening KeyDB database
Loading keys...done
DB time:  0.34 min.  Total time: 0.37 min.
Loading keys...done
DB time:  0.14 min.  Total time: 0.16 min.
Loading keys...done
DB time:  0.15 min.  Total time: 0.16 min.
Loading keys...done
DB time:  0.17 min.  Total time: 0.18 min.
Loading keys...done
DB time:  0.15 min.  Total time: 0.16 min.
Loading keys...done
DB time:  0.16 min.  Total time: 0.17 min.
Loading keys...Fatal error: exception Stack overflow

@ygrek
Member Author

ygrek commented May 7, 2021

any chance you can try commits from #79 ?

@skavadias

Tell me which ones. I'll do one or two...

@ygrek
Member Author

ygrek commented May 7, 2021

just switch to branch tailrec after updating git checkout (git remote update; git checkout tailrec)

@skavadias

skavadias commented May 7, 2021

OK.
ulimit -s
8192

ca98434 ("KeyMerge: get rid of Stream.of_list which is not tail-recursive", 2020-06-24):

/full-path-here/sks build dump/*.pgp -basedir /full-path-here/sks-basedir -stdoutlog -n 10 -cache 100
... ...
Loading keys...Fatal error: exception Stack overflow

8af4cfc ("KeyMerge: tail-recursive flatten", 2020-06-24):
/full-path-here/sks build dump/*.pgp -basedir /full-path-here/sks-basedir -stdoutlog -n 10 -cache 100
seems to work ==> loaded keys 20 times now.

Stopping it to run with -n 100:
/full-path-here/sks build dump/*.pgp -basedir /full-path-here/sks-basedir -stdoutlog -n 100 -cache 1024

2021-05-07 17:28:43 Running SKS 1.1.6+
2021-05-07 17:28:43 Opening KeyDB database
Loading keys...done
DB time: 2.63 min. Total time: 2.82 min.
Loading keys...done
DB time: 4.21 min. Total time: 4.38 min.
Loading keys...done
DB time: 5.51 min. Total time: 7.25 min.
Loading keys...done
... ...

Keeps running, more than 15 minutes now (with memory usage now at 22 GB). I believe it is fixed in this version... (stopping it)

@ygrek
Member Author

ygrek commented May 7, 2021

awesome, thanks a lot!
