Dagster SQLite EventLog failures #3868
I may just be a superstitious pigeon on this, but it seems like sometimes I get this failure 100% reliably, and other times it's maybe a 50% chance: if it fails and I run the ETL again, it might work. But I don't know what influences the probability. Last night I tried to run it 10 times in a row, and they all failed within 2.5 minutes. I've explored a few things in the past without success.
Looking at the logging databases themselves:
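A minimal sketch for inspecting them directly, assuming Dagster's default layout of one SQLite event log per run under `$DAGSTER_HOME/history/runs/` (that path is an assumption; adjust it to your own `DAGSTER_HOME`):

```python
import contextlib
import sqlite3
from pathlib import Path

# Assumed location of Dagster's per-run SQLite event log databases.
event_log_dir = Path("~/dagster_home/history/runs").expanduser()

for db_file in sorted(event_log_dir.glob("*.db")):
    # contextlib.closing ensures we don't leak file handles ourselves.
    with contextlib.closing(sqlite3.connect(db_file)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            )
        ]
    print(db_file.name, tables)
```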
I was able to work around the SQLite limitation by running Postgres locally (using Postgres.app) and pointing Dagster at it with the same settings that we use in the nightly builds (after creating the specified database and user):

```yaml
storage:
  postgres:
    postgres_db:
      username: dagster
      password: dagster_password
      hostname: 127.0.0.1
      db_name: dagster
      port: 5432
```

I brought up the independent "Too many open files" problem in this Dagster issue, and it is getting some attention now, so maybe it'll get fixed.
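As a quick sanity check that the database from the config above is reachable before launching Dagster, here's a sketch assuming `sqlalchemy` and a Postgres driver such as `psycopg2` are installed:

```python
import sqlalchemy as sa

# Connection URL mirrors the storage config above.
engine = sa.create_engine(
    "postgresql://dagster:dagster_password@127.0.0.1:5432/dagster"
)
with engine.connect() as conn:
    # If this prints a version string, Dagster should be able to connect too.
    print(conn.execute(sa.text("SELECT version()")).scalar())
```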
I'm very new to Dagster, but would it be possible to wrap the SQLite calls in a mutex in the IOManager? That could resolve the too-many-open-files issue without dealing with the inevitable regressions involved in a DB migration to Postgres.
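A minimal sketch of that mutex idea (the function name is hypothetical, not anything in PUDL's actual IOManager):

```python
import contextlib
import sqlite3
import threading

_sqlite_write_lock = threading.Lock()

def locked_write(db_path: str, sql: str, params: tuple = ()) -> None:
    """Serialize all SQLite writes through one process-wide lock."""
    with _sqlite_write_lock:
        # contextlib.closing matters here: sqlite3's own context manager
        # commits the transaction but does NOT close the connection.
        with contextlib.closing(sqlite3.connect(db_path)) as conn:
            with conn:
                conn.execute(sql, params)
```

One caveat: a `threading.Lock` only serializes writes within a single process, and Dagster's default multiprocess executor runs each step in its own process, so a cross-process mechanism (or SQLite's own busy timeout) would be needed in practice.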
The SQLite DB that's having trouble here isn't the one that we write our data to with an IOManager. It contains the event logging info from Dagster. The SQLite IOManager we use has a bunch of retry logic for when it hits accidental concurrent writes, so it seems to be doing okay. Thankfully it sounds like a fix to the too-many-open-files problem will show up in Dagster 1.9.3 later this week -- it was a resource leak on their side, where they weren't always closing files that got opened.
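For illustration, a minimal sketch of that kind of retry-on-lock logic (not PUDL's actual code):

```python
import contextlib
import sqlite3
import time

def write_with_retry(db_path: str, sql: str, params: tuple = (),
                     retries: int = 5, delay: float = 0.5) -> None:
    """Retry a write when SQLite reports that the database is locked."""
    for attempt in range(retries):
        try:
            with contextlib.closing(
                sqlite3.connect(db_path, timeout=30)
            ) as conn:
                with conn:  # commits on success, rolls back on error
                    conn.execute(sql, params)
            return
        except sqlite3.OperationalError as err:
            # Only retry lock contention; re-raise anything else,
            # and give up after the final attempt.
            if "locked" not in str(err) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # back off before retrying
```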
Do we feel like we can close this now?
The Too Many Open Files issue was fixed by dagster-io/dagster#25132, and I am currently able to run the DAG locally with Postgres for logging. If that's the way we need to run Dagster locally now, I think we should probably update our documentation to reflect that; otherwise folks will run into the initial DB issue. Are others able to run the whole DAG locally without using Postgres?
As previously addressed in e.g. #2417, #2996, #3003, #3208, and #3211, SQLite can't handle multiple concurrent writes. Our SQLite IO Manager has worked around this for the PUDL DB, but we seem to be hitting a new limit of some kind with Dagster's event logging DB, which still uses SQLite locally by default.
@cmgosnell recently encountered the issue in attempting to debug some integration test failures through the Dagster UI, in which it failed basically 100% of the time. I have had the problem on and off with maybe a 50-75% failure rate on the full ETL.
For me, the failure always seems to happen right after the execution of the `raw_eia860m__all_dfs` asset. Full stack trace here:
This "unable to open database file" error seems somewhat different than the locked DB error that we were getting before due to attempted concurrent writes.
So, is there a new workaround to squeeze some more life out of SQLite? Or what is the easiest way to use Postgres locally for development?
Could we turn off DEBUG level logging? I don't think we really look at it!