sv_squaredancer_0.1 error (workflow related, "instance" violates foreign key constraint "instance_peer_instance_id_fkey") #52
Comments
We are getting the same errors during the 'sv_breakdancer' and 'indel pindel' steps of somatic-variation workflows. Something fairly fundamental is going wrong here. It occurs consistently in these steps for any somatic-variation run within the standalone GMS.
Maybe this problem is related to the workflow database schema tables that were migrated from Oracle to Postgres for the purposes of releasing the standalone GMS. In particular, the error is on a foreign key constraint in the 'instance' table of the workflow schema. This foreign key constraint is set up here in the psql dump:
ALTER TABLE ONLY instance
    ADD CONSTRAINT instance_peer_instance_id_fkey FOREIGN KEY (peer_instance_id) REFERENCES instance(workflow_instance_id);
First question: do we really want to create a foreign key constraint between 'peer_instance_id' of the 'instance' table and 'workflow_instance_id' of the same table? If we do want this, perhaps there is some compatibility issue between how Oracle and Postgres handle self-referential integrity constraints? Of course, the error could be something else entirely that is violating a desired foreign key constraint, and the RDBMS is simply reporting a useful error here.
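To make the question about the self-referential constraint concrete, here is a minimal sketch of how such a constraint reacts to insert order. It uses Python's built-in sqlite3 module as a stand-in, not Postgres or Oracle, so it only illustrates the general behaviour of an immediately checked foreign key. The table and column names come from the dump above; the row values and the `insert_rows` helper are hypothetical.

```python
import sqlite3


def insert_rows(rows):
    """Insert (workflow_instance_id, peer_instance_id) rows in the given order."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Same shape as the constraint in the dump: peer_instance_id points back
    # at workflow_instance_id of the same table.
    conn.execute(
        """
        CREATE TABLE instance (
            workflow_instance_id INTEGER PRIMARY KEY,
            peer_instance_id INTEGER REFERENCES instance(workflow_instance_id)
        )
        """
    )
    try:
        for row in rows:
            conn.execute("INSERT INTO instance VALUES (?, ?)", row)
        conn.commit()
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"
    finally:
        conn.close()


# Hypothetical rows: instance 2 names instance 1 as its peer.
first = (1, None)
peer = (2, 1)

print(insert_rows([first, peer]))  # referenced row goes in first
print(insert_rows([peer, first]))  # referencing row goes in first
```

In this sketch the first call succeeds and the second is rejected, even though both insert the same two rows. If the real failure has the same shape, the constraint itself is sound and the thing to look at is the order in which rows are written.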
NOTES: To get more verbose output on interactions with the database, try setting one of these when running a somatic-variation build: The first will dump updates and inserts, the second will dump all SQL statements.

It is possible that this workflow foreign key constraint problem is related to the sorting of IDs at the time the insert is attempted, and that this causes a foreign key problem because we have a self-referential foreign key constraint on the table "instance" of the schema "workflow". Both @amb43790 and @sakoht have reviewed the relevant UR code, and at first glance this does not seem to be the case. However, it could still be something related to sorting prior to insert. To further test this theory, @amb43790 changed the workflow code and related workflow tables in the postgres schema to use numeric IDs. We then dumped the database, switched to the modified workflow code, and attempted a new build. The same foreign key constraint error still occurred.

We next dropped the foreign key constraint entirely by logging into postgres on a test box and dropping the constraint as follows:
sudo -u postgres psql -d genome
ALTER TABLE workflow.instance DROP CONSTRAINT instance_peer_instance_id_fkey;
This seems to have worked and allows pindel and breakdancer parallel jobs to launch and complete successfully.

The question remains: why is this foreign key constraint being violated in the standalone GMS but not within TGI, where the same workflows are being created and stored? Within TGI, these tables are stored in Oracle. Could this be a difference between how the Oracle and Postgres RDBMSs handle self-referential foreign key constraints? I believe this is the only place in our schema with such a constraint. The current plan is to discuss this issue with TGI members more familiar with the workflow system, namely @davidlmorton
To test possible resolutions to this problem we can log into clia1 and run a somatic-variation build as follows:
ssh clia1 #Log into clia box
In the standalone GMS, relevant software is installed here:
Installation happens by running the Makefile here:
The schema is here:
All tables are stored in a single postgres database (no Oracle). You can log into this as follows:
To rebuild the database and create new builds totally from scratch you can do:
Looks like it was a casing problem with the schema/table names. The schema creation script creates all the tables with lower-cased names, but the table_name class attribute for the Workflow classes is mixed case, like "workflow.TABLE_NAME". So when it was looking for the foreign key constraints for the workflow.instance table, it wasn't finding any, and the inserts were done in the wrong order. It looks like I've been able to fix it by changing the Workflow MetaDB info to always use lower-case schema, table, and column names, to match the schema build script, and by editing the Workflow classes' table_name attribute to have everything lower-cased (workflow.table_name). Build 02a06c952908411bbad14cf2f9552769 is running now. Many parallel steps have completed successfully. When it finishes (or dies from some unrelated problem), I'll push my fix for Workflow's MetaDB to the gms-pub branch of Workflow.
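The failure mode described above can be sketched in a few lines of Python. This is an illustration only, not the UR or Workflow code: `METADATA`, `lookup_self_fk`, and `order_for_insert` are hypothetical names, and "workflow.INSTANCE" stands in for the mixed-case "workflow.TABLE_NAME" attribute. The idea is that a case-sensitive lookup against lower-cased metadata finds no constraint, so nothing reorders the rows before they are inserted.

```python
# Hypothetical constraint metadata, keyed by lower-cased "schema.table" names
# as the schema build script creates them. The value is the self-referential
# pair (referencing column, referenced column).
METADATA = {
    "workflow.instance": ("peer_instance_id", "workflow_instance_id"),
}


def lookup_self_fk(metadata, table_name):
    """Case-sensitive lookup; returns None when the name does not match."""
    return metadata.get(table_name)


def order_for_insert(rows, self_fk):
    """Put referenced rows before the rows that point at them."""
    if self_fk is None:
        return list(rows)  # no known constraint: keep the order as given
    child_col, parent_col = self_fk
    batch_ids = {row[parent_col] for row in rows}
    pending, ordered, done = list(rows), [], set()
    while pending:
        remaining = []
        for row in pending:
            ref = row[child_col]
            # A row is ready if it references nothing, itself, a row outside
            # this batch, or a row that has already been placed.
            if ref is None or ref == row[parent_col] or ref not in batch_ids or ref in done:
                ordered.append(row)
                done.add(row[parent_col])
            else:
                remaining.append(row)
        if len(remaining) == len(pending):
            raise ValueError("circular references between rows")
        pending = remaining
    return ordered


rows = [
    {"workflow_instance_id": 2, "peer_instance_id": 1},
    {"workflow_instance_id": 1, "peer_instance_id": None},
]

mixed_case = lookup_self_fk(METADATA, "workflow.INSTANCE")
lower_case = lookup_self_fk(METADATA, "workflow.INSTANCE".lower())

unordered_ids = [r["workflow_instance_id"] for r in order_for_insert(rows, mixed_case)]
ordered_ids = [r["workflow_instance_id"] for r in order_for_insert(rows, lower_case)]
print(unordered_ids, ordered_ids)
```

With the mixed-case name the lookup returns nothing and the rows keep their original order, with the referencing row first. With the lower-cased name the constraint is found and the referenced row is moved to the front, which matches the fix of lower-casing the names everywhere.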
Build 02a06c952908411bbad14cf2f9552769 crashed in its final step, "Annotate And Upload Variants", because the annotator couldn't find a file it was looking for:
/gscmnt/ams1100/info/v37_ucsc_conservation/chr1-rec
I'd say the FK problem is fixed. I've pushed a commit to the gms-pub branch of the workflow repo.
Awesome! That is an unrelated issue described here:
I see a similar issue with Pindel in the somatic-variation workflow,
On second thought, this might need a new issue actually. Opening #181
Note - The fix for this old issue is here, genome/tgi-workflow@28c525a
The somatic variation build fails with this error in the SquareDancer step,
This error needs to be replicated in a fresh install of the sGMS.