Context

The genetics_etl DAG, described by the image below, should be possible to execute in two modes:
- standalone
- as part of the unified pipeline

Here is the list of possible improvements that I can see to make the genetics_etl DAG fit the above conditions:

1. Run variant_to_vcf and list_nonannotated_variants as a single dataproc step

Currently the variant_to_vcf step uses sources from the ETL, namely:
- evidence-files, which come directly from the PIS (no other dependencies)
- pharmacogenomics-inputs, which are generated by the platform ETL

In the first step, variant_to_vcf is run as a Google Batch job; in the second step, list_nonannotated_variants is run as a standalone task (PythonOperator). These two steps can be merged and run as a single dataproc step. This would decrease the complexity of the pipeline by removing the Batch job and its configuration, making the DAG more generic and easier to transfer to the unified pipeline.

2. Extract the configuration of the steps and merge it into a unified pipeline config powered by Hydra. This would roll back the steps to the way they were handled before in gentropy (see https://github.com/opentargets/gentropy/tree/v1.7.0/config), but simplified by hosting only the config for the genetics_etl steps.

This would mean that we could store one config per pipeline execution mode:
- a default config for the standalone execution mode
- one config for the unified pipeline that overrides the default config paths
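To make the one-config-per-mode idea concrete, here is a minimal Python sketch assuming a simple nested layout: a default config for the standalone mode and a unified-pipeline config that only overrides paths, combined by a recursive merge. The keys, the gs:// paths and the helpers merge_config and load_config are hypothetical placeholders; in practice Hydra would compose the YAML files, and OmegaConf.merge performs this kind of recursive merge.

```python
"""Sketch: a default config plus a unified-pipeline override of its paths.

Hypothetical: all keys and gs:// paths below are placeholders. In the real
setup Hydra/OmegaConf would compose these from YAML files.
"""
from copy import deepcopy
from typing import Any

# Hypothetical default config for the standalone execution mode.
DEFAULT_CONFIG: dict[str, Any] = {
    "step": {
        "variant_to_vcf": {
            "source_path": "gs://genetics-etl-bucket/inputs/evidence",
            "output_path": "gs://genetics-etl-bucket/outputs/vcf",
        },
        "list_nonannotated_variants": {
            "output_path": "gs://genetics-etl-bucket/outputs/nonannotated",
        },
    },
}

# Hypothetical unified-pipeline config: it only overrides paths.
UNIFIED_OVERRIDES: dict[str, Any] = {
    "step": {
        "variant_to_vcf": {
            "source_path": "gs://unified-pipeline-bucket/inputs/evidence",
        },
        "list_nonannotated_variants": {
            "output_path": "gs://unified-pipeline-bucket/outputs/nonannotated",
        },
    },
}


def merge_config(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with overrides applied: nested dicts merge, leaves replace."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(mode: str) -> dict[str, Any]:
    """Pick the config for an execution mode: 'standalone' or 'unified'."""
    if mode == "standalone":
        return deepcopy(DEFAULT_CONFIG)
    if mode == "unified":
        return merge_config(DEFAULT_CONFIG, UNIFIED_OVERRIDES)
    raise ValueError(f"unknown execution mode: {mode!r}")
```

For example, load_config("unified") keeps the default output_path for variant_to_vcf but swaps in the unified-pipeline source_path. Limiting the override file to paths keeps every other step parameter defined in a single place.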
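For the improvement of running variant_to_vcf and list_nonannotated_variants as a single dataproc step, the following Python sketch shows the shape of one step that converts variants to VCF records and filters out already-annotated ones in the same process, so no intermediate Batch job hand-off is needed. It assumes, purely for illustration, that variants arrive as CHROM_POS_REF_ALT identifiers and that "non-annotated" means absent from a known set of annotated IDs. The function names mirror the existing step names, but their bodies and the VcfRecord type are hypothetical; the real step would run as a Spark job on dataproc.

```python
"""Sketch of a single step combining variant_to_vcf and list_nonannotated_variants.

Hypothetical assumptions: variant IDs use the CHROM_POS_REF_ALT form, and a
variant is "non-annotated" when its ID is missing from annotated_ids.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class VcfRecord:
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str

    def to_line(self) -> str:
        # Minimal VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO.
        return "\t".join(
            [self.chrom, str(self.pos), self.variant_id, self.ref, self.alt, ".", ".", "."]
        )


def variant_to_vcf(variant_ids: list[str]) -> list[VcfRecord]:
    """Parse unique IDs into records, sorted by (chrom, pos); chrom sorts as a string here."""
    records = []
    for variant_id in set(variant_ids):
        chrom, pos, ref, alt = variant_id.split("_")
        records.append(VcfRecord(chrom, int(pos), variant_id, ref, alt))
    return sorted(records, key=lambda r: (r.chrom, r.pos))


def list_nonannotated_variants(
    records: list[VcfRecord], annotated_ids: set[str]
) -> list[VcfRecord]:
    """Keep only records whose ID is not in the annotated set."""
    return [r for r in records if r.variant_id not in annotated_ids]


def variant_to_vcf_and_list_nonannotated(
    variant_ids: list[str], annotated_ids: set[str]
) -> list[str]:
    """Run both sub-steps in one process and return VCF lines to annotate."""
    records = variant_to_vcf(variant_ids)
    return [r.to_line() for r in list_nonannotated_variants(records, annotated_ids)]
```

Because both sub-steps share one process, the intermediate VCF never has to be handed between a Batch job and a separate PythonOperator task, which is the simplification the proposal aims for.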