DataFlow Job in TFX pipeline fails after running for an hour #6565
Comments
This is a known issue, tracked in #6386, and the current workaround is to set RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1 in your container environment, as discussed in that issue.
Can you please suggest code and steps for adding "ENV RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1" to the TFX docker image before building the container? This is my code where I am creating the runner:

BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
PIPELINE_DEFINITION_FILE = 'test_pipeline.json'
runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
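For context, a minimal sketch of what that runner setup usually looks like; the create_pipeline factory and the project/bucket values are illustrative assumptions, not the commenter's full code:

import os
from tfx import v1 as tfx

# Illustrative placeholders -- replace with your own project values.
GOOGLE_CLOUD_PROJECT = 'my-gcp-project'
GOOGLE_CLOUD_REGION = 'us-central1'
GCS_BUCKET_NAME = 'my-bucket'

# Beam args that route the BigQuery read through a Dataflow job.
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--runner=DataflowRunner',
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
    '--region=' + GOOGLE_CLOUD_REGION,
]

PIPELINE_DEFINITION_FILE = 'test_pipeline.json'

# Compile the pipeline to a Kubeflow Pipelines v2 / Vertex AI spec.
runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename=PIPELINE_DEFINITION_FILE)
runner.run(
    create_pipeline(  # hypothetical pipeline factory defined elsewhere
        beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))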
Thanks for the solution @singhniraj08, it worked, and I am putting it here. The steps were:

!gcloud artifacts repositories create REPO-NAME
!gcloud auth configure-docker REGION-docker.pkg.dev

dockerfile_content = """
ENV RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1
with open("Dockerfile", "w") as dockerfile:

!gcloud builds submit --tag REGION-docker.pkg.dev/PROJECT-ID/REPO-NAME/dataflow/DOCKERNAME:TAG

And finally I passed this new custom docker image in beam_pipeline_args:

BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
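A fuller sketch of those steps, assuming an Artifact Registry repository and the public tensorflow/tfx base image; REPO-NAME, REGION, PROJECT-ID, DOCKERNAME, TAG and the image version are placeholders, and --sdk_container_image is a standard Beam/Dataflow pipeline option rather than anything TFX-specific:

# Dockerfile that extends the TFX base image and sets the Beam env var.
dockerfile_content = """
FROM tensorflow/tfx:1.14.0
ENV RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1
"""
with open("Dockerfile", "w") as dockerfile:
    dockerfile.write(dockerfile_content)

# Create a repo, authenticate Docker, and build/push the image with Cloud Build.
!gcloud artifacts repositories create REPO-NAME --repository-format=docker --location=REGION
!gcloud auth configure-docker REGION-docker.pkg.dev
!gcloud builds submit --tag REGION-docker.pkg.dev/PROJECT-ID/REPO-NAME/dataflow/DOCKERNAME:TAG

# Point the Dataflow workers at the custom image via beam_pipeline_args.
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--runner=DataflowRunner',
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
    '--region=' + GOOGLE_CLOUD_REGION,
    '--experiments=use_runner_v2',
    '--sdk_container_image='
    'REGION-docker.pkg.dev/PROJECT-ID/REPO-NAME/dataflow/DOCKERNAME:TAG',
]

With Dataflow Runner v2 the workers start from the image given in --sdk_container_image, so the environment variable baked into the Dockerfile should be present when the Beam SDK harness starts.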
We have a similar issue tracking this, and the long-term solution is to add the environment variable to the TFX base image to avoid these issues in the future. This is blocked by another issue, #6468. Once that issue is fixed, we will implement the environment variable in the TFX base image. I would request you to close this issue and follow the similar issue for updates.
Thanks for the support.
If the bug is related to a specific library, please raise an issue in the respective repo directly.

System information
- Environment in which the code is executed (e.g., Interactive Notebook, Google Cloud, etc.):
- Package versions (pip freeze output):

Describe the current behavior
The pipeline fails in the first step, where it has to import data from BigQuery using a Dataflow job.
Describe the expected behavior
It should successfully import the data, as it did earlier.
Standalone code to reproduce the issue
import os

# GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME and GOOGLE_CLOUD_REGION are set earlier in the notebook.
# Beam arguments that run the BigQuery import on Dataflow.
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--runner=DataflowRunner',
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
    '--region=' + GOOGLE_CLOUD_REGION,
]
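For reference, a minimal sketch of how these arguments typically reach the failing BigQuery import step; the query, pipeline name, and pipeline root below are illustrative assumptions rather than the reporter's actual values:

from tfx import v1 as tfx

def create_pipeline(query, pipeline_name, pipeline_root, beam_pipeline_args):
    # BigQueryExampleGen is the first component; with DataflowRunner in
    # beam_pipeline_args, the query is executed as a Dataflow job.
    example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(query=query)
    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen],
        beam_pipeline_args=beam_pipeline_args)

# Example wiring (hypothetical values):
pipeline = create_pipeline(
    query='SELECT * FROM `my-project.my_dataset.my_table`',
    pipeline_name='bq-dataflow-pipeline',
    pipeline_root='gs://' + GCS_BUCKET_NAME + '/pipeline_root',
    beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS)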
Other info / logs
Logs attached
downloaded-logs-20240108-182510.csv