
Copy Assets from MCP staging to production buckets #100

Closed
1 task done
smohiudd opened this issue Feb 2, 2024 · 8 comments
@smohiudd
Contributor

smohiudd commented Feb 2, 2024

What

To support a production instance, STAC assets that are currently in veda-data-store-staging must be copied to veda-data-store-production.

DAG in Airflow that copies assets - confirm whether it is operational in dev or staging

PI Objective

Objective 4: Publish production data

Acceptance Criteria

  • All assets currently in veda-data-store-staging are available in veda-data-store
@smohiudd
Contributor Author

Merged a PR to fix the transfer DAG: NASA-IMPACT/veda-data-airflow#121

@smohiudd
Contributor Author

smohiudd commented Mar 13, 2024

Tested the following transfer in dev MWAA:

{
    "origin_bucket": "veda-data-store-staging",
    "origin_prefix": "geoglam/",
    "filename_regex": "^(.*).tif$",
    "target_bucket": "veda-data-store",
    "collection": "geoglam",
    "cogify": "false",
    "dry_run": "false"
}

I didn't get any errors in airflow. @anayeaye or @botanical when you get a chance can you check if this worked in MCP?

@smohiudd
Contributor Author

After some testing, it's clear the Airflow DAG can't work without appropriate PUT permissions to veda-data-store. I know that vedaDataAccessRole has PUT permissions to:

            "Resource": [
                "arn:aws:s3:::veda-data-store-staging",
                "arn:aws:s3:::veda-data-store-staging/*"
            ]
        },

But do we know if there's a similar policy in MCP for veda-data-store?

@botanical
Member

@smohiudd how would I check the dev MWAA transfer in MCP?

I see a role in MCP called veda-data-store-access that has:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::veda-data-store"
            ]
        },
        {
            "Sid": "ObjectPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectVersionTagging",
                "s3:PutObjectTagging"
            ],
            "Resource": [
                "arn:aws:s3:::veda-data-store/*"
            ]
        }
    ]
}

and another role veda-data-store-access-staging that has

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::veda-data-store-staging"
            ]
        },
        {
            "Sid": "ObjectPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectVersionTagging",
                "s3:PutObjectTagging",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::veda-data-store-staging/*"
            ]
        }
    ]
}
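The two policies above differ only in the bucket ARNs and the extra `s3:DeleteObject` on staging, so the production role does grant `s3:PutObject` on `veda-data-store/*`. As a sanity check, a minimal sketch (not tied to the actual deployment; it ignores Deny statements, conditions, and wildcarded actions) can verify what a policy document allows:

```python
from fnmatch import fnmatch

def policy_allows(policy, action, resource_arn):
    """Return True if any Allow statement grants `action` on `resource_arn`.
    Simplified: ignores Deny, Condition blocks, and wildcards in Action."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        if action not in stmt.get("Action", []):
            continue
        # Resource ARNs may end in /* ; fnmatch handles that glob.
        if any(fnmatch(resource_arn, res) for res in stmt.get("Resource", [])):
            return True
    return False

# The veda-data-store-access policy quoted above:
prod_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "BucketPermissions", "Effect": "Allow",
         "Action": ["s3:ListBucket"],
         "Resource": ["arn:aws:s3:::veda-data-store"]},
        {"Sid": "ObjectPermissions", "Effect": "Allow",
         "Action": ["s3:PutObject", "s3:PutObjectVersionTagging",
                    "s3:PutObjectTagging"],
         "Resource": ["arn:aws:s3:::veda-data-store/*"]},
    ],
}

print(policy_allows(prod_policy, "s3:PutObject",
                    "arn:aws:s3:::veda-data-store/geoglam/file.tif"))
```

Running this against the production policy confirms PUTs are allowed on object keys, while `s3:DeleteObject` would come back False (it only appears in the staging policy).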

@smohiudd
Contributor Author

@botanical the transfer I ran last night didn't work (you can check by seeing if there are files in the bucket). The DAG failed without an error message - the handler needs some rework.

I ran another test today locally using a fixed handler and it did work for s3://veda-data-store/geoglam/

@anayeaye created a new role for us to use in the Airflow transfer handler that should allow PUT operations to the veda-data-store bucket. The new role is arn:aws:iam::114506680961:role/veda-data-manager
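For reference, the handler would assume that role before writing. A hedged sketch with boto3/STS (the session name and helper are hypothetical; only the role ARN comes from this thread):

```python
DATA_MANAGER_ROLE = "arn:aws:iam::114506680961:role/veda-data-manager"

def session_kwargs(assume_role_response):
    """Map an STS AssumeRole response to boto3.Session keyword arguments."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def assumed_session(role_arn=DATA_MANAGER_ROLE):
    """Return a boto3 Session holding the role's temporary credentials."""
    import boto3  # imported here so session_kwargs has no AWS dependency
    resp = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="veda-transfer"  # session name is an assumption
    )
    return boto3.Session(**session_kwargs(resp))
```

Any S3 client created from `assumed_session()` would then carry the role's PUT permissions on veda-data-store.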

@botanical
Member

I see 45 objects in veda-data-store/geoglam/ in MCP which were created around March 13, 2024, 10:34:43 (UTC-07:00) @smohiudd

@smohiudd
Contributor Author

Another PR to fix the transfer util: NASA-IMPACT/veda-data-airflow#122

The transfer DAG is working in dev Airflow and is ready to start moving assets. To do this programmatically, the next step could be a script or notebook that runs the transfer DAG on each collection. The configs would be similar to the discovery items configs, with a couple of slight modifications.
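Such a script might look like the sketch below, which reuses the conf payload tested earlier and emits one Airflow CLI trigger command per collection. The DAG id `transfer` and the collection list are assumptions for illustration:

```python
import json

# Example collection names only; the real list would come from the
# dataset config files.
COLLECTIONS = ["geoglam", "no2-monthly"]

def transfer_conf(collection):
    """Build the dag-run conf for one collection, mirroring the payload
    used in the dev MWAA test above."""
    return {
        "origin_bucket": "veda-data-store-staging",
        "origin_prefix": f"{collection}/",
        "filename_regex": "^(.*).tif$",
        "target_bucket": "veda-data-store",
        "collection": collection,
        "cogify": "false",
        "dry_run": "false",
    }

def trigger_command(collection, dag_id="transfer"):
    """Airflow CLI command to trigger the DAG; the DAG id is an assumption."""
    return f"dags trigger {dag_id} --conf '{json.dumps(transfer_conf(collection))}'"

for name in COLLECTIONS:
    print(trigger_command(name))
```

In MWAA these commands would be posted to the environment's CLI endpoint (after fetching a CLI token) rather than run locally.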

@smohiudd
Contributor Author

smohiudd commented Mar 16, 2024

I ran a transfer on Friday and it went OK. There are a few collections I need to rerun but I would say we're most of the way there.

These collections failed because of errors or incorrect config files and need to be run again:

  • ceos-co2-flux-budgets-mean
  • ceos-co2-flux-budgets
  • lis-global-da-swe
  • nceo_africa_2017 (bucket: nasa-maap-data-store; we don't want to transfer this asset and will run the ingest as is using the nasa-maap-data-store bucket)
  • houston-lst-diff (bucket: climatedashboard-data, in the UAH account; just one file, so will do a manual transfer)
  • facebook_population_density (larger than the 5 GB S3 copy limit; manual transfer)
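The last item hits S3's single-request CopyObject cap of 5 GiB. One way around a fully manual transfer is boto3's managed copy, which switches to multipart automatically for large objects; this is a sketch with placeholder bucket/key arguments, not the DAG's actual handler:

```python
FIVE_GIB = 5 * 1024**3  # CopyObject's single-request size limit

def needs_multipart(size_bytes):
    """Objects over 5 GiB cannot be copied with one CopyObject call."""
    return size_bytes > FIVE_GIB

def copy_asset(src_bucket, key, dst_bucket):
    """Server-side copy using boto3's managed transfer, which performs a
    multipart copy automatically when the object exceeds the threshold."""
    import boto3
    s3 = boto3.client("s3")
    s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)

# e.g. copy_asset("veda-data-store-staging",
#                 "facebook_population_density/some_file.tif",
#                 "veda-data-store")
```

Using `client.copy` (rather than `copy_object`) in the transfer handler would let the same code path serve both small and >5 GiB assets.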

Also, below are special-case collections that weren't part of the batch and will require manual transfers.

  • hls-l30-002-ej-reprocessed
  • hls-s30-002-ej-reprocessed

These datasets will be transferred at a later time:

  • ls8-covid-19-example-data
  • landsat-c2l2-sr-antarctic-glaciers-pine-island
  • landsat-c2l2-sr-lakes-aral-sea
  • landsat-c2l2-sr-lakes-tonle-sap
  • landsat-c2l2-sr-lakes-lake-balaton
  • landsat-c2l2-sr-lakes-vanern
  • landsat-c2l2-sr-antarctic-glaciers-thwaites
  • landsat-c2l2-sr-lakes-lake-biwa
  • combined_CMIP6_daily_GISS-E2-1-G_tas_kerchunk_DEMO

@smohiudd smohiudd self-assigned this Mar 26, 2024