Skip to content

Commit

Permalink
Merge branch 'develop' into staging
Browse files Browse the repository at this point in the history
  • Loading branch information
zaro0508 committed Dec 14, 2024
2 parents 8456482 + 99374fc commit 24cf01c
Show file tree
Hide file tree
Showing 15 changed files with 194 additions and 91 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* @Sage-Bionetworks/sagebio-it @Sage-Bionetworks/Agora-Admin
35 changes: 35 additions & 0 deletions .github/workflows/main.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: main

on:
pull_request:
branches: ['*']
push:
branches: ['develop', 'staging', 'prod' ]

jobs:
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: pre-commit/[email protected]

deploy:
if: ${{ github.event_name == 'push' }}
needs: ["tests"]
# self hosted runner labels are setup in github to match branch names
runs-on: [self-hosted, "${{ github.ref_name }}"]
# variables in context environments are setup in github to match branch names
environment:
name: ${{ github.ref_name }}

steps:
# use older checkout version due to https://github.com/dawidd6/action-download-artifact/issues/261
- uses: actions/checkout@v2
- name: Import Synapse Data
run: ./import-data.sh $BRANCH $SYNAPSE_PASSWORD $DB_HOST $DB_USER $DB_PASS
env:
BRANCH: ${{ github.ref_name }}
SYNAPSE_PASSWORD: ${{ secrets.SYNAPSE_PASSWORD }}
DB_HOST: ${{ secrets.DB_HOST }}
DB_USER: ${{ secrets.DB_USER }}
DB_PASS: ${{ secrets.DB_PASS }}
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/adrienverge/yamllint
rev: v1.33.0
hooks:
- id: yamllint
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.5.4
hooks:
- id: remove-tabs
- repo: https://github.com/sirosen/check-jsonschema
rev: 0.27.0
hooks:
- id: check-github-workflows
- id: check-github-actions
21 changes: 0 additions & 21 deletions .travis.yml

This file was deleted.

29 changes: 29 additions & 0 deletions .yamllint
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---

extends: default

rules:
braces:
level: warning
max-spaces-inside: 1
brackets:
level: warning
max-spaces-inside: 1
colons:
level: warning
commas:
level: warning
comments: disable
comments-indentation: disable
document-start: disable
empty-lines:
level: warning
hyphens:
level: warning
indentation:
level: warning
indent-sequences: consistent
line-length: disable
truthy: disable
new-line-at-end-of-file:
level: warning
115 changes: 96 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,48 +1,125 @@
# Overview
Agora Data Manager is a tool that loads the JSON files into Agora's document database instances in our AWS environments.
Agora Data Manager is a tool that loads the JSON files into Agora's document database
instances in our AWS environments.

# Purpose
This project allows Agora maintainers to update the Agora database with
new versions of gene data from Synapse. This is a manually triggered,
self-service update.
self-service update.

# Execution

![alt text][db_update]

# Worflow
# Workflow

To deploy an updated data version to the Agora development database
1. Increment `data-version` in `data-manifest.json` on the `develop` branch.
2. Commit the change
3. The [CI system](https://travis-ci.org/Sage-Bionetworks/agora-data-manager) automatically updates the dev DB
3. The Github action CI system automatically updates the dev DB


To deploy an updated data version to the Agora staging database:
1. Merge the data-version update from the dev branch to the staging branch.
2. The [CI system](https://travis-ci.org/Sage-Bionetworks/agora-data-manager) automatically updates the staging DB
2. The Github action CI system automatically updates the dev DB

To deploy an updated data version to the Agora production database:
1. Merge the data-version update from the staging branch to the production branch.
2. The [CI system](https://travis-ci.org/Sage-Bionetworks/agora-data-manager) automatically updates the production DB
2. The Github action CI system automatically updates the dev DB


# Setup

The following environment variables need to be setup for the scripts to deploy database updates:
## Secrets

| Variable | Description | Example |
|----------------------|-----------------------------------|---------------------------------------------------------------------------|
| BASTIAN_HOST_develop | The bastian host | ec2-10-11-12-13.compute-1.amazonaws.com |
| DB_HOST_develop | The database host | dbcluster-mr0a782pfjnk.cluster-ctcayu3de2lt.us-east-1.docdb.amazonaws.com |
| DB_USER_develop | The database user | dbuser |
| DB_PASS_develop | The database password | supersecret |
| SYNAPSE_USERNAME | The Synapse service user | syn-service-user |
| SYNAPSE_PASSWORD | The Synapse service user password | supersecret |
The following secrets need to be setup in Github for the scripts to deploy database updates:

__Note__: The variables containing `_develop` postfix corresponds to the branch.
To deploy to a prod environment a prod branch is require along with a variable
containing a `_prod` prefix (i.e. BASTIAN_HOST_prod)
Global secrets:

| Variable | Description | Example |
|----------------------|----------------------------------|----------------------------------|
| SYNAPSE_PASSWORD | Synapse service user token (PAT) | glY4283tLQHZ...0eXAiOi...JKV1QiL |

[db_update]: diagram1.png "update diagram"

Context specific secrets for each environment that corresponds to a git branch (develop/staging/prod):

| Variable | Description | Example |
|-----------|-----------------------------|---------------------------------------------------------------------------|
| DB_HOST | The database host | dbcluster-mr0a782pfjnk.cluster-ctcayu3de2lt.us-east-1.docdb.amazonaws.com |
| DB_USER | The database user | dbuser |
| DB_PASS | The database password | supersecret |


![alt text][github_secrets]


## Self hosted runners

[agora2-infra] repository deploys a bastian host in AWS for each environment which have access to
the databases. We manually configure a [Github self-hosted runner](https://docs.github.com/en/actions/hosting-your-own-runners)
for each bastian host, a label is applied to each runner to match the corresponding git branch name (develop/staging/prod).
Each runner corresponds to an environment which corresponds to a git branch. The update is
executed from these runners. When a push happens on a branch (i.e. develop), the update
is executed on the `agora-bastian-develop` runner which in turn updates the development database.


![alt text][self_hosted_runners]


### Setup self hosted runners

Github self hosted runners are deployed with a [Sceptre template config file])(https://github.com/Sage-Bionetworks/agora2-infra/blob/main/config/agoradev/develop/agora-bastian.yaml).

Self Hosted Runner setup:
* Deploy the template to the Agora AWS account.
* Login to AWS console and goto `EC2 -> select the deployed instance -> Connect -> Session Manager -> Connect` to gain ssh access to the instance.
* Follow the instructions to install the [Github self hosted runner](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners#adding-a-self-hosted-runner-to-a-repository). We installed it to the `/home/ssm-user/actions-runner` folder.
* Run the `config.sh` script to configure the runner. !! Important !! Make sure to set the runner `name` and `label` corresponding to the desired deployment environment (develop/staging/prod)..
```text
sh-4.2$ pwd
/home/ssm-user/actions-runner
sh-4.2$ ./config.sh --url https://github.com/Sage-Bionetworks/agora-data-manager --token XXXXXXXXXXXXXXXXX6VLI
--------------------------------------------------------------------------------
| ____ _ _ _ _ _ _ _ _ |
| / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ |
| | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| |
| | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ |
| \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ |
| |
| Self-hosted runner registration |
| |
--------------------------------------------------------------------------------
# Authentication
√ Connected to GitHub
# Runner Registration
Enter the name of the runner group to add this runner to: [press Enter for Default]
Enter the name of runner: [press Enter for ip-10-XXX-XXX-XXX] agora-bastian-prod
This runner will have the following labels: 'self-hosted', 'Linux', 'X64'
Enter any additional labels (ex. label-1,label-2): [press Enter to skip] prod
√ Runner successfully added
√ Runner connection is good
# Runner settings
Enter name of work folder: [press Enter for _work]
√ Settings Saved.
```
* Setup the [GH runner agent to run as a service](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/configuring-the-self-hosted-runner-application-as-a-service)
* Run the agent and then check the [GH Runners page](https://github.com/Sage-Bionetworks/agora-data-manager/settings/actions/runners) to make sure that the runner is in `Idle` status.

[db_update]: agora-db-update.drawio.png "update diagram"
[github_secrets]: github_secrets.png "github secrets screen"
[self_hosted_runners]: self-hosted-runners.png "self hosted runners"
[agora2-infra]: https://github.com/Sage-Bionetworks/agora2-infra "agora2-infra repository"
[Github self-hosted runners]: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#about-self-hosted-runners
Binary file removed agora-ci-develop.pem.enc
Binary file not shown.
Binary file removed agora-ci-prod.pem.enc
Binary file not shown.
Binary file added agora-db-update.drawio.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion data-manifest.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"data-version": "66",
"data-version": "71",
"data-manifest-id": "syn13363290",
"team-images-id": "syn12861877"
}
Binary file removed diagram1.png
Binary file not shown.
Binary file added github_secrets.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
27 changes: 13 additions & 14 deletions import-data.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,40 +5,39 @@
#!/bin/bash
set -e

TRAVIS_BRANCH=$1
SYNAPSE_USERNAME=$2
SYNAPSE_PASSWORD=$3
DB_HOST=$4
DB_USER=$5
DB_PASS=$6
BRANCH=$1
SYNAPSE_PASSWORD=$2
DB_HOST=$3
DB_USER=$4
DB_PASS=$5

CURRENT_DIR=$(pwd)
PARENT_DIR="$(dirname "$CURRENT_DIR")"
TMP_DIR=/tmp
WORKING_DIR=$TMP_DIR/work
WORKING_DIR=$CURRENT_DIR
DATA_DIR=$WORKING_DIR/data
TEAM_IMAGES_DIR=$DATA_DIR/team_images

mkdir -p $TEAM_IMAGES_DIR

# Version key/value should be on his own line
DATA_VERSION=$(cat $WORKING_DIR/data-manifest.json | grep data-version | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g' | tr -d '[[:space:]]')
DATA_MANIFEST_ID=$(cat $WORKING_DIR/data-manifest.json | grep data-manifest-id | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g' | tr -d '[[:space:]]')
TEAM_IMAGES_ID=$(cat $WORKING_DIR/data-manifest.json | grep team-images-id | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g' | tr -d '[[:space:]]')
echo "$TRAVIS_BRANCH branch, DATA_VERSION = $DATA_VERSION, manifest id = $DATA_MANIFEST_ID"
echo "$BRANCH branch, DATA_VERSION = $DATA_VERSION, manifest id = $DATA_MANIFEST_ID"

# Download the manifest file from synapse
synapse -u $SYNAPSE_USERNAME -p $SYNAPSE_PASSWORD get --downloadLocation $DATA_DIR -v $DATA_VERSION $DATA_MANIFEST_ID
synapse -p $SYNAPSE_PASSWORD get --downloadLocation $DATA_DIR -v $DATA_VERSION $DATA_MANIFEST_ID

# Ensure there's a newline at the end of the manifest file; otherwise the last listed file will not be downloaded
# echo >> $DATA_DIR/data_manifest.csv

# Download all files referenced in the manifest from synapse
cat $DATA_DIR/data_manifest.csv | tail -n +2 | while IFS=, read -r id version; do
echo Downloading $id,$version
synapse -u $SYNAPSE_USERNAME -p $SYNAPSE_PASSWORD get --downloadLocation $DATA_DIR -v $version $id ;
synapse -p $SYNAPSE_PASSWORD get --downloadLocation $DATA_DIR -v $version $id ;
done

# Download team images
synapse -u $SYNAPSE_USERNAME -p $SYNAPSE_PASSWORD get -r --downloadLocation $TEAM_IMAGES_DIR/ $TEAM_IMAGES_ID
synapse -p $SYNAPSE_PASSWORD get -r --downloadLocation $TEAM_IMAGES_DIR/ $TEAM_IMAGES_ID

echo "Data Files: "
ls -al $WORKING_DIR
Expand All @@ -65,7 +64,7 @@ mongoimport -h $DB_HOST -d agora -u $DB_USER -p $DB_PASS --authenticationDatabas
mongoimport -h $DB_HOST -d agora -u $DB_USER -p $DB_PASS --authenticationDatabase admin --collection genesbiodomains --jsonArray --drop --file $DATA_DIR/genes_biodomains.json
mongoimport -h $DB_HOST -d agora -u $DB_USER -p $DB_PASS --authenticationDatabase admin --collection biodomaininfo --jsonArray --drop --file $DATA_DIR/biodomain_info.json

mongo --host $DB_HOST -u $DB_USER -p $DB_PASS --authenticationDatabase admin $WORKING_DIR/create-indexes.js
mongosh --host $DB_HOST -u $DB_USER -p $DB_PASS --authenticationDatabase admin $WORKING_DIR/create-indexes.js

pushd $TEAM_IMAGES_DIR
ls -1r *.{jpg,jpeg,png} | while read x; do mongofiles -h $DB_HOST -d agora -u $DB_USER -p $DB_PASS --authenticationDatabase $DB_USER -v put $x; echo $x; done
Expand Down
Binary file added self-hosted-runners.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
36 changes: 0 additions & 36 deletions updatedb.sh

This file was deleted.

0 comments on commit 24cf01c

Please sign in to comment.