The ETL system we use is based on the DGP-APP Platform.
- configuration.json: the main DGP app configuration file
- srm_tools/: a utility Python package for common code and tools not specific to a single operator
- events/: DGP event handlers (TBD)
- operators/: the code for the specific pipeline operators
Authentication:
- EXTERNAL_ADDRESS: External address of the website (used to set the auth callback correctly)
- GOOGLE_KEY: Credentials key for oauth2 authentication
- GOOGLE_SECRET: Credentials secret for oauth2 authentication
- DGP_APP_DEFAULT_ROLE: Set to 1, disallowing any anonymous access
- PUBLIC_KEY & PRIVATE_KEY: PEM encoded RSA key pair, used to encode JWT for the client
Source file storage:
- BUCKET_NAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, S3_ENDPOINT_URL: The usual meaning for an S3 bucket (bucket name, access credentials, region, and endpoint URL)
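For readers less familiar with these settings, the sketch below shows how the storage variables would typically map onto the keyword arguments of an S3 client. It assumes boto3 is the client library, which this README does not state, and the helper name s3_client_kwargs is hypothetical. Treating S3_ENDPOINT_URL as optional is also an assumption.

```python
import os


def s3_client_kwargs(env=None):
    """Map the storage variables onto boto3.client keyword arguments."""
    env = os.environ if env is None else env
    kwargs = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env["AWS_REGION"],
    }
    # Assumption: S3_ENDPOINT_URL is only needed for S3-compatible stores;
    # when it is unset or empty, the client falls back to its default endpoint.
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


# Hypothetical usage (requires boto3 to be installed):
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs())
#   s3.list_objects_v2(Bucket=os.environ["BUCKET_NAME"])
```

Keeping the mapping in a small function that accepts a plain dictionary makes it easy to check without real credentials.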
Databases:
- DATABASE_URL: Connection string for the auth database
- DATASETS_DATABASE_URL: Connection string for the datasets database
- ETLS_DATABASE_URL: Connection string for the etls database
- AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: Connection string for the airflow database
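A startup check along the following lines can catch a missing variable before the app fails at runtime. This is a minimal sketch and not part of the DGP-APP platform: the function name find_missing_vars and the grouping are assumptions made for illustration, and only the variable names come from the lists above. S3_ENDPOINT_URL is left out on the assumption that it is optional.

```python
import os

# Variable names copied from the lists above; the grouping is illustrative.
REQUIRED_VARS = {
    "authentication": [
        "EXTERNAL_ADDRESS",
        "GOOGLE_KEY",
        "GOOGLE_SECRET",
        "DGP_APP_DEFAULT_ROLE",
        "PUBLIC_KEY",
        "PRIVATE_KEY",
    ],
    "storage": [
        "BUCKET_NAME",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION",
    ],
    "databases": [
        "DATABASE_URL",
        "DATASETS_DATABASE_URL",
        "ETLS_DATABASE_URL",
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN",
    ],
}


def find_missing_vars(env=None):
    """Return {group: [names]} for variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = {}
    for group, names in REQUIRED_VARS.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[group] = absent
    return missing


if __name__ == "__main__":
    # Print one line per group that has unset variables.
    for group, names in find_missing_vars().items():
        print(f"{group}: missing {', '.join(names)}")
```

Run as a script, it prints nothing when every listed variable is set.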
Scraper Specific:
See .env.example for a full list of scraper-specific environment variables.