Data Pipeline for Data Analysis, built separately from the CUBEMS environment
- Go into the `dataflow` folder
- Set up a virtual environment

  ```
  python3 -m venv venv
  ```
- Activate the virtual environment
  - Windows:

    ```
    venv\Scripts\activate.bat
    ```

  - Linux/Mac:

    ```
    source venv/bin/activate
    ```

- Install the required packages

  ```
  pip3 install -r requirements.txt
  ```
- Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable

  ```
  export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
  ```

  OR create a `.env` file with the following content (see the sketch below for how the pipeline can pick it up)

  ```
  GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
  ```
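  A minimal sketch of loading that variable from `.env`, assuming the pipeline uses the `python-dotenv` package (an assumption; it would then need to be listed in `requirements.txt`):

  ```python
  # Sketch: load GOOGLE_APPLICATION_CREDENTIALS from .env via python-dotenv.
  # Assumption: python-dotenv is installed (i.e. listed in requirements.txt).
  import os

  from dotenv import load_dotenv

  load_dotenv()  # copies entries from .env into os.environ
  print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
  ```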
- Run the pipeline. Check `direct-runner.py` for a local pipeline template and `dataflow-runner.py` for a Dataflow pipeline template (a sketch of a pipeline file follows below)

  ```
  python3 pipeline-file.py
  ```
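  As a minimal sketch of what such a pipeline file looks like (the transforms are illustrative, not this repo's logic; `direct-runner.py` and `dataflow-runner.py` are the actual templates):

  ```python
  # Minimal Apache Beam pipeline, runnable locally with the DirectRunner.
  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions


  def run():
      # Swapping in runner="DataflowRunner" (plus GCP project/staging
      # options) is what distinguishes the Dataflow variant from this one.
      options = PipelineOptions(runner="DirectRunner")
      with beam.Pipeline(options=options) as p:
          (
              p
              | "Create" >> beam.Create(["sensor-a", "sensor-b"])
              | "Upper" >> beam.Map(str.upper)
              | "Print" >> beam.Map(print)
          )


  if __name__ == "__main__":
      run()
  ```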
- Deploy a Dataflow template. The filename must match the template name, and the file must follow the guideline at https://cloud.google.com/dataflow/docs/guides/templates/creating-templates (a sketch of the staging step follows below)

  ```
  python3 deploy.py --template [template name]
  ```
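  A hypothetical sketch of what `deploy.py` might do: per the linked guide, a classic template is staged (not executed) when the pipeline runs with `--template_location`. The project, region, and bucket values below are placeholders, not values from this repo:

  ```python
  # Hypothetical sketch of deploy.py: stage a classic Dataflow template by
  # re-running the pipeline file with --template_location.
  import subprocess
  import sys

  template = sys.argv[sys.argv.index("--template") + 1]
  subprocess.run(
      [
          "python3",
          f"{template}.py",  # filename must match the template name
          "--runner", "DataflowRunner",
          "--project", "your-gcp-project",                   # placeholder
          "--region", "us-central1",                         # placeholder
          "--staging_location", "gs://your-bucket/staging",  # placeholder
          "--temp_location", "gs://your-bucket/temp",        # placeholder
          "--template_location", f"gs://your-bucket/templates/{template}",
      ],
      check=True,
  )
  ```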
- Go into the `functions` folder
- Install dependencies

  ```
  yarn
  ```

- Deploy to Firebase

  ```
  yarn deploy
  ```