DataFlow - Consolidate Observation Frames #29

Status: Open. Wants to merge 45 commits into base branch `develop`.

Commits
de45ad4
Make Observation PSC - DataFlow
wtgee Mar 17, 2019
515a2f5
* ProcessPICID will make a PSC for each non-filtered source and will
wtgee Mar 17, 2019
dd45c8b
Adding run scripts
wtgee Mar 17, 2019
c295fbf
pep8
wtgee Mar 17, 2019
2466c10
Remove unused code
wtgee Mar 17, 2019
0c02240
Add new params to metadata file
wtgee Mar 17, 2019
e5942ef
Making the ProcessPICID a transform
wtgee Mar 18, 2019
b7e9a4a
Adding script to run dataflow
wtgee Mar 18, 2019
efb3495
For now DataFlow simply concatenates the files.
wtgee Mar 18, 2019
33a9559
Comment out more things not used
wtgee Mar 18, 2019
c49c68d
Making the simple output work
wtgee Mar 18, 2019
500c7c8
Merge remote-tracking branch 'origin/master' into df-make-observation…
wtgee Mar 20, 2019
729d8cc
Adding a README
wtgee Mar 20, 2019
48a8e6a
Remove earlier dataflow tests.
wtgee Mar 20, 2019
5dd5498
Output all attributes to see if bucket event not working.
wtgee Mar 20, 2019
8023eb5
Use proper name for event bucket notification.
wtgee Mar 20, 2019
266d7d6
Update sha
wtgee Mar 20, 2019
bae7dc5
Merge branch 'master' of github.com:panoptes/panoptes-network into df…
wtgee Mar 20, 2019
f393e53
Don't update state upon receiving.
wtgee Mar 20, 2019
b56c2e9
Cleanup unused code.
wtgee Mar 20, 2019
1dedb24
Merge branch 'df-make-observation-psc' of github.com:panoptes/panopte…
wtgee Mar 20, 2019
e7115c2
Updating df template metadata file
wtgee Mar 20, 2019
c730b8f
* Small fixes
wtgee Mar 20, 2019
a3c9979
Merge branch 'df-make-observation-psc' of github.com:panoptes/panopte…
wtgee Mar 20, 2019
3cce404
Removing unused
wtgee Mar 20, 2019
94b3625
Merge branch 'df-make-observation-psc' of github.com:panoptes/panopte…
wtgee Mar 20, 2019
c95c179
Update container sha
wtgee Mar 21, 2019
afa3f86
Only respond to one message at a time (i.e. `pull` vs `subscribe`)
wtgee Mar 21, 2019
e761103
Better handling of ack and message.
wtgee Mar 21, 2019
cf2c01d
Adding script for deploying to kubernetes and clarifying script names.
wtgee Mar 21, 2019
ca42568
New build script for the `gce-find-similar-sources` that will automat…
wtgee Mar 21, 2019
3f401e4
Adding CF to respond to Observation PSC upload. This will in turn tri…
wtgee Mar 21, 2019
fde47a7
Fix re to find sequence_id
wtgee Mar 21, 2019
2650e04
* Adding a `force_new` parameter for not checking the existence of
wtgee Mar 23, 2019
53b56da
Auto-convert the CR2 files into separate RGB.
wtgee Mar 24, 2019
42a29f4
Fix method name
wtgee Mar 25, 2019
adac932
When comparing stamps align the index with the target.
wtgee Apr 30, 2019
1a8e3a7
Merge remote-tracking branch 'origin/develop' into df-make-observatio…
wtgee May 13, 2019
8ed1c6a
Merge branch 'develop' into df-make-observation-psc
wtgee May 14, 2019
e1dbc58
Removing github markdown
wtgee Oct 20, 2019
51fe696
Some minor updates and cleanup for moving to panoptes-exp
wtgee Oct 20, 2019
9f02e8d
Move directory name
wtgee Oct 20, 2019
08d9ed3
Small log cleanup
wtgee Oct 20, 2019
ce39882
Updates to the update observation cf
wtgee Oct 20, 2019
cfd0e5a
Moving foldering
wtgee Oct 20, 2019
25 changes: 12 additions & 13 deletions cf-get-state/README.md → cf-get-observation-state/README.md
@@ -1,37 +1,36 @@
-Get Sequence/Image State
-========================
+Get Observation State
+=====================

This folder defines a [Google Cloud Function](https://cloud.google.com/functions/).

Small helper function to lookup the `state` column on either a sequence or an
-image in the `metadata` db.
+image in the `metadata.observations` db.

-Endpoint: https://us-central1-panoptes-survey.cloudfunctions.net/get-state
+Endpoint: https://us-central1-panoptes-survey.cloudfunctions.net/get-observation-state

Can be passed either a `sequence_id` or an `image_id`.

Payload: JSON message of the form:

```json
{
-    'state': str,
    'sequence_id': str,
    'image_id': str
}
```
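As a quick sketch of how a client might call this endpoint: the helper below builds the payload described above and POSTs it with `requests` (which the repo's own cloud functions use). The function names and the sample `sequence_id` are illustrative, not part of the repo.

```python
def make_state_lookup_payload(sequence_id=None, image_id=None):
    """Build the JSON payload for the get-observation-state endpoint.

    Either a sequence_id or an image_id must be given.
    """
    if not (sequence_id or image_id):
        raise ValueError('sequence_id or image_id is required')
    payload = {}
    if sequence_id:
        payload['sequence_id'] = sequence_id
    if image_id:
        payload['image_id'] = image_id
    return payload


def lookup_state(payload):
    """POST the payload to the endpoint and return the JSON response."""
    import requests  # same HTTP library the repo's cloud functions use

    url = 'https://us-central1-panoptes-survey.cloudfunctions.net/get-observation-state'
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()
```

For example, `lookup_state(make_state_lookup_payload(sequence_id='PAN001_14d3bd_20190321T120000'))` would return the stored state for that (hypothetical) sequence.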

Deploy
------

[Google Documentation](https://cloud.google.com/functions/docs/deploying/filesystem)

From the directory containing the cloud function. The `entry_point` is the
-name of the function in `main.py` that we want called and `header-to-db`
+name of the function in `main.py` that we want called and `get-observation-state`
is the name of the Cloud Function we want to create.

```bash
gcloud functions deploy \
-    get-state \
+    get-observation-state \
    --entry-point get_state \
    --runtime python37 \
    --trigger-http
```
4 changes: 2 additions & 2 deletions cf-get-state/deploy.sh → cf-get-observation-state/deploy.sh
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
#!/bin/bash -e

-echo "Deploying cloud function: cf-get-state"
+echo "Deploying cloud function: cf-get-observation-state"

gcloud functions deploy \
-    get-state \
+    get-observation-state \
    --entry-point get_state \
    --runtime python37 \
    --trigger-http
11 changes: 4 additions & 7 deletions cf-get-state/main.py → cf-get-observation-state/main.py
Original file line number Diff line number Diff line change
@@ -5,16 +5,13 @@
from psycopg2 import OperationalError
from psycopg2.pool import SimpleConnectionPool

-PROJECT_ID = os.getenv('POSTGRES_USER', 'panoptes-survey')
-BUCKET_NAME = os.getenv('BUCKET_NAME', 'panoptes-survey')
-
CONNECTION_NAME = os.getenv(
    'INSTANCE_CONNECTION_NAME',
-    'panoptes-survey:us-central1:panoptes-meta'
+    'panoptes-exp:us-central1:panoptes-metadata'
)
-DB_USER = os.getenv('POSTGRES_USER', 'panoptes')
-DB_PASSWORD = os.getenv('POSTGRES_PASSWORD', None)
-DB_NAME = os.getenv('POSTGRES_DATABASE', 'metadata')
+DB_USER = os.getenv('DB_USER', 'panoptes')
+DB_PASSWORD = os.getenv('DB_PASSWORD', None)
+DB_NAME = os.getenv('DB_NAME', 'observations')

pg_config = {
    'user': DB_USER,
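The truncated `pg_config` above is typically assembled from these env vars roughly as sketched below. The `/cloudsql/...` unix-socket host is an assumption about the Cloud SQL setup used by Cloud Functions and is not shown in this diff; `make_pool` is an illustrative name.

```python
import os

# Fallback values mirror the defaults in main.py above.
db_config = {
    'user': os.getenv('DB_USER', 'panoptes'),
    'password': os.getenv('DB_PASSWORD', None),
    'dbname': os.getenv('DB_NAME', 'observations'),
    # Assumed Cloud SQL unix-socket path for functions in the same project.
    'host': '/cloudsql/' + os.getenv('INSTANCE_CONNECTION_NAME',
                                     'panoptes-exp:us-central1:panoptes-metadata'),
}


def make_pool(minconn=1, maxconn=1):
    """Create a small psycopg2 connection pool from the config above."""
    from psycopg2.pool import SimpleConnectionPool

    return SimpleConnectionPool(minconn, maxconn, **db_config)
```

A pool is preferred over a bare connection here because a single Cloud Function instance serves many requests and reconnecting on each one is slow.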
2 changes: 1 addition & 1 deletion cf-image-received/main.py
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ def image_received(request):
Triggered when file is uploaded to bucket.

FITS: Set header variables and then forward to endpoint for adding headers
-to the metadatabase. The header is lokoed up from the file id, including the
+to the metadatabase. The header is looked up from the file id, including the
storage bucket file generation id, which are stored into the headers.

CR2: Trigger creation of timelapse and jpg images.
41 changes: 41 additions & 0 deletions cf-observation-psc-created/README.md
@@ -0,0 +1,41 @@
File Upload to Bucket Storage
=============================

This folder defines a [Google Cloud Function](https://cloud.google.com/functions/).

Triggered when a new PSC file is uploaded for an observation.

The Observation PSC is created by the `df-make-observation-psc` job, which will
upload a CSV file to the `panoptes-observation-psc` bucket. This CF will listen
to that bucket and process new files:

1. Update the `metadata.sequences` table with the RA/Dec boundaries for
   the sequence.
2. Send a PubSub message to the `find-similar-sources` topic to trigger
   creation of the similar sources.

Endpoint: No public endpoint


Deploy
------

[Google Documentation](https://cloud.google.com/functions/docs/deploying/filesystem)

> :bulb: There is also a small convenience script called `deploy.sh` that does the same thing.

```bash
./deploy.sh
```

From the directory containing the cloud function. The `entry_point` is the
name of the function in `main.py` that we want called and `observation-psc-created`
is the name of the Cloud Function we want to create.

```bash
gcloud functions deploy \
    observation-psc-created \
    --entry-point observation_psc_created \
    --runtime python37 \
    --trigger-resource panoptes-observation-psc \
    --trigger-event google.storage.object.finalize
```
8 changes: 8 additions & 0 deletions cf-observation-psc-created/deploy.sh
@@ -0,0 +1,8 @@
#!/bin/bash -e

gcloud functions deploy \
    observation-psc-created \
    --entry-point observation_psc_created \
    --runtime python37 \
    --trigger-resource panoptes-observation-psc \
    --trigger-event google.storage.object.finalize
53 changes: 53 additions & 0 deletions cf-observation-psc-created/main.py
@@ -0,0 +1,53 @@
import os
import re
import requests

from flask import jsonify
from google.cloud import pubsub

PROJECT_ID = os.getenv('PROJECT_ID', 'panoptes-survey')

publisher = pubsub.PublisherClient()
PUB_TOPIC = os.getenv('PUB_TOPIC', 'find-similar-sources')
pubsub_topic = f'projects/{PROJECT_ID}/topics/{PUB_TOPIC}'

update_state_url = os.getenv(
    'HEADER_ENDPOINT',
    'https://us-central1-panoptes-survey.cloudfunctions.net/update-state'
)


def observation_psc_created(data, context):
    """Triggered when a new PSC file is uploaded for an observation.

    The Observation PSC is created by the `df-make-observation-psc` job, which will
    upload a CSV file to the `panoptes-observation-psc` bucket. This CF will listen
    to that bucket and process new files:

    1. Update the `metadata.sequences` table with the RA/Dec boundaries for
       the sequence.
    2. Send a PubSub message to the `find-similar-sources` topic to trigger
       creation of the similar sources.
    """
    object_id = data['id']

    matches = re.match('panoptes-observation-psc/(PAN.{3}[/_].*[/_]20.{6}T.{6}).csv/*', object_id)
    if matches is not None:
        sequence_id = matches.group(1)
        print(f'Found sequence_id {sequence_id}')
    else:
        msg = f"Cannot find matching sequence_id in {object_id}"
        print(msg)
        return jsonify(success=False, msg=msg)

    # Update state
    state = 'observation_psc_created'
    print(f'Updating state for {sequence_id} to {state}')
    requests.post(update_state_url, json={'sequence_id': sequence_id, 'state': state})

    publisher.publish(pubsub_topic,
                      b'cf-observation-psc-created finished',
                      sequence_id=sequence_id)

    return jsonify(success=True, msg=f"Received file: {object_id}")
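The `sequence_id` extraction above can be exercised locally without any GCP setup. The object id below is hypothetical but follows the shape a `google.storage.object.finalize` event delivers (`<bucket>/<name>/<generation>`) and the `PAN...` naming the pattern expects:

```python
import re

# Same pattern as in main.py above.
PSC_OBJECT_RE = 'panoptes-observation-psc/(PAN.{3}[/_].*[/_]20.{6}T.{6}).csv/*'

# Hypothetical object id: "<bucket>/<name>/<generation>".
object_id = 'panoptes-observation-psc/PAN001_14d3bd_20190321T120000.csv/1553172000000000'

match = re.match(PSC_OBJECT_RE, object_id)
print(match.group(1))  # PAN001_14d3bd_20190321T120000
```

Note that the group accepts either `/` or `_` as separators, so ids stored as `PAN001/field/20190321T120000` match as well.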
3 changes: 3 additions & 0 deletions cf-observation-psc-created/requirements.txt
@@ -0,0 +1,3 @@
Flask
google-cloud-pubsub
requests
24 changes: 12 additions & 12 deletions cf-update-state/README.md → cf-update-observation-state/README.md
@@ -4,35 +4,35 @@ Get Sequence/Image State
This folder defines a [Google Cloud Function](https://cloud.google.com/functions/).

Small helper function to update the `state` column on either a sequence or an
-image in the `metadata` db.
+image in the `metadata.observations` db.

-Endpoint: https://us-central1-panoptes-survey.cloudfunctions.net/update-state
+Endpoint: https://us-central1-panoptes-survey.cloudfunctions.net/update-observation-state

Can be passed either a `sequence_id` or an `image_id`.

Payload: JSON message of the form:

```json
{
    'state': str,
    'sequence_id': str,
    'image_id': str
}
```
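A minimal sketch of a client for this endpoint, using the `state` value that `cf-observation-psc-created` sets; the ids are hypothetical and `send_state_update` is an illustrative name:

```python
URL = 'https://us-central1-panoptes-survey.cloudfunctions.net/update-observation-state'

# Hypothetical ids; pass either a sequence_id or an image_id, plus the new state.
payload = {'state': 'observation_psc_created',
           'sequence_id': 'PAN001_14d3bd_20190321T120000'}


def send_state_update(payload):
    """POST the state update, mirroring what cf-observation-psc-created does."""
    import requests  # same HTTP library the repo's cloud functions use

    response = requests.post(URL, json=payload)
    response.raise_for_status()
```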

Deploy
------

[Google Documentation](https://cloud.google.com/functions/docs/deploying/filesystem)

From the directory containing the cloud function. The `entry_point` is the
-name of the function in `main.py` that we want called and `header-to-db`
+name of the function in `main.py` that we want called and `update-observation-state`
is the name of the Cloud Function we want to create.

```bash
gcloud functions deploy \
-    update-state \
-    --entry-point update-state \
+    update-observation-state \
+    --entry-point update_state \
    --runtime python37 \
    --trigger-http
```
9 changes: 9 additions & 0 deletions cf-update-observation-state/deploy.sh
@@ -0,0 +1,9 @@
#!/bin/bash -e

echo "Deploying cloud function: cf-update-observation-state"

gcloud functions deploy \
    update-observation-state \
    --entry-point update_state \
    --runtime python37 \
    --trigger-http
11 changes: 4 additions & 7 deletions cf-update-state/main.py → cf-update-observation-state/main.py
@@ -5,16 +5,13 @@
from psycopg2 import OperationalError
from psycopg2.pool import SimpleConnectionPool

-PROJECT_ID = os.getenv('POSTGRES_USER', 'panoptes-survey')
-BUCKET_NAME = os.getenv('BUCKET_NAME', 'panoptes-survey')
-
CONNECTION_NAME = os.getenv(
    'INSTANCE_CONNECTION_NAME',
    'panoptes-survey:us-central1:panoptes-meta'
+    'panoptes-exp:us-central1:panoptes-metadata'
)
-DB_USER = os.getenv('POSTGRES_USER', 'panoptes')
-DB_PASSWORD = os.getenv('POSTGRES_PASSWORD', None)
-DB_NAME = os.getenv('POSTGRES_DATABASE', 'metadata')
+DB_USER = os.getenv('DB_USER', 'panoptes')
+DB_PASSWORD = os.getenv('DB_PASSWORD', None)
+DB_NAME = os.getenv('DB_NAME', 'observations')

pg_config = {
    'user': DB_USER,
9 changes: 0 additions & 9 deletions cf-update-state/deploy.sh

This file was deleted.

15 changes: 15 additions & 0 deletions df-make-observation-psc/README.md
@@ -0,0 +1,15 @@
# Make Observation PSC

Create a dataflow [job template](https://cloud.google.com/dataflow/docs/guides/templates/overview) that, given a sequence_id, will gather CSV files from the panoptes-detected-sources bucket and consolidate them into one large master PSC collection.

This file is then uploaded to the `panoptes-observation-psc` bucket, which will trigger a pubsub message (see the similar source finder [readme](https://github.com/panoptes/panoptes-network/tree/master/gce-find-similar-sources) for details).

The `deploy.sh` script will make a new version of the template stored in a storage
bucket and should be run any time the `makepsc.py` file changes.

> :bulb: Note: The `deploy.sh` script requires a python2.7 environment and the `apache-beam[gcp]` module.

The `run_dataflow.sh` script will run the job template with the DataFlow runner
(i.e. in the cloud) and requires a `sequence_id` parameter to run.

`run_locally.sh` will attempt to run the job locally.
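For reference, launching the deployed template from Python looks roughly like the sketch below, via the Dataflow REST API (`google-api-python-client`). The template path and the `sequence_id` parameter name come from this README and `deploy.sh`; the helper names and job-name scheme are assumptions.

```python
import re

PROJECT_ID = 'panoptes-survey'
TEMPLATE_PATH = 'gs://panoptes-dataflow/makepsc/makepsc'


def make_launch_body(sequence_id):
    """Build a Dataflow templates.launch request body for this template."""
    # Dataflow job names must use lowercase letters, digits, and dashes.
    job_name = 'makepsc-' + re.sub(r'[^a-z0-9-]', '-', sequence_id.lower())
    return {'jobName': job_name, 'parameters': {'sequence_id': sequence_id}}


def launch_template(sequence_id):
    """Launch the template with the Dataflow REST API."""
    from googleapiclient.discovery import build

    dataflow = build('dataflow', 'v1b3')
    request = dataflow.projects().templates().launch(
        projectId=PROJECT_ID,
        gcsPath=TEMPLATE_PATH,
        body=make_launch_body(sequence_id))
    return request.execute()
```

This is the programmatic equivalent of `run_dataflow.sh`, which wraps the same launch call via `gcloud`.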
13 changes: 13 additions & 0 deletions df-make-observation-psc/deploy.sh
@@ -0,0 +1,13 @@
#!/bin/bash -e

PROJECT_ID='panoptes-survey'
BUCKET_NAME='panoptes-dataflow'
JOB_NAME='makepsc'

echo "Sending DataFlow template to template bucket."

python ${JOB_NAME}.py --runner DataflowRunner \
    --project ${PROJECT_ID} \
    --stage_location gs://${BUCKET_NAME}/${JOB_NAME}/staging \
    --temp_location gs://${BUCKET_NAME}/${JOB_NAME}/temp \
    --template_location gs://${BUCKET_NAME}/${JOB_NAME}/${JOB_NAME}