DataFlow - Consolidate Observation Frames #29

Open
wants to merge 45 commits into
base: develop
from


wtgee
Member

@wtgee wtgee commented Mar 20, 2019

A dataflow job template that, given a sequence_id, will gather CSV files from the panoptes-detected-sources bucket and consolidate them into one large master PSC collection. This file is then uploaded to panoptes-observation-psc, which will trigger a pubsub message (see the similar source finder readme for details).

This job could feasibly either be replaced by a simplified Cloud Function and the storage compose utility, or be expanded to include the source filtering that currently happens as part of the similar source finder.

wtgee added 30 commits March 17, 2019 08:34
This creates a template for a dataflow job. The job accepts a location
of an observation sequence as stored in the `panoptes-detected-sources`
bucket. For now this is a full bucket path but soon will be just the
`sequence_id`. Currently concatenates all the CSV files, then filters out
the sources that don't appear in all the frames. Also computes the
normalized data so the similar stars finder can be added.
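
A minimal sketch of the consolidate-then-filter step described above, in
plain Python rather than as an actual Dataflow pipeline. The column name
`picid`, the added `frame` column, and the function `consolidate_frames` are
hypothetical and only for illustration; the normalization step is not shown.

```python
import csv
import io
from collections import defaultdict


def consolidate_frames(frame_csvs, id_column="picid"):
    """Merge per-frame CSV texts and keep sources present in every frame.

    frame_csvs: mapping of frame name -> CSV text (header row included).
    id_column: assumed name of the source identifier column (hypothetical).
    """
    rows = []
    frames_seen = defaultdict(set)  # source id -> set of frames it appears in
    for frame_name, text in frame_csvs.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["frame"] = frame_name  # remember which frame the row came from
            rows.append(row)
            frames_seen[row[id_column]].add(frame_name)

    # Keep only sources detected in every frame of the sequence.
    num_frames = len(frame_csvs)
    keep = {sid for sid, seen in frames_seen.items() if len(seen) == num_frames}
    return [row for row in rows if row[id_column] in keep]


# Two tiny made-up frames: only source 100 appears in both.
frames = {
    "frame_001": "picid,flux\n100,10.0\n200,5.0\n",
    "frame_002": "picid,flux\n100,11.0\n300,7.0\n",
}
master = consolidate_frames(frames)
```

Here `master` holds one row per frame for source `100`; sources `200` and
`300` are dropped because each is missing from one frame. A `frameThreshold`
style parameter could relax the "every frame" requirement.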

Todo:
* Accept just a `sequence_id`.
* Figure out best way to call.
* Add in the `frameThreshold` param.
* Do Similar Stars step
also calculate the Similar Stars. The latter takes a long time to run
and could probably be optimized. Also currently giving bad scores. :)
* Better check for object_id from event bucket.
wtgee added a commit that referenced this pull request May 13, 2019
* Adding RGB endpoint for CR2 files
	* TODO: Convert FITS; timelapse; jpgs
* Removing old dataflow items
* Adding build scripts for containers and Kubernetes clusters for the
	similar source finder.
* Removing old Cloud SQL connections
* Misc cleanup
@wtgee wtgee changed the base branch from master to develop May 15, 2019 06:25
wtgee added a commit that referenced this pull request May 16, 2019
* Based off `panoptes-utils` docker image
* Upload master observation to bucket. Previously this was being done
by #29.
* Misc cleanup
@wtgee wtgee mentioned this pull request May 26, 2019