Project-specific scripts in sub-directories.
Below we describe the analysis components implemented in each processing script. Feel free to pick and choose from these features when writing new scripts for your own project.
Some issues that all or most of these scripts address:
- extracting classification marks/answers from within the JSON fields of the CSV classification data exports
- cleaning the classification export files:
  - removing duplicate classifications (if they occur)
  - dealing with empty classifications (some projects throw them out, others count them as "nothing here" votes)
  - only including classifications from the most up-to-date workflow version(s)
NOTE: For R code that addresses these issues, please see https://www.github.com/aliburchard/DataProcessing.
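The cleaning steps listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the code used by any script in this repository. It assumes Panoptes-style export columns (`classification_id`, `user_name`, `subject_ids`, `workflow_version`, `annotations`), treats a repeat classification of the same subject by the same user as a duplicate, and uses hypothetical function names throughout.

```python
import json


def parse_version(text):
    """Turn a version string such as '45.12' into the tuple (45, 12)."""
    return tuple(int(part) for part in text.split("."))


def is_empty(annotations):
    """True if no task in the classification carries an answer or mark."""
    return all(task.get("value") in (None, "", []) for task in annotations)


def clean_classifications(rows, min_version, keep_empty=True):
    """Yield cleaned classifications from an iterable of row dicts.

    rows        -- dicts keyed by export column name (e.g. from csv.DictReader)
    min_version -- oldest workflow version to keep, as a tuple such as (45, 2)
    keep_empty  -- keep empty classifications (as "nothing here" votes) or drop them
    """
    seen = set()
    for row in rows:
        # Only include classifications from recent workflow versions.
        if parse_version(row["workflow_version"]) < min_version:
            continue
        # Assumption: a duplicate is the same user classifying the same subject again.
        key = (row["user_name"], row["subject_ids"])
        if key in seen:
            continue
        seen.add(key)
        # The answers and marks live in a JSON string inside the CSV field.
        annotations = json.loads(row["annotations"])
        if is_empty(annotations) and not keep_empty:
            continue
        yield {
            "classification_id": row["classification_id"],
            "user_name": row["user_name"],
            "subject_ids": row["subject_ids"],
            "answers": {task["task"]: task.get("value") for task in annotations},
        }
```

Versions are compared as integer tuples rather than floats so that, for example, version 45.12 sorts after 45.2. Whether to keep or drop empty classifications is left as a flag because projects differ on this point.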
Marking star cluster locations in Hubble Space Telescope images.
Script -- Creates CSV of circular marker info from simple marking workflow.
Marker type -- circle
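As a rough illustration of turning circle marks into a CSV, the sketch below flattens each drawn circle into one output row. It is not the project's script. It assumes the drawing task is `T0` and that each mark carries `x`, `y` and `r` keys in the annotation JSON; the function names are hypothetical.

```python
import csv
import json


def circle_rows(classification, task="T0"):
    """Yield one flat dict per circle drawn in a single classification."""
    for annotation in json.loads(classification["annotations"]):
        if annotation.get("task") != task:
            continue
        # An empty or missing value means no circles were drawn.
        for mark in annotation.get("value") or []:
            yield {
                "classification_id": classification["classification_id"],
                "subject_ids": classification["subject_ids"],
                "x": mark["x"],
                "y": mark["y"],
                "r": mark["r"],
            }


def write_markers(classifications, out_file, task="T0"):
    """Write every circle to out_file as CSV and return the number written."""
    fields = ["classification_id", "subject_ids", "x", "y", "r"]
    writer = csv.DictWriter(out_file, fieldnames=fields)
    writer.writeheader()
    count = 0
    for classification in classifications:
        for row in circle_rows(classification, task):
            writer.writerow(row)
            count += 1
    return count
```

Here `classifications` can be a `csv.DictReader` over the classification export, and `out_file` any writable text file opened with `newline=""`.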
An exoplanet-finding project run as part of Stargazing Live.
Scripts -- Aggregate a simple question task (with weighting) and save outputs to a Google Drive folder for easy data sharing. This script is adapted from the Pulsar Hunters aggregation script described below; it may be more generally applicable because it doesn't need additional files with gold-standard data. Update: There is now also an aggregation script designed to run on a large classification export (the example is 16 GB) without requiring a lot of memory, for those whose computers have limited RAM.
Marker type -- question task
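One way to aggregate a large export without loading it into memory is to read it one row at a time and keep only running tallies. The sketch below illustrates that idea and is not the project's script. It assumes a single-answer question task `T0` and `subject_ids`, `user_name` and `annotations` columns; the optional `weights` mapping from user name to weight is a hypothetical stand-in for whatever weighting scheme a project uses.

```python
import csv
import json
from collections import Counter, defaultdict


def stream_vote_fractions(lines, task="T0", weights=None):
    """Return {subject: {answer: fraction}} from an iterable of CSV lines.

    Rows are processed one at a time, so memory use grows with the number
    of subjects and answers, not with the size of the export.
    """
    votes = defaultdict(Counter)
    for row in csv.DictReader(lines):
        # Users without an entry in weights count with weight 1.0.
        weight = weights.get(row["user_name"], 1.0) if weights else 1.0
        for annotation in json.loads(row["annotations"]):
            value = annotation.get("value")
            # Assumption: the answer to a simple question task is a string.
            if annotation.get("task") == task and isinstance(value, str):
                votes[row["subject_ids"]][value] += weight
    fractions = {}
    for subject, counts in votes.items():
        total = sum(counts.values())
        fractions[subject] = {answer: n / total for answer, n in counts.items()}
    return fractions
```

Passing an open file object as `lines` lets the `csv` module pull rows lazily. If the export contains very long JSON fields, `csv.field_size_limit` may need to be raised.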
A beta project to examine HI structures in the Milky Way.
Scripts -- Extracts markings from classification file into individual files (ready for clustering).
Marker type -- line, point, ellipse, text input attached to mark
A survey project run by Cleveland Metroparks.
Scripts -- Adapts the survey aggregation script initially developed and tested for Wildwatch Kenya (described below).
Marker type -- Survey
Answering questions about the presence of bar structures and marking bar dimensions.
Scripts -- Analyzes joint question+marking workflow (but mostly the markings).
Marker type -- line
Extracting markings of damage and other features from post-disaster satellite imagery.
Script -- Combines classification information with geocoordinate information from subject exports.
Marker type -- point, polygon (though these aren't reduced here)
Marking interesting objects (including moving objects) in images from the WISE satellite.
Script -- Creates CSV of point marker info from simple marking workflow.
Marker type -- point
Classification of radio observations to identify pulsar candidates.
Scripts -- Analyze responses and aggregate the object type answer; there is also a script for counting classifications. IP address tracking was unreliable during this project, so unique non-logged-in users were identified with browser session info instead.
Marker type -- no markers, only 1 question task
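The session-based approach to identifying non-logged-in users might look something like the sketch below. It is an illustration only, not the project's code. It assumes that `user_id` is empty for non-logged-in classifications and that the `metadata` JSON field contains a `session` key; the function names and key prefixes are hypothetical.

```python
import json


def user_key(row):
    """Return a stable identifier for whoever made this classification."""
    # Logged-in users are identified by their account.
    if row.get("user_id"):
        return "user:" + row["user_id"]
    # Otherwise fall back to the browser session recorded in the metadata.
    session = json.loads(row["metadata"]).get("session")
    if session:
        return "session:" + session
    # With neither available, treat the classification as its own user.
    return "unknown:" + row["classification_id"]


def count_unique_users(rows):
    """Count distinct classifiers across an iterable of classification rows."""
    return len({user_key(row) for row in rows})
```

A session identifies a browser rather than a person, so this undercounts people who share a browser and overcounts people who return in a new session.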
Workflow #1: Yes/No if sea lions are present.
Scripts -- 1) Extracts a flat CSV from the embedded JSON. 2) Aggregates results.
Marker type -- no marks, only question tasks
A survey of species from camera trap data in Kenya.
Scripts -- Jailbreak survey annotations into a format more easily digestible by external scripts (1 line per species ID or "nothing here" classification); aggregate jailbroken annotations into a flattened CSV file with one line per subject. Also uses general utility scripts.
Marker type -- Survey
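The flattening step (one line per species ID or "nothing here" classification) can be sketched as below. This is a simplified illustration rather than the project's script. It assumes the survey task is `T0` and that each identification in the annotation JSON has a `choice` key plus an optional `answers` mapping; the `NOTHING_HERE` label and the `answer_` column prefix are hypothetical.

```python
import json

NOTHING_HERE = "NOTHINGHERE"  # hypothetical label for empty classifications


def flatten_survey(row, task="T0"):
    """Return one flat dict per species ID, or a single 'nothing here' row."""
    base = {
        "classification_id": row["classification_id"],
        "subject_ids": row["subject_ids"],
    }
    flat = []
    for annotation in json.loads(row["annotations"]):
        if annotation.get("task") != task:
            continue
        for ident in annotation.get("value") or []:
            entry = dict(base, choice=ident["choice"])
            # Sub-question answers (e.g. how many animals) become extra columns.
            for question, answer in ident.get("answers", {}).items():
                entry["answer_" + question.lower()] = answer
            flat.append(entry)
    if not flat:
        # No species selected: record the classification as a "nothing here" vote.
        flat.append(dict(base, choice=NOTHING_HERE))
    return flat
```

The resulting rows can be written with `csv.DictWriter` and then grouped by `subject_ids` to produce one aggregated line per subject.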
Classifying galaxies according to shape on a touch table device.
Scripts -- To prepare a device's local database, this script reads a Panoptes subject export CSV and produces an appropriately parsed CSV file that is ready for import as a database (.db) file.
Includes scripts that generate progress reports for an Ouroboros-based GZ project, and scripts for decision tree processing.
Scripts that compute statistics and analyze Talk data for an Ouroboros-based GZ project.
Fairly general scripts to process Galaxy Zoo classification database dumps into vote fractions for each subject and match with subject metadata. Note that this does not (yet) include debiasing.