Datagator is a lightweight, flexible framework to ingest, store, and analyze data about data.
A collection of scripts to extract, transform, and publish data from a number of websites.
TODO:
Configure by copying datagator-config-template to ~/.datagator and filling in the blanks. Alternatively, set environment variables (see the datagator-config-template file for details).
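The sketch below shows one way a script could merge ~/.datagator with environment variables. It assumes a hypothetical "KEY=value" file format and a hypothetical rule that environment variables win for any key already in the file or starting with "DG" (as DGOUT does). The real template may use a different syntax and different variable names.

```python
import os
from pathlib import Path


def load_config(path=None, environ=None):
    """Merge settings from ~/.datagator with environment variables (env wins).

    Hypothetical format: one KEY=value pair per line; '#' starts a comment.
    """
    path = Path(path) if path is not None else Path.home() / ".datagator"
    environ = os.environ if environ is None else environ
    config = {}
    if path.is_file():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    # Hypothetical override rule: env vars replace file values for known keys
    # and are always picked up when they start with "DG" (e.g. DGOUT).
    for key, value in environ.items():
        if key in config or key.startswith("DG"):
            config[key] = value
    return config
```

Passing an explicit environ dict keeps the function easy to test without touching the real process environment.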
Parses the DVD Queue page, which you might find useful before Netflix DVD ceases operations on Sept. 28, 2023.
- The Little Prince Collection (https://www.petit-prince-collection.com)
dg get PP-7146
Scrapes Little Prince Collection data from https://www.petit-prince-collection.com/lang/show_livre.php?lang=en&id=7146
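As an illustration, the identifier passed to dg get maps onto the page URL above. The helper name pp_url is hypothetical, and the assumption is that the numeric part of "PP-7146" is the site's id query parameter.

```python
from urllib.parse import urlencode

# Page URL pattern taken from the example above.
BASE_URL = "https://www.petit-prince-collection.com/lang/show_livre.php"


def pp_url(item_id: str, lang: str = "en") -> str:
    """Hypothetical helper: turn a 'PP-<number>' id into its page URL."""
    prefix, sep, number = item_id.partition("-")
    if prefix != "PP" or not sep or not number.isdecimal():
        raise ValueError(f"not a Little Prince Collection id: {item_id!r}")
    return f"{BASE_URL}?{urlencode({'lang': lang, 'id': int(number)})}"
```

For example, pp_url("PP-7146") yields the same URL that dg get PP-7146 scrapes.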
Output:
$DGOUT/PP/$PPID/items.json
JSON array of Datagator-format records, including metadata
$DGOUT/PP/$PPID/raw-covers
Downloaded resources (images)
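A minimal sketch of writing this output layout follows. The function write_outputs is hypothetical, records are treated as plain dicts because the Datagator record schema is not described here, and whether $PPID is "7146" or "PP-7146" is left to the caller.

```python
import json
from pathlib import Path


def write_outputs(dgout, ppid, records, covers):
    """Write items.json and raw cover images under $DGOUT/PP/$PPID/.

    records: list of dicts (placeholder for Datagator-format records).
    covers: mapping of file name -> image bytes.
    """
    item_dir = Path(dgout) / "PP" / str(ppid)
    covers_dir = item_dir / "raw-covers"
    covers_dir.mkdir(parents=True, exist_ok=True)  # also creates item_dir
    (item_dir / "items.json").write_text(
        json.dumps(records, indent=2), encoding="utf-8"
    )
    for name, data in covers.items():
        (covers_dir / name).write_bytes(data)
    return item_dir
```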
dg build
Validates local data and builds the publishable artifacts.
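The exact checks dg build performs are not described here. As a hypothetical example, a validation pass could confirm that every $DGOUT/<source>/<id>/items.json parses as a JSON array of objects, matching the output format described above.

```python
import json
from pathlib import Path


def validate_items(dgout):
    """Return problems found in $DGOUT/*/*/items.json (empty list = OK).

    Hypothetical rule set: each file must be valid JSON, must be an array,
    and every element must be a JSON object.
    """
    problems = []
    for path in sorted(Path(dgout).glob("*/*/items.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, list):
            problems.append(f"{path}: expected a JSON array")
            continue
        for i, record in enumerate(data):
            if not isinstance(record, dict):
                problems.append(f"{path}: record {i} is not an object")
    return problems
```

Collecting all problems instead of stopping at the first one lets a single run report everything that needs fixing.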
dg publish
Uploads (reconciles) all local data to the cloud deployment.
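One way to plan such a reconciliation, assuming "reconcile" means making the remote copy mirror the local tree, is to compare content hashes. Both helpers below are hypothetical; how the remote manifest is obtained and how files are actually uploaded depend on the cloud deployment and are not shown.

```python
import hashlib
from pathlib import Path


def local_manifest(root):
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_sync(local, remote):
    """Return (to_upload, to_delete) so the remote ends up mirroring local.

    Upload files that are new or whose digest differs; delete remote-only files.
    """
    to_upload = sorted(k for k, digest in local.items() if remote.get(k) != digest)
    to_delete = sorted(k for k in remote if k not in local)
    return to_upload, to_delete
```

Hashing contents rather than comparing timestamps avoids re-uploading files that were rewritten without changing.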