#WIP
This is a Singer tap that produces JSON-formatted data following the Singer spec.
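Under the Singer spec, a tap writes newline-delimited JSON messages (`SCHEMA`, `RECORD`, `STATE`) to stdout. The sketch below only illustrates that message shape. The helper names `format_message` and `write_message`, the stream name `my_file`, and the bookmark key are hypothetical and not taken from this tap.

```python
import json
import sys
from typing import Optional, TextIO


def format_message(message: dict) -> str:
    # Each Singer message is one JSON object terminated by a newline.
    return json.dumps(message) + "\n"


def write_message(message: dict, out: Optional[TextIO] = None) -> None:
    # Taps emit messages on stdout unless another stream is supplied.
    (out or sys.stdout).write(format_message(message))


if __name__ == "__main__":
    # Hypothetical stream "my_file" with a single string column.
    write_message({
        "type": "SCHEMA",
        "stream": "my_file",
        "schema": {"type": "object", "properties": {"name": {"type": "string"}}},
        "key_properties": [],
    })
    write_message({"type": "RECORD", "stream": "my_file", "record": {"name": "example"}})
    # Hypothetical bookmark layout; the tap's real state structure may differ.
    write_message({"type": "STATE", "value": {"bookmarks": {"my_file": {"modified_since": "2024-01-01T00:00:00Z"}}}})
```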
This tap:
- Extracts data in supported formats from supported file-like storage systems.
- Supports incremental replication based on each file's modification date for some of the supported storage systems.
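Incremental replication based on modification date usually means remembering the latest modification time already synced (a bookmark) and reading only files modified after it. The following is a minimal sketch of that idea, assuming the storage listing yields `(path, last_modified)` pairs; `select_modified_since` is a hypothetical helper, not this tap's code.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

# Hypothetical (path, last_modified) pair, as a storage listing might return it.
FileInfo = Tuple[str, datetime]


def select_modified_since(
    files: Iterable[FileInfo], bookmark: Optional[datetime]
) -> Tuple[List[FileInfo], Optional[datetime]]:
    """Return files modified strictly after `bookmark` (oldest first) and the new bookmark."""
    selected = sorted(
        (f for f in files if bookmark is None or f[1] > bookmark),
        key=lambda f: f[1],
    )
    # Advance the bookmark to the newest file read; keep the old one if nothing changed.
    new_bookmark = selected[-1][1] if selected else bookmark
    return selected, new_bookmark


files = [
    ("a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("b.csv", datetime(2024, 2, 1, tzinfo=timezone.utc)),
]
selected, bookmark = select_modified_since(files, datetime(2024, 1, 15, tzinfo=timezone.utc))
```

Using a strict `>` comparison avoids re-reading the last synced file, at the risk of skipping a file that shares the bookmark's exact timestamp; `>=` trades that risk for possible duplicate records.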
This tap uses the fsspec project to support many storage systems.
System / Service | Incremental Replication Support |
---|---|
local file system | Y |
FTP / SFTP | Y |
S3 | Y |
Google Cloud Storage (GCS) | Y |
Azure Blob Storage / Azure Datalake | Y |
Dropbox | |
HTTP / HTTPS | |
HDFS / WebHDFS | |
git | |
GitHub | |
Each storage location in the `paths` parameter of the configuration is a URL whose scheme determines which storage class is used. For example, in "s3://some-bucket/path/to/my/file.csv" the scheme is "s3". Paths without a scheme default to the local file system.
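The scheme lookup described above can be sketched with the standard library. This is an illustration only: fsspec performs its own resolution (for example via `fsspec.core.url_to_fs`), and `storage_scheme` is a hypothetical helper that simply shows how a scheme such as "s3" is separated from the rest of the path, with "file" as the fallback.

```python
from urllib.parse import urlsplit


def storage_scheme(path: str) -> str:
    """Return the URL scheme of `path`, defaulting to "file" (local file system)."""
    scheme = urlsplit(path).scheme
    # A bare path has no scheme; a one-letter "scheme" is a Windows drive letter such as C:.
    if len(scheme) <= 1:
        return "file"
    return scheme.lower()


print(storage_scheme("s3://some-bucket/path/to/my/file.csv"))
print(storage_scheme("/data/file.csv"))
```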
The local file system also covers anything mounted on it, such as NFS and SMB network file shares.
Format | Extensions | Notes |
---|---|---|
csv | .csv, .tsv | Defaults to a "," delimiter, or to a tab delimiter if the extension is .tsv. |
excel | .xlsx, .xls | |
gis | .shp, .geojson, .ldgeojson | Supports reprojecting to another coordinate reference system via the to_crs format option. By default, adds a geom field containing the geometry as a GeoJSON string. |
json | .json, .ldjson | Also supports line-delimited JSON (ldjson) |
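The extension rules in the table can be expressed as a lookup. The sketch below assumes format selection is driven purely by file extension, as the table suggests; `EXTENSION_FORMATS`, `infer_format`, and `read_delimited` are hypothetical names, and the tap's real option handling may differ.

```python
import csv
import io
import os
from typing import Dict, List, Tuple

# Hypothetical mapping built from the table above.
EXTENSION_FORMATS: Dict[str, str] = {
    ".csv": "csv", ".tsv": "csv",
    ".xlsx": "excel", ".xls": "excel",
    ".shp": "gis", ".geojson": "gis", ".ldgeojson": "gis",
    ".json": "json", ".ldjson": "json",
}


def infer_format(path: str) -> Tuple[str, Dict[str, str]]:
    """Return (format, default options) for `path` based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    fmt = EXTENSION_FORMATS.get(ext)
    if fmt is None:
        raise ValueError(f"unsupported extension: {ext!r}")
    options: Dict[str, str] = {}
    if fmt == "csv":
        # Comma by default; tab for .tsv files.
        options["delimiter"] = "\t" if ext == ".tsv" else ","
    return fmt, options


def read_delimited(path: str, text: str) -> List[Dict[str, str]]:
    """Parse CSV/TSV text into row dicts using the delimiter inferred from `path`."""
    _, options = infer_format(path)
    return list(csv.DictReader(io.StringIO(text), delimiter=options["delimiter"]))


print(infer_format("s3://some-bucket/data.tsv"))
print(read_delimited("data.tsv", "id\tname\n1\talpha\n"))
```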
TODO
TODO