Uploading data using the API
While data providers can use the POD Aggregator's web interface to upload data, you will probably want to upload your data regularly using the API.
Streams are used to group full dumps and subsequent incremental updates and deletes together. Each organization can have many associated streams, but only one default stream. The default stream indicates the current active dump for a given organization. It is used in other parts of the Aggregator (e.g. harvesting and statistics). You can create new streams from your organization page; only the stream name is required. Information on how streams are interpreted can be found under Streams and their files.
If you want to test sending data to POD without affecting your default stream, create a new named stream that is not the default. It helps to give it a name that identifies it as a test stream, such as "Test Stream 2022-04-05". When sending data via the API, specify the name of that stream using the optional `stream` parameter (documented below); otherwise the data is added to the default stream.
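As a rough illustration of targeting a non-default stream, the sketch below builds the upload URL with and without the `stream` parameter. The helper name `upload_url` is hypothetical, and passing the stream as a query-string parameter is assumed from the curl examples on this page.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://pod.stanford.edu"


def upload_url(org_code: str, stream: Optional[str] = None) -> str:
    """Build the upload URL (hypothetical helper, not part of POD)."""
    url = f"{BASE_URL}/organizations/{quote(org_code, safe='')}/uploads"
    if stream:
        # Stream names may contain spaces, so encode them for the query string
        url += "?" + urlencode({"stream": stream})
    return url


# No stream given: the upload goes to the organization's default stream
print(upload_url("stanford"))
# Named test stream: the default stream is left untouched
print(upload_url("stanford", "Test Stream 2022-04-05"))
```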
- URL: https://pod.stanford.edu/organizations/$ORG_CODE/uploads (replace $ORG_CODE with your organization code)
- Method: POST
Parameter name | Required? | Description |
---|---|---|
upload[files][] | ✅ (either this or upload[url]) | Attach one or more files; see the data documentation for more information about accepted file types. Each uploaded file SHOULD include a Content-Type (one of: application/marc, application/marcxml+xml, or text/plain for plain text deletes). |
upload[url] | ✅ (either this or upload[files][]) | "Upload" from an external URL; POD will attempt to download the file from this URL. This parameter should only include the URL to the file. POD will determine the content type from the file extension (.mrc for MARC binary, .xml for MARC XML, and .del.txt, .del, or .delete for plain text deletes). |
stream | (optional) | A locally-defined value used to group multiple uploads together. If omitted, the uploads are placed in the current default stream. |
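When attaching files, a script needs to choose a Content-Type for each one. The sketch below applies the extension mapping from the table on the client side. The function name `content_type_for` is hypothetical, and this only mirrors the documented mapping; it is not POD's own implementation.

```python
from typing import Optional

# Extension-to-Content-Type pairs taken from the parameter table above.
# ".del.txt" is listed before ".del" only for readability; matching uses endswith.
CONTENT_TYPES = (
    (".del.txt", "text/plain"),
    (".delete", "text/plain"),
    (".del", "text/plain"),
    (".mrc", "application/marc"),
    (".xml", "application/marcxml+xml"),
)


def content_type_for(filename: str) -> Optional[str]:
    """Return the Content-Type for a filename, or None if the extension is unknown."""
    lowered = filename.lower()
    for suffix, content_type in CONTENT_TYPES:
        if lowered.endswith(suffix):
            return content_type
    return None


for name in ("records.mrc", "records.xml", "removed.del.txt", "notes.txt"):
    print(name, "->", content_type_for(name))
```

With httpx, the returned value can be supplied as the third element of a `(filename, file_object, content_type)` tuple in the `files` argument.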
- URL: https://pod.stanford.edu/organizations/$ORG_CODE/streams/make_default?stream=$STREAM_ID (replace $ORG_CODE with your organization code and $STREAM_ID with the identifier of the named stream)
- Method: POST
Parameter name | Required? | Description |
---|---|---|
stream | ✅ | A locally-defined value (see above); by making the stream the "default", it will be exposed for harvesting. |
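A minimal sketch of the make-default call using only the Python standard library. It builds the request without sending it, so the URL and headers can be inspected first. The helper name `make_default_request` and the token value are placeholders, not part of POD.

```python
import urllib.request
from urllib.parse import quote, urlencode


def make_default_request(org_code: str, stream_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that makes a stream the default."""
    url = (
        f"https://pod.stanford.edu/organizations/{quote(org_code, safe='')}"
        "/streams/make_default?" + urlencode({"stream": stream_id})
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


# Placeholder values for illustration only
req = make_default_request("stanford", "2022-04-05", "example-token")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)`.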
# Set up some environment information (and, in reality, consider protecting your ACCESS_TOKEN by some means)
$ export ORG_CODE="..." # put your organization code (e.g. 'stanford')
$ export ACCESS_TOKEN="..." # put your access token here
$ export STREAM_ID=$(date +%F) # or some other identifier for the stream
# Upload the e.g. MARC21 files in the current directory in a single upload
$ ls -d * | xargs -I{} printf "\-F 'upload[files][]=@%s;type=application/marc' " {} | xargs curl -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/uploads?stream=$STREAM_ID"
# use application/marcxml+xml for MARCXML, and text/plain for newline-delimited deletes.
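# OR, upload multiple files in parallel, one request per file (-P 4 runs up to four uploads at once)
# Note the double quotes around the Authorization header so that $ACCESS_TOKEN is expanded
$ ls -d * | xargs -P 4 -I{} curl -F 'upload[files][]=@{};type=application/marc' -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/uploads?stream=$STREAM_ID"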
# Make the full dump available for harvesting
$ curl -X POST -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/streams/make_default?stream=$STREAM_ID"
require 'net/http'
ORG_CODE = '...' # put your organization code (e.g. 'stanford')
STREAM_ID = '...' # identifier for the stream
ACCESS_TOKEN = '...' # put your access token here
uri = URI("https://pod.stanford.edu/organizations/#{ORG_CODE}/uploads?stream=#{STREAM_ID}")
request = Net::HTTP::Post.new(uri)
request['Authorization'] = "Bearer #{ACCESS_TOKEN}"
request['Accept'] = 'application/json'
form_data = [['upload[files][]', File.open('/path/to/file.xml')]]
request.set_form form_data, 'multipart/form-data'
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
  http.request(request)
end
import httpx

ORG_CODE = '...'  # put your organization code (e.g. 'stanford')
STREAM_ID = '...'  # identifier for the stream
ACCESS_TOKEN = '...'  # put your access token here
uri = f'https://pod.stanford.edu/organizations/{ORG_CODE}/uploads?stream={STREAM_ID}'
headers = {'Authorization': f'Bearer {ACCESS_TOKEN}', 'Accept': 'application/json'}
# Open in binary mode and close the file once the request has been sent
with open('/path/to/file.xml', 'rb') as marc_file:
    files = {'upload[files][]': marc_file}
    response = httpx.post(uri, files=files, headers=headers)
If you do not set an `Accept` header, POD assumes you want an HTML response and, if the upload is successful, returns a `302` redirect to the upload page. When uploading files via a script, you can set the `Accept` header to `application/json`; if the upload is successful, POD then returns a `201` response and some JSON with information about the upload.
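A script can tell these two outcomes apart by status code. The sketch below is one possible way to handle them. The function name `parse_upload_response` is hypothetical, and the JSON body in the example is a made-up placeholder, since the fields POD returns are not described here.

```python
import json


def parse_upload_response(status_code: int, body: str) -> dict:
    """Return the parsed JSON for a 201 response; raise for anything else."""
    if status_code == 201:
        return json.loads(body)
    if status_code == 302:
        # POD answered with the HTML-style redirect, so the Accept header
        # was probably not set to application/json
        raise RuntimeError("Got a 302 redirect; was 'Accept: application/json' sent?")
    raise RuntimeError(f"Unexpected status code: {status_code}")


# Placeholder body for illustration; real responses will contain other fields
print(parse_upload_response(201, '{"id": 1}'))
```

With httpx, this could be called as `parse_upload_response(response.status_code, response.text)`.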