
Uploading data using the API

Cory Lown edited this page Aug 29, 2024 · 21 revisions

Although data providers can upload data through the POD Aggregator's web interface, you will probably want to automate regular uploads using the API.

Streams

Streams group a full dump together with its subsequent incremental updates and deletes. Each organization can have many streams but only one default stream. The default stream indicates the organization's current active dump and is used elsewhere in the Aggregator (e.g. for harvesting and statistics). You can create new streams from your organization page; only the stream name is required. For how streams are interpreted, see Streams and their files.

Testing Sending Data to POD

To test sending data to POD without affecting your default stream, create a new named stream that is not the default. It can help to give it a name that clearly marks it as a test stream, such as "Test Stream 2022-04-05". When sending data via the API, specify this stream's name with the optional stream parameter (documented below); otherwise, the data is added to the default stream.
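A stream name like "Test Stream 2022-04-05" contains spaces, so it has to be URL-encoded when it is passed as the stream query parameter. The following is a minimal Python sketch using only the standard library; the helper uploads_url is hypothetical, and it assumes the stream parameter is sent in the query string, as in the examples on this page.

```python
from urllib.parse import urlencode

POD_BASE = "https://pod.stanford.edu"


def uploads_url(org_code, stream=None):
    """Return the uploads endpoint URL (hypothetical helper).

    If stream is None, no stream parameter is sent, so POD places the
    upload in the organization's default stream.
    """
    url = f"{POD_BASE}/organizations/{org_code}/uploads"
    if stream is not None:
        # urlencode escapes spaces (as '+') and other reserved characters
        url += "?" + urlencode({"stream": stream})
    return url


print(uploads_url("stanford", "Test Stream 2022-04-05"))
# → https://pod.stanford.edu/organizations/stanford/uploads?stream=Test+Stream+2022-04-05
print(uploads_url("stanford"))
# → https://pod.stanford.edu/organizations/stanford/uploads
```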

Uploading a new "full dump"

API documentation

Upload one or more files to the default stream or to a named stream

  • URL: https://pod.stanford.edu/organizations/$ORG_CODE/uploads (replace $ORG_CODE with your organization code)
  • Method: POST
| Parameter name | Required? | Description |
|---|---|---|
| upload[files][] | ✅ (either this or upload[url]) | Attach one or more files; see the data documentation for more information about accepted file types. Each uploaded file SHOULD include a Content-Type: application/marc (MARC binary), application/marcxml+xml (MARC XML), or text/plain (plain-text deletes). |
| upload[url] | ✅ (either this or upload[files][]) | "Upload" from an external URL: POD will attempt to download the file from this URL. The value should be only the URL of the file. POD determines the content type from the file extension: .mrc for MARC binary, .xml for MARC XML, and .del.txt, .del, or .delete for plain-text deletes. |
| stream | optional | A locally defined value used to group multiple uploads together. If omitted, the uploads are placed in the current default stream. |
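If your HTTP library does not build multipart bodies for you, the following minimal Python sketch shows the request shape that the upload[files][] parameter implies: one multipart part per file, every part named upload[files][], each with its own Content-Type. The helpers content_type_for and encode_files are hypothetical, and reusing the upload[url] extension mapping for local files is an assumption, not documented POD behavior.

```python
import uuid

# Extension -> Content-Type, borrowed from the upload[url] description.
# Applying it to local files is an assumption of this sketch.
CONTENT_TYPES = [
    ((".del.txt", ".del", ".delete"), "text/plain"),
    ((".mrc",), "application/marc"),
    ((".xml",), "application/marcxml+xml"),
]


def content_type_for(filename):
    """Pick a Content-Type from the file extension (case-insensitive)."""
    name = filename.lower()
    for suffixes, ctype in CONTENT_TYPES:
        if name.endswith(suffixes):
            return ctype
    raise ValueError(f"unrecognized file extension: {filename}")


def encode_files(files):
    """Encode (filename, bytes) pairs as repeated upload[files][] parts.

    Returns (body, content_type_header) for a multipart/form-data POST.
    Filenames are inserted as-is, so they must not contain double quotes.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, data in files:
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="upload[files][]"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {content_type_for(filename)}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    # Closing boundary marks the end of the multipart body
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header could then be sent with urllib.request.Request(url, data=body, method="POST", headers=...), passing the header as Content-Type alongside the Authorization and Accept headers used in the examples on this page.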

Mark a named stream as default stream

  • URL: https://pod.stanford.edu/organizations/$ORG_CODE/streams/make_default?stream=$STREAM_ID (replace $ORG_CODE with your organization code and $STREAM_ID with the name of the stream)
  • Method: POST
| Parameter name | Required? | Description |
|---|---|---|
| stream | | A locally defined value (see above). Making the stream the default exposes it for harvesting. |
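A minimal Python sketch of the make_default call, using only the standard library. make_default_request is a hypothetical helper that builds the request without sending it; sending an empty body is an assumption, since the stream is passed in the query string as in the curl example.

```python
import urllib.request
from urllib.parse import urlencode


def make_default_request(org_code, stream, access_token):
    """Build (but do not send) the POST that marks `stream` as default."""
    query = urlencode({"stream": stream})  # URL-encodes the stream name
    url = f"https://pod.stanford.edu/organizations/{org_code}/streams/make_default?{query}"
    return urllib.request.Request(
        url,
        data=b"",  # empty body (assumption); the stream is in the query string
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


# "my-token" is a placeholder, not a real access token
req = make_default_request("stanford", "2024-08-29", "my-token")
# To actually send it: urllib.request.urlopen(req)
```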

Examples

Curl

# Set up some environment information (in practice, consider protecting your ACCESS_TOKEN by some means)
$ export ORG_CODE="..." # put your organization code (e.g. 'stanford')
$ export ACCESS_TOKEN="..." # put your access token here
$ export STREAM_ID=$(date +%F) # or some other identifier for the stream

# Upload all files in the current directory (e.g. MARC21 files) in a single upload
$ ls -d * | xargs -I{} printf "\-F 'upload[files][]=@%s;type=application/marc' " {} | xargs curl -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/uploads?stream=$STREAM_ID"
# Use type=application/marcxml+xml for MARCXML and type=text/plain for newline-delimited deletes.

# OR, upload each file as a separate upload, running up to 4 requests in parallel
$ ls -d * | xargs -P 4 -I{} curl -F 'upload[files][]=@{};type=application/marc' -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/uploads?stream=$STREAM_ID"

# Make the full dump available for harvesting
$ curl -X POST -H "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" --url "https://pod.stanford.edu/organizations/$ORG_CODE/streams/make_default?stream=$STREAM_ID"

Ruby

require 'net/http'

ORG_CODE = '...' # put your organization code (e.g. 'stanford')
STREAM_ID = '...' # identifier for the stream
ACCESS_TOKEN = '...' # put your access token here

uri = URI("https://pod.stanford.edu/organizations/#{ORG_CODE}/uploads?stream=#{URI.encode_www_form_component(STREAM_ID)}")
request = Net::HTTP::Post.new(uri)
request['Authorization'] = "Bearer #{ACCESS_TOKEN}"
request['Accept'] = 'application/json'

File.open('/path/to/file.xml', 'rb') do |file|
  # Send an explicit Content-Type (application/marc for MARC binary, text/plain for deletes)
  form_data = [['upload[files][]', file, { content_type: 'application/marcxml+xml' }]]
  request.set_form form_data, 'multipart/form-data'
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  puts response.code, response.body
end

Python

import httpx

ORG_CODE = '...' # put your organization code (e.g. 'stanford')
STREAM_ID = '...' # identifier for the stream
ACCESS_TOKEN = '...' # put your access token here

uri = f'https://pod.stanford.edu/organizations/{ORG_CODE}/uploads'
headers = {'Authorization': f'Bearer {ACCESS_TOKEN}', 'Accept': 'application/json'}
with open('/path/to/file.xml', 'rb') as f:
    # (filename, file object, Content-Type); use application/marc for MARC binary, text/plain for deletes
    files = {'upload[files][]': ('file.xml', f, 'application/marcxml+xml')}
    response = httpx.post(uri, params={'stream': STREAM_ID}, files=files, headers=headers)
print(response.status_code, response.text)
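None of the examples above use upload[url]. The following minimal Python sketch builds a request that asks POD to fetch the file itself. url_upload_request and the example file URL are hypothetical, and sending upload[url] as an application/x-www-form-urlencoded body is an assumption about what the endpoint accepts.

```python
import urllib.request
from urllib.parse import urlencode


def url_upload_request(org_code, file_url, access_token, stream=None):
    """Build (but do not send) a POST asking POD to download file_url."""
    url = f"https://pod.stanford.edu/organizations/{org_code}/uploads"
    if stream is not None:
        url += "?" + urlencode({"stream": stream})
    # Only the file URL goes in upload[url]; POD infers the content type
    # from its extension (.mrc, .xml, .del.txt, .del, .delete).
    body = urlencode({"upload[url]": file_url}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Hypothetical file URL and placeholder token
req = url_upload_request("stanford", "https://example.org/dump.mrc", "my-token", stream="2024-08-29")
# To actually send it: urllib.request.urlopen(req)
```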

Note about Accept headers

If you do not set an Accept header, POD assumes you want an HTML response, and a successful upload returns a 302 redirect to the upload page. When uploading files from a script, you can set the Accept header to application/json instead; a successful upload then returns a 201 response with some JSON describing the upload.
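In a script, checking the status code explicitly makes a forgotten Accept header easy to spot. The following hypothetical helper is a sketch; it assumes nothing about the keys in POD's JSON response and simply parses it.

```python
import json


def parse_upload_response(status, body):
    """Interpret a POD upload response (hypothetical helper).

    201 -> parsed JSON describing the upload.
    302 -> the request was probably handled as an HTML form submission.
    Anything else is treated as a failure.
    """
    if status == 201:
        return json.loads(body)
    if status == 302:
        raise RuntimeError(
            "Got a 302 redirect; send 'Accept: application/json' "
            "to get a 201 response with JSON instead"
        )
    raise RuntimeError(f"Upload failed with HTTP {status}: {body[:200]!r}")
```

Clients that follow redirects automatically (for example, Python's urllib.request.urlopen) may follow the 302 and hand you the HTML upload page instead, so this check is most useful with clients that do not follow redirects by default, such as curl or httpx.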