
Nuxeo Load notes for UCB

Barbara Hui edited this page Mar 25, 2015 · 14 revisions


Loading a collection into Nuxeo basically consists of 2 steps:

  1. load files
  2. load metadata

Both steps use the pynux library, a Python wrapper for the Nuxeo REST API.

This all assumes you're working on nuxeo-stg.cdlib.org, which has Nuxeo and pynux installed, and that you have sufficient permissions.

Load Files

Simple

At its most basic, this entails using pynux's pifolder command to load a folder of content. For example:

pifolder --leaf_type SampleCustomPicture \
  --input_path /apps/content/new_path/UCM/MercedMugbook \
  --target_path /asset-library/UCM/ \
  --folderish_type SampleCustomPicture

This assumes that the /asset-library/UCM/MercedMugbook folder does not exist on Nuxeo yet. If the folder does already exist, use the --skip_root_folder_creation option like so:

pifolder --leaf_type SampleCustomPicture \
  --input_path /apps/content/new_path/UCM/MercedMugbook \
  --target_path /asset-library/UCM/ \
  --folderish_type SampleCustomPicture \
  --skip_root_folder_creation
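Since the two commands differ only by the trailing flag, a loading script can build the command line in one place. The sketch below is a hypothetical helper (build_pifolder_args is not part of pynux) that uses only the flags shown above. It does not check Nuxeo itself; the caller decides whether the root folder already exists.

```python
import shlex
from typing import List


def build_pifolder_args(input_path: str, target_path: str,
                        leaf_type: str = "SampleCustomPicture",
                        folderish_type: str = "SampleCustomPicture",
                        root_exists: bool = False) -> List[str]:
    """Build the pifolder argument list used in the examples above.

    root_exists is decided by the caller (e.g. after checking Nuxeo by hand);
    this hypothetical helper does not query Nuxeo.
    """
    args = [
        "pifolder",
        "--leaf_type", leaf_type,
        "--input_path", input_path,
        "--target_path", target_path,
        "--folderish_type", folderish_type,
    ]
    if root_exists:
        # The target folder is already in Nuxeo, so don't create it again.
        args.append("--skip_root_folder_creation")
    return args


if __name__ == "__main__":
    cmd = build_pifolder_args("/apps/content/new_path/UCM/MercedMugbook",
                              "/asset-library/UCM/", root_exists=True)
    # Print a copy-pasteable command; to run it instead, use
    # subprocess.run(cmd, check=True) on a host where pifolder is installed.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces safe when the list is passed straight to subprocess.run.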

Prep Files for Loading

In practice, we often need to do some prep work on the "raw" files we receive from contributors to get them ready for the above ingest process, for example removing extraneous or duplicate files and normalizing filenames. I'm not sure whether you'll need to do this as well.

In any case, we hard link the files into a new, organized directory structure rather than copying them, to save disk space. The scripts that do this are named *relink.py, for example uci-oral-histories-relink.py.

In general, the "raw" files are in /apps/content/raw_path and the organized, hard-linked files are in /apps/content/new_path.
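Here is a minimal sketch of what a relink step can look like. It is not the contents of uci-oral-histories-relink.py. The normalization rule (lowercase, spaces to underscores, strip other punctuation) and the choice to skip hidden files and name collisions are assumptions for illustration. The sketch hard links each file from a raw folder into an organized folder, so no data is copied.

```python
import os
import re
from pathlib import Path
from typing import List


def normalize_name(name: str) -> str:
    """Hypothetical filename normalization: lowercase, spaces -> underscores,
    and drop any character other than letters, digits, '.', '_' and '-'."""
    cleaned = name.strip().lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9._-]", "", cleaned)


def relink(raw_dir: str, new_dir: str) -> List[Path]:
    """Hard link every regular file in raw_dir into new_dir under a normalized name.

    A hard link points at the same inode, so no extra disk space is used.
    Both directories must be on the same filesystem. Returns the new paths.
    """
    src_root, dst_root = Path(raw_dir), Path(new_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(src_root.iterdir()):
        if not src.is_file() or src.name.startswith("."):
            continue  # skip subfolders and hidden extras such as .DS_Store
        dst = dst_root / normalize_name(src.name)
        if dst.exists():
            continue  # same normalized name already linked; treat as a duplicate
        os.link(src, dst)
        created.append(dst)
    return created
```

For example, relink("/apps/content/raw_path/UCM/MercedMugbook", "/apps/content/new_path/UCM/MercedMugbook") would populate the folder that the pifolder example loads; the raw_path subfolder here is hypothetical.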

Complex

The above instructions assume that the folder of content you're dealing with contains only simple objects, i.e. a directory of image files with no nested components. Complex objects can get pretty hairy to load programmatically, depending on how they are structured. Let's assume you don't have to deal with that for now, but let me know if you do.
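Before running pifolder, it can help to confirm that the input folder really is flat. This sketch assumes that "simple" means "no subdirectories"; a subdirectory is treated as a sign of a possible complex object that needs separate handling. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def nested_components(input_path: str) -> List[str]:
    """Return the names of subdirectories directly under input_path.

    An empty list means the folder holds only files (simple objects).
    Anything listed suggests nested, possibly complex, objects.
    """
    return sorted(p.name for p in Path(input_path).iterdir() if p.is_dir())
```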

Load Metadata

This basically entails using the pynux-utils update_nuxeo_properties function to "update" the metadata on an existing object in Nuxeo.

Take the halberstadt.py script, for example. This does the following:

  1. iterates over a directory of XML files, each of which contains the metadata for an object in this collection
  2. parses the metadata and transforms it into a ucldc-friendly Python dictionary
  3. determines the Nuxeo path of the object in question
  4. passes the python dict and the path to update_nuxeo_properties
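The four steps above can be sketched as follows. This is not halberstadt.py. The XML element names, the filename-based path rule, and the base path are all hypothetical. Only dc:title and dc:description are standard Nuxeo (Dublin Core) properties; the real script maps into the UCLDC schema, whose field names aren't shown here. The update argument stands in for pynux-utils' update_nuxeo_properties, so adapt a small lambda to that function's real signature.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict

# Hypothetical mapping from XML element names to Nuxeo property names.
FIELD_MAP = {
    "title": "dc:title",
    "description": "dc:description",
}


def xml_to_properties(xml_file: Path) -> Dict[str, str]:
    """Step 2: parse one record and map it to a flat property dict."""
    root = ET.parse(xml_file).getroot()
    props = {}
    for tag, prop in FIELD_MAP.items():
        elem = root.find(tag)
        # Skip missing or empty elements rather than blanking the property.
        if elem is not None and elem.text and elem.text.strip():
            props[prop] = elem.text.strip()
    return props


def nuxeo_path_for(xml_file: Path, base: str) -> str:
    """Step 3 (hypothetical rule): foo.xml describes the object loaded
    from foo.tif, stored directly under base in Nuxeo."""
    return f"{base.rstrip('/')}/{xml_file.stem}.tif"


def load_metadata(xml_dir: str, base: str,
                  update: Callable[[Dict[str, str], str], None]) -> int:
    """Steps 1-4: iterate over the XML files and push each record.

    update(properties, path) stands in for update_nuxeo_properties.
    Returns the number of records processed.
    """
    count = 0
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):  # step 1
        update(xml_to_properties(xml_file), nuxeo_path_for(xml_file, base))  # step 4
        count += 1
    return count
```

Passing the updater in as a function also makes a dry run easy: use lambda d, p: print(p, d) to check the dictionaries and paths before touching Nuxeo.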