Nuxeo Load notes for UCB

Barbara Hui edited this page Mar 25, 2015 · 14 revisions

Loading a collection into Nuxeo consists of two steps:

  1. load files
  2. load metadata

Both steps use the pynux library, which is a Python wrapper for the Nuxeo REST API.

This all assumes you're working on nuxeo-stg.cdlib.org, which has Nuxeo and pynux installed, and that you have sufficient permissions.

Load Files

Simple

At its most basic, this entails using pynux's pifolder command to load a folder of content. For example:

pifolder --leaf_type SampleCustomPicture \
  --input_path /apps/content/new_path/UCM/MercedMugbook \
  --target_path /asset-library/UCM/ \
  --folderish_type SampleCustomPicture

This assumes that the /asset-library/UCM/MercedMugbook folder does not exist on Nuxeo yet. If the folder does already exist, use the --skip_root_folder_creation option like so:

pifolder --leaf_type SampleCustomPicture \
  --input_path /apps/content/new_path/UCM/MercedMugbook \
  --target_path /asset-library/UCM/ \
  --folderish_type SampleCustomPicture \
  --skip_root_folder_creation
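
If you end up scripting many loads, the two invocations above differ only in the last flag, so the argument list can be built in one place. This is just a sketch: the helper name build_pifolder_command and its root_exists parameter are made up for illustration, and it only uses the pifolder options shown above. It assumes Python 3.8+ for shlex.join.

```python
import shlex


def build_pifolder_command(input_path, target_path, doc_type, root_exists=False):
    """Return the pifolder argument list for a simple (flat) folder load.

    Hypothetical helper: it only assembles the options shown in these notes.
    """
    cmd = [
        "pifolder",
        "--leaf_type", doc_type,
        "--input_path", input_path,
        "--target_path", target_path,
        "--folderish_type", doc_type,
    ]
    if root_exists:
        # The folder is already in Nuxeo, so do not try to create it again.
        cmd.append("--skip_root_folder_creation")
    return cmd


cmd = build_pifolder_command(
    "/apps/content/new_path/UCM/MercedMugbook",
    "/asset-library/UCM/",
    "SampleCustomPicture",
    root_exists=True,
)
# Print the command for review before running it.
print(shlex.join(cmd))
```

Once the printed command looks right, the same list could be handed to subprocess.run(cmd, check=True) on the machine where pynux is installed.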

Prep Files for Loading

In practice, the "raw" files that we receive from contributors often need some prep work before they are ready for the ingest process above, for example removing extraneous or duplicate files and normalizing filenames. Not sure if you'll be doing this as well?

In any case, we hard link the files into a new, organized directory structure rather than copying them, in order to save space. The scripts that do this are named *relink.py, for example uci-oral-histories-relink.py.

In general, the "raw" files are in /apps/content/raw_path and the organized, hard-linked files are in /apps/content/new_path.
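
For a rough idea of what a relink script does, here is a minimal sketch. It is not one of the actual *relink.py scripts: the function names and the filename rule (spaces to underscores) are made up for illustration. Hard links only work when both directories are on the same filesystem.

```python
import os


def simple_normalize(name):
    """Hypothetical filename rule: replace spaces with underscores."""
    return name.replace(" ", "_")


def relink_tree(raw_dir, organized_dir, normalize=None):
    """Hard link every file under raw_dir into organized_dir.

    The directory layout is mirrored as-is; a real script would reorganize it.
    Returns the list of (source, destination) pairs that were linked.
    """
    linked = []
    for dirpath, _dirnames, filenames in os.walk(raw_dir):
        rel = os.path.relpath(dirpath, raw_dir)
        dest_dir = os.path.normpath(os.path.join(organized_dir, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            new_name = normalize(name) if normalize else name
            dest = os.path.join(dest_dir, new_name)
            if os.path.exists(dest):
                continue  # already linked on an earlier run
            os.link(src, dest)  # hard link: no extra disk space is used
            linked.append((src, dest))
    return linked


# Example call (the paths below are placeholders under the directories named above):
# relink_tree("/apps/content/raw_path/SomeCollection",
#             "/apps/content/new_path/SomeCollection",
#             normalize=simple_normalize)
```

Because existing destinations are skipped, the function can be re-run after new raw files arrive without touching what was already linked.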

Complex

The above instructions assume that the folder of content you're dealing with contains only simple objects, i.e. a directory of image files with no nested components. Complex objects can get pretty hairy to load programmatically, depending on the object. Let's assume you don't have to deal with that for now, but let me know if you do.

Load Metadata

This entails using the pynux-utils update_nuxeo_properties function to "update" the metadata on an existing object in Nuxeo.

Take the halberstadt.py script, for example. This does the following:

  1. iterates over a directory of METS XML files, each of which contains the metadata for one object in this collection
  2. parses the metadata and transforms it into a ucldc-friendly Python dictionary
  3. determines the Nuxeo path of the object in question
  4. passes the Python dict and the Nuxeo path from the steps above to update_nuxeo_properties
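
The four steps above can be sketched roughly as follows. This is not halberstadt.py itself. Everything named here is an assumption made for illustration: the FIELD_MAP entries (check the property names against ucldc-schema), the rule that an object is named after its METS file, and the update callable, which stands in for a call to update_nuxeo_properties.

```python
import os
import xml.etree.ElementTree as ET

# Assumed mapping from XML element names to Nuxeo property names.
# These entries are placeholders; verify them against ucldc-schema.
FIELD_MAP = {
    "title": "ucldc_schema:title",
    "identifier": "ucldc_schema:identifier",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_mets(root):
    """Step 2: collect the first non-empty value for each mapped element."""
    props = {}
    for elem in root.iter():
        prop = FIELD_MAP.get(local_name(elem.tag))
        text = (elem.text or "").strip()
        if prop and text and prop not in props:
            props[prop] = text
    return props


def nuxeo_path_for(mets_filename, collection_path):
    """Step 3, under a hypothetical rule: object name equals the file stem."""
    stem = os.path.splitext(os.path.basename(mets_filename))[0]
    return collection_path.rstrip("/") + "/" + stem


def load_metadata(mets_dir, collection_path, update):
    """Steps 1 and 4: walk the METS files and hand each result to update."""
    for name in sorted(os.listdir(mets_dir)):
        if not name.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(mets_dir, name)).getroot()
        update(parse_mets(root), nuxeo_path_for(name, collection_path))
```

Passing update in as a parameter means the parsing can be dry-run with something like lambda props, path: print(path, props) before anything touches Nuxeo. In real use, update would be a thin wrapper around update_nuxeo_properties; its exact arguments are not shown in these notes, so check them against the pynux source.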

Parsing and transforming the metadata is obviously the most labor-intensive part of the process, and my scripts are not exactly optimized for reuse, but hopefully they should give you some idea of how to go about things.

Troubleshooting

The errors that you get back from the Nuxeo API when loading are often not very helpful. I would usually get a generic HTTP 500 error. This almost always indicates that your metadata isn't formatted correctly. I'd usually correct it by a process of elimination, but you could certainly make smarter use of the ucldc-schema to programmatically transform the metadata, rather than the kind of janky way I've been doing it!
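
The process of elimination can itself be scripted. Here is a minimal sketch, assuming an update callable that takes a dict of properties and a Nuxeo path and raises an exception when the server rejects the request (for example, your own wrapper around update_nuxeo_properties). The function name is made up for illustration.

```python
def find_rejected_properties(properties, path, update):
    """Send each property on its own and record the ones that fail.

    Returns a dict mapping each rejected property name to the error text.
    """
    rejected = {}
    for name, value in properties.items():
        try:
            update({name: value}, path)
        except Exception as err:  # usually just a generic HTTP 500
            rejected[name] = str(err)
    return rejected
```

Note that every property that is accepted does get written to the object, so this is best run against nuxeo-stg. It also will not catch a problem that only appears when two properties are sent together.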