Collection Content User Stories
Narrative: The DVS Department collects and maintains data collections on behalf of the University. Occasionally these collections are accessible to the world without authentication; more often they are accessible only to Duke University and affiliates. For Duke-only data, distribution restrictions and embedded licensing information are often contingent requirements for end-user access. The data are sometimes zipped components that must be distributed together; sometimes they are individual data files in various formats including, but not limited to, CSV, TSV, Excel, Stata, R, SAS, Tableau, shapefiles, KML, database files, and TXT.
Story: As a DVS Staff member, I want to...
- bulk upload data and metadata stored in various data formats (see above)
- bulk upload data and metadata stored in JSON
- edit permissions on collections, items, and files (components)
- edit metadata on collections, items, and files
- search for collections, items, and files
- limit searching by tags
- attach tags to collections, items, and files
- limit searches by DVS collections, items, and files
- publish and unpublish collections, items, and files individually
- publish and unpublish collections in bulk
- embed links to collections, items, and files inside other web publishing interfaces (e.g., blogs and Drupal CMS pages)
- leverage a professional-looking end-user search interface to my DVS collections, items, and files, including embedding search tools and results pages into other web publishing interfaces (e.g., to replace our existing collections pages)
- delete collections, items, and files for which we no longer have subscription or retention rights
- mask (hide from view based on permissions) collections, items, and files as necessary
- store and represent different versions of files, which can change as a result of corrections or additional data
- have library data and metadata processors (i.e., modern data catalogers and catalog assistants) process our collections for use by end users. This level of support is analogous to how metadata (catalog records) and data (books, etc.) are processed for the end user in "the catalog". Technically, books and other "data" are not stored in "the catalog", but with an enlightened cataloging tool (something like, say, Kuali OLE) there would be no reason why we couldn't store and manage metadata and data in the same data store, i.e., the catalog. By extension, it would be nice if we could do that in the repository as well, perhaps with a workflow engine that manages data flow between various systems, exchanging data with open standards. That goes beyond this particular user story, though. The real point of this bullet is that library support services for processing and ingesting our collections would be muy bueno.
Story: As an end user, I want to…
- access a search page that returns faceted results of the collections, items, and files
- use facets to limit results by subject, tags, keywords, and format (for example this page)
- browse the collections, items, and files
- be prompted for Duke single sign-on (i.e., NetID) authentication at the point of need
- download collections, items, and files for which I have permissions
- browse and search the metadata and descriptions for items for which I do not have download permissions (and be properly notified of my privileges before attempting to download files.)
- review versions of files that may have been changed and re-uploaded by DVS staff
##DPPS
##Preservation
I. Vendor-Created TIFFs of published material
Narrative: The Preservation Officer receives twenty DVDs created by a commercial vendor for a project that digitized special collections material to create a commercial (i.e., artificial) collection. Each disc contains multiple folders, and each folder represents one bibliographic object or one volume of a bibliographic object (called a Work, here). The DVDs are arranged somewhat similarly; however, the contents of the folders may vary:
- Some folders contain only the TIFF master files, each for a page (or a two-page opening?) of a Work, plus an additional file that may give page order (file order) for its volume.
- Some folders contain not only a Work's TIFF masters and page-order file as above, but also a PDF derivative for the Work. In those instances, a single PDF represents an entire volume or bibliographic object.
- An additional set of discs contains PDF derivatives. These PDFs may be for Works whose derivatives are not on discs with the TIFF masters.
- One disc contains the XML and HTML database structure, plus additional text, enabling us to recreate the searchable digital project locally.
Story: As the Preservation Officer,
I want to ingest the contents into the digital repository;
I want to be able to reconstitute the original works from the individual page files;
I want to associate the PDF derivatives with the digital object comprising their TIFF master files;
I want to associate the appropriate metadata from (1) the physical item's online record, (2) vendor-created metadata, and (3) our own metadata with the digital works.
I want to be able to search using (or rather, I want one of my successors to be able to search using) the front end and database files provided.
So that we can manage the content over time.
II. Publisher's abandoned content
Narrative: The Preservation Officer receives a notice from a publisher that it will cease publishing an online resource and that we should harvest the content if we want to continue making it available to our users. Content will not be available from LOCKSS, CLOCKSS, or Portico. Files may represent complete bibliographic units (e.g., monographs), a serial volume, a single issue, an article, or a page.
Story: As the Preservation Officer,
I want to ingest a set of files into the digital repository to serve as the preservation masters;
I want to be able to group the constituent files into the object representing the next-highest level (i.e., pages into articles or chapters; articles into issues, or chapters into volumes; etc.);
I want to associate the appropriate metadata with the master files in a way that unambiguously reflects this hierarchy and matches it to its original intellectual structure;
I want to create access files and make them available for our library users;
So that we can preserve the content and make it accessible in a way that is familiar to our users, and
So that our users can cite articles or locate cited articles per the intellectual organization established by the publisher.
III. Publisher Pushes Archiving Responsibility to Libraries
Narrative: The Preservation Officer receives an email stating that X publisher will allow us to harvest digital files of reference material to which the Libraries have purchased access. These will be our archival files; the publisher makes no other pledge to provide preservation for the content. Going to the designated website reveals that the titles are downloadable as a single folder containing each title as a zipped file. The Preservation Officer downloads the zipped file twice to extract files and examine contents. All appear to be XML files.
Story: As the Preservation Officer,
I want to consult our Procedures manual regarding archiving/compression formats to process them according to established protocol;
I want to ingest the files as preservation masters into the digital repository;
I want to group individual files hierarchically to reconstitute the original item;
I want to associate the appropriate metadata with each file, object, and title in a manner that enables discovery using the publisher's intended intellectual structure;
So that we will be able to provide access should the original resources become unavailable.
##Rubenstein Library (I have more detailed lists of filetypes present for each of these stories. In the interest of keeping these short and sweet, I have not copied them here)
I. Collections of Mixed Materials with Varying Levels of Complexity
Narrative: Before being made available to researchers, archival collections are arranged and described. Depending on the collection, arrangements can range from flat to highly structured, with some collections’ complexity varying from component to component. For digital materials in archival collections, archivists would like to be able to group digital objects in the repository in ways that support nesting and/or more complex structures.
Story:
As an archivist,
I want to ingest digital materials and associated metadata into the repository.
I want to group individual computer files into compound items where appropriate.
I want to treat individual computer files as distinct items where appropriate.
I want to be able to group items into sets at appropriate levels, depending on the collection.
II. Collections Containing Disk Images
Narrative: Disk images are created when acquiring almost all digital material from legacy media. In some cases, only files extracted from disk images will be preserved in the repository. In other cases, files will not be extracted from the disk image, making the disk image the object of preservation and access. In still other cases, the disk image will be preserved alongside any extracted files (which may be alternatively arranged, normalized, or otherwise processed).
Story:
As an archivist,
I want to ingest disk images and associated metadata into the repository.
I want to provide access to disk images to users with appropriate permissions.
I want to make explicit the relationships between a disk image and any files extracted from it for access or use.
III. ETDs - ProQuest/Grad School (current DDR model can support this)
Narrative: Each semester, the University Archives receive a batch of electronic theses and dissertations (ETDs) that have gone through ProQuest/UMI’s publishing/cataloging system. The digital archivist transforms the metadata sent by ProQuest and pushes the content files and transformed metadata through to the repository. The ProQuest package consists of a PDF file, any additional files, and metadata in a ProQuest-specific format. The transformed DSpace package consists of a PDF, any additional files, Dublin Core XML, and potentially DukeCore embargo metadata.
Story:
As a digital archivist,
I want to transform the ProQuest package, if necessary, to a package understandable by the repository.
I want to submit the packages in batch once per semester.
IV. ETDs - Self Submission (current DDR model can support this)
Narrative: Each semester, graduating students in several professional schools, as well as undergraduates receiving degrees with distinction, submit their theses to the repository. Students enter metadata individually after being granted access by the digital archivist.
Story:
As a student,
I want to submit my thesis to the repository, entering any descriptive metadata by hand.
I want to submit any supporting files along with my thesis.
##Digital Production Center
Technical Metadata
Narrative: The Digital Production Center’s main focus is to produce digital collections of Library holdings. During the process of digitizing a collection, a digitization guide is produced. The digitization guide contains technical information related to the production of each item in a collection, which includes but is not limited to the workstation, date of capture, scanner operator, scanner type, quality control date, and quality control operator associated with a component.
Story: As a DPC member, I would like to:
- upload the contents of a digitization guide and associate the technical metadata with the corresponding digital components.
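One way to picture this association is the minimal sketch below. It assumes the digitization guide can be exported as CSV with one row per component and that a component's file name (minus extension) matches an identifier column. The column names, the `component_id` key, and the sample rows are all hypothetical and only illustrate the matching step; they are not the DPC's actual guide format.

```python
import csv
import io
from pathlib import PurePath

# Hypothetical guide export: one row per component, invented values.
SAMPLE_GUIDE = """component_id,workstation,capture_date,scanner_operator,scanner_type,qc_date,qc_operator
abc001,WS-2,2014-03-01,operator1,flatbed,2014-03-05,operator2
abc002,WS-2,2014-03-01,operator1,flatbed,2014-03-05,operator2
"""


def load_guide(csv_text, id_column="component_id"):
    """Map each component identifier to its technical-metadata row."""
    guide = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        component_id = row.pop(id_column).strip()
        guide[component_id] = {key: (value or "").strip() for key, value in row.items()}
    return guide


def associate(guide, component_files):
    """Pair files with guide rows; collect files that have no row."""
    matched, unmatched = {}, []
    for filename in component_files:
        component_id = PurePath(filename).stem  # file name without extension
        if component_id in guide:
            matched[filename] = guide[component_id]
        else:
            unmatched.append(filename)
    return matched, unmatched


matched, unmatched = associate(
    load_guide(SAMPLE_GUIDE),
    ["abc001.tif", "abc002.tif", "abc003.tif"],
)
```

Reporting the unmatched files separately gives staff a chance to correct the guide before any technical metadata is attached.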
Parts (DPP)
Narrative: The Digital Production Center creates reproductions of over-sized material that are captured in multiple shots. These shots are stitched together to recreate the original. Both the resulting image and its parts need to be saved in the repository. Reproductions of over-sized items can have from 2 to 6 parts. These files are stored in a separate ‘Parts’ folder within the collection folder on the dark storage server (similar to the targets folder). File names for Parts contain identical root file names appended with a location:
a. top, bottom
b. top left, top right, bottom left, bottom right
c. top left, top right, center left, center right, bottom left, bottom right
d. top, center, bottom
Story: As a DPC member,
- I want to ingest digital collections that include over-sized items
- I want to associate the composite image with its parts
- I want the DDR to report the number of parts separate from the number of original components/items under the Collection Info tab
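The relationship between a composite and its parts could be derived from file names along the lines of the sketch below. It assumes the location is appended to the root with underscores (for example `map01_top_left.tif`) and that the composite carries the bare root name. The narrative does not state the exact separator or spelling, so that convention and the sample file names are assumptions.

```python
import re
from collections import defaultdict

# Location suffixes from the four layouts listed in the narrative.
LOCATIONS = [
    "top_left", "top_right", "center_left", "center_right",
    "bottom_left", "bottom_right", "top", "center", "bottom",
]

# Assumed naming convention: <root>_<location>.<extension>
PART_PATTERN = re.compile(
    r"^(?P<root>.+?)_(?P<loc>" + "|".join(LOCATIONS) + r")\.(?P<ext>\w+)$"
)


def group_parts(filenames):
    """Group part files by root name; return (parts by root, other files)."""
    parts = defaultdict(dict)
    others = []
    for name in filenames:
        match = PART_PATTERN.match(name)
        if match:
            parts[match.group("root")][match.group("loc")] = name
        else:
            others.append(name)  # composites and ordinary components
    return dict(parts), others


# Invented file names for illustration only.
SAMPLE_FILES = [
    "map01.tif", "map01_top.tif", "map01_bottom.tif",
    "map02.tif", "map02_top_left.tif", "map02_top_right.tif",
    "map02_bottom_left.tif", "map02_bottom_right.tif",
]

parts, others = group_parts(SAMPLE_FILES)
part_count = sum(len(by_location) for by_location in parts.values())
```

Keeping `part_count` apart from the count of `others` mirrors the request to report parts separately from original components.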
Patron Requests
Narrative: The DPC produces digital images of Rubenstein Library holdings as requested by patrons. These requests cover a variety of analogue formats (books, newspapers, photographs, audio tapes, video). The content of a request can range from a full volume or multiple full volumes to an image from a page in a volume, or one or more pages in a volume but not the entire volume. An individual request can also span multiple items. Currently, these files are stored on the Image Transfer server in folders organized by the last name of the patron. The file names can follow a variety of naming schemes but often use the last name of the patron followed by a sequential number.
Story: As a DPC member,
- I want to ingest patron requests into the repository
- I want to have the patron retrieve the requested images from the repository.
- I want to have metadata associated with these images so that they can be discovered in the repository where appropriate.
Disk Drive Full of Images (HFG, KPC)
Narrative: The DPC receives images produced by an entity outside of the department that are delivered on a hard drive. These items are digitized versions of library material or a purchased collection of digitized material. Sometimes the structure of the data is unknown; other times the structure can be discerned but is inconsistent with data models currently supported by the DDR. Sometimes an item is the content of a folder whose file names may or may not follow a logical pattern. Other times the naming convention follows a logical pattern within a folder but not across folders. Sometimes the files received by the DPC follow archival imaging standards, and other times they do not. It is difficult to know how many items are on a disk drive in order to compare that number with how many items the repository reports it will ingest.
Story: As a member of the DPC,
- I want to ingest the contents of a disk drive into the repository
- I want to use the existing folder structure to inform the repository of what an item is for a particular collection.
- I want to indicate whether an ingested set of files meets archival standards.
- I want to restructure collections already ingested in the repository to adjust for anything that could not be conveyed in the file name or folder structure.
- I want to make the hard drive content available on the Library website to users with appropriate permissions
- I want to mask (hide from view based on permissions) collections, items, and files as necessary
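As a rough illustration of using folder structure to define items, the sketch below treats every folder that directly contains files as one item and counts those folders before ingest. The "one folder, one item" rule is an assumption that will not hold for every drive described above, and the folder and file names are invented.

```python
import os
import tempfile


def items_from_folders(root):
    """Return {relative folder path: sorted file names} for folders holding files."""
    items = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if filenames:  # a folder with files directly inside counts as one item
            items[os.path.relpath(dirpath, root)] = sorted(filenames)
    return items


# Build a small throwaway tree that stands in for a delivered drive.
SAMPLE_TREE = {
    ("box1", "item_a"): ["0001.tif", "0002.tif"],
    ("box1", "item_b"): ["scan.tif"],
    ("box2",): ["x.tif"],
}

with tempfile.TemporaryDirectory() as drive:
    for folder_parts, names in SAMPLE_TREE.items():
        folder = os.path.join(drive, *folder_parts)
        os.makedirs(folder)
        for name in names:
            open(os.path.join(folder, name), "w").close()
    items = items_from_folders(drive)

expected_item_count = len(items)
```

A count produced this way could be compared with the number of items the repository reports it will ingest, which is the check the narrative says is hard to do by hand.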