SEP 054 -- Managing genetic design packages

SEP
Title	Managing genetic design packages
Authors	Jacob Beal (jakebeal@ieee.org), Tom Mitchell, Bryan Bartley, Gonzalo Vidal, James Scott Brown, Helen Scott, Noah Sprent, Vinoo Selvarajah
Editor
Type	Data Model
SBOL Version	SBOL 3.1
Replaces	None
Status	Draft
Created	17-Oct-2021
Last modified	19-Feb-2022
Issue	#112

Abstract

This SEP proposes a set of practices for managing collections of genetic designs (or other information encoded in SBOL). These practices approach the sharing of SBOL information in a similar way to software package managers. These practices are intended to support similar management of shared SBOL information across multiple platforms and repositories.

We are now building collections of durable genetic design information (and similar) in multiple contexts, and some of these genetic designs draw on information in other collections of genetic designs. In these circumstances, we do not want to end up with genetic design information either being repeatedly duplicated (and forked) or becoming inaccessible.

Software development has dealt with similar problems in the past, and converged on various approaches based on notions of libraries or packages, coupled with package managers for automating the collections of required packages and their dependencies, and public repositories for maintaining designs. Examples include pip for Python, npm for Javascript, Maven for Java, and apt for Debian. All of these have generally similar architectures and interfaces, though specifics differ, in part due to the differences between the software environments being managed.

This document proposes a similar approach to package management for collections of information encoded in SBOL. Sharing of genetic design information is the initiating motivation, but the same approach can be applied to other types of information as well, including designs for experiments, sample inventories, and novel representations implemented as SBOL extensions.

1.1 Design Goals

All of the activities on the following non-exhaustive list are intended to be supported cleanly with this proposal:

Import a design so that you don't have to define it locally
Refer to a design from a "sibling" package within a single prject
Refer to a design from some other package in some other project
Be able to refer to things that aren't in a package, but some other external non-package source (e.g. NCBI, UniProt)
Refer to things that have more than one potential source, possibly in different formats (e.g., iGEM parts direct from the repository in FASTA format and from SynBioHub in SBOL2 format)
Refer to non-SBOL object that have more than one potential name (e.g., iGEM "BBa_E0040" vs. "E0040" and NCBI "L29345.1" vs. "L29345")
Make a local cache of a set of designs, so that you don't have to keep downloading them (and can also avoid flaky server issues)
Check whether materials in the cache are obsolete and need to be refreshed from their sources.
Clean out cache materials and rebuild from scratch.
Refer to a specific version of a collection, or to the latest version
Know if a collection of package information is out of date
Import a whole package into a document
Import selected elements of a package into a document
Handle at least materials in git, materials in SynBioHub, and materials at NCBI
Avoid replicating or republishing materials imported from another package
Build composite documents that include imported material from other packages
House multiple packages in a single repository
Rehome a repository and packages that are being migrated to a different location
Run tests against updated packages or repositories without releasing them
Make a "working document" by importing various packages
Import dependencies of an import, as needed

This proposal is primarily intended to be a practice to follow, but does propose extension classes that may eventually be included in the specification.

Specification

2.1 Terminology

This specification and set of practices for SBOL package management uses the following definitions

SBOL document: a set of SBOL TopLevel objects aggregated together such that they can be identified and retrieved by means of a single URI. This can come in multiple forms, including serialized into a file in an RDF format such as sorted N-triples, access via an online API such as a SynBioHub collection, or as a database entry.
Package: a systematic aggregation of SBOL materials intended to be distributed and used as a coherent whole. A package is associated with a specific namespace, and all TopLevel objects in the package must share that namespace. When stored in a hierarchical structure (e.g., files in directories), the relationship between package namespaces MUST be identical to the relationship between package locations in the structure. Note that packages can have sub-packages.
Root Package: a package that is not a sub-package of any other package. Root packages are the objects directly managed by a package catalog, while sub-packages are managed indirectly as part of their containing root package.
Native Package: a package whose contents are all defined in SBOL3.
Conversion Package: a source of non-SBOL3 material with stable URIs for retrieving objects (e.g., NCBI GenBank, UniProt, the iGEM Parts Repository), being used as package source for SBOL parts. Note that this also includes material encoded in prior versions of SBOL.
Generated Content: a set of SBOL TopLevel objects created from other information, either in SBOL (e.g., a generated package description) or external to SBOL (e.g., a converted FASTA file).
Build Artifact: a document derived by assembling a set of SBOL TopLevel objects from various packages without modifying the content of the TopLevel objects therein.
Dependency: Statement that a package refers to TopLevel objects from other packages. Dependencies may be declared at various levels of granularity:
- Object: only an individual TopLevel object (along with all of its child objects)
- Object and references: a TopLevel object and all of the other TopLevel objects that it references in its definition, except for via provenance (PROV-O) relations.
- Package: all of theTopLevel objects in a package, implicitly including all of its dependencies
Direct Dependency: A dependency that is specified in a package.
Indirect Dependency: A dependency that is not direct, but is specified in some other recursive dependency of a package.
Atomic Package: A package that is small enough that its contents can be readily managed and transmitted between machines as a single document.
Dissociated Package: A non-atomic package, i.e., one that that is too large to be used in toto as a dependency on the package level, and is thus always managed on the level of objects or objects and references instead. Every dissociated package SHOULD be a conversion package. For example, the conversion package for NCBI GenBank is a dissociated package, meaning that sequences in NCBI GenBank must be referenced and imported individually.
Cache: a store for local copies of packages (or fragments thereof) obtained from elsewhere. Caches may be shared or local to a package. Atomic packages will generally be stored in a shared cache, while dissociated packages will generally be stored in a local cache.
Package catalog: source for information about available packages and how they may be retrieved, based on their URIs.
Package manager: tool for using one or more package catalogs to retrieve, cache, and resolve dependencies in packages.

2.2 Representation

Dependency

Dependency extends the Identified class with the following fields:

package [ 1 ], URI: Root package that is the dependency or contains the dependency
version [0,1], string: Statement indicating the version of the package to use. Versions SHOULD use SemVer format. If this property is not provided, then any version is considered acceptable, with the highest version preferred. Future work: expand to include constraints as well, e.g., >=1.2, <2
subpackage [0,1], URI: If the dependency is or is contained within a specific sub-package within the package, this property specifies its URI.
object [0,n], URI: If dependency is specific TopLevel objects rather than a complete package, each instance of this property indicates one of those objects.
includeObjectReferences [0,1], boolean: By default, object dependencies are only to the specific object itself. If this value is set to true, then object dependencies also imply dependencies on all of the TopLevel objects recursively referred to by objects in the dependency set. For example, dependency on a plasmid implies depending on the insert it is carrying, which depends in turn on the functional units in the insert, which also depends on the basic parts like promoters, CDSs, terminators, that they contain.

A dependency list MAY contain redundant or overlapping entries. For example, a dependency list may contain both a package and a supackage within that package, both a subpackage and specific objects in the subpackage, or even multiple entries for the same object. In such cases, the effective dependency list is the union of all of the individual dependencies in the list.

This permissive unioning behavior is intended to allow independently defined dependency lists to be safely merged, e.g., in a package that depends on two other packages. Accordingly, a tool SHOULD NOT consider it to be an issue of there are redundancies between Dependency objects contained by different Package objects. A tool MAY, however, choose to warn users about redundancies within the dependencies of an individual Package.

Package

Package extends the Collection class with the following fields:

version [0,1], string: The value of this property SHOULD use SemVer to indicate the version of this package. This property SHOULD be set if and only if a package is a root package.
conversion [ 1 ], boolean: Indicates if this is a native package (false) or conversion package (true)
dissociated [0,1], string: If not set, this is an atomic package. If set, indicates this package is a dissociated package of the type identified by the string (see practices below).
hasDependency [0,n], URI: Dependency child objects
subPackage [0,n], URI: Indicates the set of Package objects aggregated in the Package.

The displayId of a Package SHOULD be package.

To avoid potential confusion between packages with similar names, the identity URI of a package SHOULD be lower-case ASCII, and the path elements of the URI SHOULD be only alphanumeric or dash characters (e.g., https://igem.org/fluorescent-proteins/package).

Every member of a Package MUST have the same value for hasNamespace as the package itself.

A package SHOULD have dissociated set only if conversion is also true.

A package SHOULD only have values for subPackage if it is a native package (i.e., conversion is false).

For packages with conversion equal to true, the version property SHOULD match its major version number to the major version of the data source, where applicable. Other elements of the versioning SHOULD be used to indicate versioning of the converted material.

Note that any change to the sequence of a part SHOULD be a major version change, since there is no guarantee that the part will continue to have the same behavior. There are certain circumstances where it would not be, such as correcting a bug in which a sequence was not listed correctly to begin with. To add a sequence that is expected to have better performance, do not change the sequence: add a new part and deprecate the old one.

Every Package object pointed to by subPackage MUST have a hasNamespace that is equal to the hasNamespace of the Package object plus a local sub-path. For example https://example.org/MyPackage/ might have subPackage values https://example.org/MyPackage/promoters and https://example.org/MyPackage/regulatory/repressors

2.3 Management Practices with Packages

Defining a Package from a Document

Given an SBOL document, a native Package can be computed by inspecting the collection of TopLevel objects in the document. If all TopLevel objects have the same hasNamespace value, then they form a well-defined Package with identity [namespace]/package and members equal to the identities of the TopLevel objects.

Object dependencies for a Package with includeObjectReference=true can be computed by collecting references to TopLevel objects that are not contained within the package and resolving those against a catalog of packages. Other granularities can only be explicitly defined.

Conversion packages cannot be computed and need to be explicitly defined.

Defining a Package as an Aggregate of Documents

A Package can be computed by inspecting any collection of SBOL documents, first converting each SBOL document to a Packages, then aggregating the Package objects into another Package that contains them all.

For example, a Package can be computed from a directory of files. In the case of building a Package from a directory, if all of the document's subPackage objects' namespaces share a prefix, then they form a well-defined Package with identity [prefix]/package and with conversion=false and dissociated not set. In this case, the Package will typically have no member values (since a directory contains documents, rather than being a document itself) and its hasDependency values will be the union of the hasDependency values of its sub-packages.

The name and description of a package cannot be computed. If they are desired, they need to be explicitly provided.

Root packages additionally need to have their version explicitly provided.

Any package catalog SHOULD contain conversion packages for common conversion packages, such as NCBI GenBank and the iGEM Parts Repository (see also defining dissociated packages below).

Defining a Dissociated Package

Dissociated packages are special cases. In order to successfully use materials from a dissociated package, the following information needs to be known:

How to convert a UID for the source into its canonical form (e.g., iGEM "E0040" into "BBa_E0040", NCBI "L29345" into "L29345.1").
How to map a canonical UID into a retrieval URI
What format retrieved information is expected to be in.
What namespace to prepend to items at conversion.

The namespace is set by the Package object for the dissociated package.

The other aspects are hard-coded based on the string value provided for the dissociated property. At present, the following values are defined:

ncbi: Information stored in NCBI
igem: Information stored in the iGEM parts repository and/or the iGEM SBOL2 conversion in SynBioHub at https://synbiohub.org/
synbiohub: SynBioHub instance located at namespace

Behavior with respect to any other value of dissociated is currently undefined, and thus any other value of dissociated SHOULD be treated as an error condition.

Collating a Native Package Build Artifact

The materials of a native Package can be collated into a single SBOL document. This is a build artifact that is useful for distributing materials in order to satisfy dependencies.

This document SHOULD include the Package object, all of its sub-Package objects recursively, and all of their member objects and children.

Which dependencies are included in the build artifact depends on the planned use of the build artifact:

For use in package cataloging and distribution, the build artifact SHOULD NOT include any materials from non-dissociated dependencies (except for dependencies between the sub-packages of the package itself). This is because these packages may be included in multiple dependency paths. Instead, they are assumed to be able to be fetched independently during dependency resolution.

The build artifact SHOULD, however, include all objects included as direct dependencies from dissociated package materials. This is because dissociated packages are too large to cache and SHOULD NOT be in SBOL format, meaning that repeating the fetch and conversion of a dissociated package object is not guaranteed to produce an identical object. Including them in the package guarantees stability of the package materials.

Indirect dissociated package dependencies SHOULD NOT be included, however, because they will be included in the package that has them as a direct dependency.
For use in building an "exported" collection of information to be used outside of the package management system (e.g., for preparation of a synthesis order), all dependencies of all types SHOULD be included, including indirect dependencies.

Building and Using a Package Catalog

A catalog of packages can be built by collecting together a set of root Package objects. To keep the catalog manageable in size, the catalog SHOULD contain only root Package information and SHOULD NOT contain sub-Package objects or the actual contents of packages.

A catalog also needs to be able to track multiple versions of the same package. If the catalog is to be a single document, then this means that the Package objects for the different versions must not share the same URI.

To this end, each Package in the catalog MUST have its version property set. The identity of the Package object in the catalog is then rewritten from [namespace]/[displayId] to [namespace]/[version]/[displayId], with a prov:wasDerivedFrom link connecting it to the original identity.

Every root Package in such a catalog MUST NOT have a namespace that is a prefix of any other Package in the catalog, except for other versions of itself.

Packages can then be looked up in the package catalog by prefix testing on identity URIs. This will identify the set of available package versions, which can then be used for dependency resolution.

Information about where to retrieve the contents of a Package can be added via Attachment objects. The Attachment object points to a specific URI from which a build artifact can be retrieved, that build artifact containing the Package and all of its contents for the identified release. Each Package in the catalog MUST have at least one such Attachment.

Native libraries should use format=http://sbols.org/v3#. TODO: this should use EDAM, but EDAM currently has no entry for SBOL3, and this is required for distinguishing between SBOL3 conversions from prior SBOL formats. For conversion libraries, the library may have a build artifact that has already been converted into SBOL3, in which case it should also use the SBOL3 format. Artifacts in other formats should should use EDAM to indicate the format in order to allow conversion. When there are multiple attachments in the same format, they represent mirror alternatives for retrieval and thus their contents MUST be identical. Identical contents can be efficiently tested for by comparing attachment hash values.

A package may then be retrieved for use by using the information in its Attachment objects in the catalog to download a build artifact containing its contents, which may or may not require conversion after download. In the case of multiple attachments, which attachments are attempted in which order is left for tools to determine heuristically.

Packages downloaded with the aid of a catalog SHOULD be stored in a shared cache location, such that they do not need to be downloaded again if they are also used in another project. Note also that there is no reason that such a cache cannot include multiple versions of a package, in case different versions are being used by different projects.

Note that the Attachment system also helps support migration of package material, as new locations can be added as new Attachment objects, and obsolete locations can be deprecated and then eventually have their Attachment objects deleted.

TODO: It may also be useful to consider a system of nicknames, such as "ncbi" or "igem:promoters" that can be used as short-form aliases for the full package URLs. This is likely best deferred to a future SEP, however.

Publishing Packages to a Shared Package Catalog

A public shared catalog of published packages will be valuable to maintain. The working title for this collection is the "SBOL index of packages", AKA "sip" (a deliberate reference to Python's pip)

Contributors SHOULD NOT submit packages that are arbitrary collections of designs without a clear engineering function. For example, a supplementary information file collecting all the genetic constructs in a scientific publication would not generally make a good sip package. Likewise, forking of packages is discouraged: improvements to designs SHOULD result in submission of a new version of a package rather than a new package.

A public catalog SHOULD be maintained in a version control system such as git or a database or other system providing sufficiently similar capabilities. If a version control system is used, this also provides standard methods such as merge requests can be used for publication of new packages and updated into the catalog. A local sip installation can then either clone the catalog (if in version control) or copy the latest released snapshot in order to being working with it, and can update the catalog by pulling updates or checking for new releases at the beginning of each sip command execution.

A sip catalog MAY also be maintained in SynBioHub as a published collection. As published collections in SynBioHub are not supposed to be modified, such a catalog may be downloaded once and assumed to remain current.

A sip implementation SHOULD be able to be configured to make use of multiple catalogs, which are accessed in a specified order. A RECOMMENDED configuration would have the first catalog be local unpublished packages, in order to allow pre-release testing of packages, and the last catalog be the main public sip catalog. Any additional catalogs would come between these in a user-specified order.

Resolving Dependencies

When building, checking, or using a Package, it will often be necessary to check that a dependency can be satisfied and to take action if it cannot be satisfied.

Direct dependencies SHOULD be resolved following one of three methods, depending on the nature of the dependency.

Any dependency with a package value having a prefix of the namespace of the root Package for a project SHOULD be resolved locally, by confirming that the required Package and/or objects are available at the expected locations within the project.
Any dependencies with a package value that precisely matches a dissociated package in the catalog SHOULD be resolved by attempting to download and convert material from its source, following the source specification for the dissociated package in the package catalog.
Any dependencies with a package value having a prefix of the namespace of atomic package in the catalog are resolved by locating the package in the catalog, confirming that a copy has been cached locally, and confirming that the required Package and/or objects are available from the specified package.

Any direct dependency that does not resolve following one of these methods SHOULD be considered an error condition.

Indirect dependencies MAY be resolved by recursively resolving direct dependencies.

Conflicts between dependencies (direct or indirect) can arise in two ways:

Incompatible version requirements on a dependency.
Dissociated package objects with identical identifiers but non-identical property values.

In either case, such a conflict is an error condition and needs to be resolved manually.

Resolving URIs of Package Contents

In addition to retrieving package materials by means of a package catalog, it is also necessary for the materials to be retrievable directly from their URIs. This maintains compatibility with RDF expectations and allows tools that are not package-aware to use packaged SBOL data.

To this end, every package published in a particular package catalog server should have a namespace that resolves or appropriately redirects to that server. In response to an HTTP request for a URL within the namespace of a package in the catalog, the server MUST handle requests as follows:

If the URL is the identity of a TopLevel object within the package's namespace, the server MUST return an SBOL document that contains the requested object. The server MAY determine the degree of associated materials to include (anywhere up to the entirety of the packages).
If the URL is within the package's namespace, but the requested object does not exist, the server MUST either return an error or return an SBOL document that does not contain the requested object.
If the URL is the identity of a non-TopLevel object within the package's namespace, then the server MAY treat it as either a request for a non-existent object or a request for the TopLevel object that contains the identity.

Note that one legitimate implementation of this protocol is to simply return the complete package document for any URI request within the package namespace.

Note also that per W3C recommendation, content negotation may be used to determine whether the document that is be returned should be SBOL (RDF) or browsable HTML, with SBOL being the default for requests that do not include content negotiation.

Dynamic vs. Static Package Contents

Referencing the contents of a package by their normal URLs gives a dynamic view of package contents, in which the contents are retrieved from the most recent version of the package.

To retrieve a static snapshot of package contents instead, the namespace can be changed to include the version, as described above. Retrieving such a URI MUST retrieve materials from the specific version of the package that have likewise been rewritten to change their namespace from [namespace] to [namespace]/[version]. In addition, all references to dependencies must be rewritten to URIs with the version of the dependency at the time of publication. This has the effect providing a static view of a snapshot of the package and its dependencies. As with packages, the rewritten version from the static snapshot should also have a prov:wasDerivedFrom link to the original URI.

Materials from dissociated packages pose an additional problem, because their contents are cached within the build artifact but have a different namespace. For use in a static snapshot, these materials thus MUST have their URIs rewritten into the package namespace, again with a prov:wasDerivedFrom link to the original URI. In this case, the namespace should be changed from [package-versioned-namespace]/.build/[namespace minus protocol] (e.g., a dissociated import https://synbiohub.org/public/igem/pSB1C3 in package https://example.com/MyPackage would have its identity changed to https://example.com/MyPackage/.build/synbiohub.org/public/igem/pSB1C3/

Example:

A package catalog is set up at https://example.com/, and one of the packages published there is https://example.com/FusionProteins/package, with namespace https://example.com/FusionProteins/. A request for https://example.com/FusionProteins/gfp-rfp-fusion retrieves an SBOL document containing the most recent version of a Component object with that identity (and, incidentally, the rest of the package as well). This Component has SubComponent objects references GFP and RFP from another package, which it depends on. A request for one of these, https://example.com/FluorescentProteins/GFP, retrieves an SBOL document containing the most recent version of that Component along with the rest of the contents of https://example.com/FluorescentProteins/package.

To retrieve the Component from a static view of version 1.3 of the fusion proteins package, the request would instead be for https://example.com/FusionProteins/1.3/gfp-rfp-fusion. Let us assume that at the time when version 1.3 was published the fluorescent proteins package was on version 2.0. In this case, in the static document retrieved, https://example.com/FusionProteins/1.3/gfp-rfp-fusion will have the SubComponent link for GFP rewritten to https://example.com/FluorescentProteins/2.0/GFP.

Storing Packages Within a Directory Tree

When package materials are stored in a hierarchical directory structure, it is useful to distinguish user-authored and tool-created content. User-authored content SHOULD be primary in the structure, while all tool-created content is potentially subject to deletion and reconstruction. It is thus RECOMMENDED that tool-created content be stored in a hidden subdirectory of the directory for which the content has been created.

Tool-created content falls into two categories, generated content and build artifacts, which SHOULD be stored separately from one another.

Generated Content

Generated content, while tool-created, consists of novel SBOL objects not stored elsewhere, and thus SHOULD be maintained persistently and managed with version control. To support human examination of version differences, generated content SHOULD be stored in sorted N-triples format.

When built with respect to a specific directory, generated content SHOULD be stored in a hidden subdirectory named .sip.

Each Package and its associated filed-derived sub-Package objects (but not sub-directory derived sub-packages), SHOULD be stored in sorted N-triples format in a file named .sip/package.nt.

Any user-defined material for the package (e.g., name, version) SHOULD be provided in a file named package.EXT in the package directory where EXT is any valid SBOL extension and that contains precisely one TopLevel object: the user-defined Package object. If this file is present, then the Package object in .sip/package.nt file should be generated by starting with the user-defined Package and adding subPackage and hasDependency properties based on the contents of the directory.

Imports from a dissociated package X SHOULD be stored in sorted N-triples format in a file named .sip/X.nt

Genetic design documents stored in a format other than SBOL3 (e.g., Excel, SBOL2) SHOULD have their derivative SBOL3 documents stored in the .sip package as well. For example if a package is defined in an Excel file named my_package.xlsx, an Excel-to-SBOL export of my_package.nt would be stored in the .sip package.

The .sip directory MUST contain nothing besides the package.nt, dissociated package files, and converted packages. The directory MAY, of course, omit these files before they have been built. The contents of each dissociated package files SHOULD contain precisely the set of dependencies indicated for that package in package.nt.

Build Artifacts

Since build artifacts are assemblages of copied TopLevel objects, they are redundant for version control. They thus SHOULD NOT be stored in the version control of the project from which they were generated, as the redundant storage creates opportunities for forking between copies while adding no new information.

Build artifacts are, however, appropriate objects to use for purposes of distribution. Because build artifacts are not intended for version-control and inspection, they MAY use any serialization format.

When built with respect to a specific directory, a build artifact SHOULD be stored in a hidden subdirectory named .build.

For a package named X, the name of a build artifact for package distribution (i.e., without atomic package dependencies) SHOULD be X-distribution-package.[EXTENSION], where [EXTENSION] is an appropriate extension for its format.

For a package named X, the name of a build artifact for stand-alone distribution (i.e., with all dependencies included in the artifact) SHOULD be X-standalone-package.[EXTENSION], where [EXTENSION] is an appropriate extension for its format.

The .build directory MAY be used to store other transient artifacts, such as intermediate outputs used in production of a package or cached files that are not yet converted to their final format.

Example:

Consider a package named ecoli-circuits with the following files stored in one root directory and two subdirectories:

[root]/package.ttl (user-defined content for the package)
[root]/composites.nt
[root]/regulatory/constitutive.xlsx (Excel-to-SBOL sheet including iGEM material)
[root]/regulatory/repressors.nt (includes iGEM and NCBI material)
[root]/regulatory/inducers.ttl
[root]/actuators/chromoproteins.nt (includes NBCI material)
[root]/actuators/fluorescence.nt

Generated content files:

[root]/.sip/package.nt
[root]/regulatory/.sip/package.nt
[root]/regulatory/.sip/constitutive.nt
[root]/regulatory/.sip/igem.nt
[root]/regulatory/.sip/ncbi.nt
[root]/actuators/.sip/package.nt
[root]/actuators/.sip/ncbi.nt

Package build for distribution:

[root]/.build/ecoli-circuits-package.ttl
[root]/.build/ecoli-circuits-standalone-package.ttl

The .ttl extension indicates a file is in the Turtle - Terse RDF Triple Language format. The .nt extension indicates a file is in the RDF N-Triples format.

Versioning and Release of Packages

The time required to build physical instantiations of genetic designs poses a challenge for the versioning and release of packages. On the one hand, one would generally want to check for and correct "cannot be built" bugs before a package is released. On the other hand, with current methods building designs has a high cost and often takes weeks or months. At the alpha or beta pre-release stages of package development, build issues are to be expected and need not block a release.

For production releases of a package, however, SemVer "release candidate" versioning SHOULD be used for any package that includes novel materials intended to be built as part of a release process. Specifically, a "release candidate" pre-release tag SHOULD be used to indicate the point where an intended release is sent into production.

If all new designs added in a release candidate are able to be built, then the release candidate SHOULD have its status upgraded from release candidate (pre-release) to release (e.g., from 1.0-rc1 to 1.0). If some designs in a release candidate are unable to be built, however, then either the failing materials can be dropped from the package (leaving only known-good materials suitable for release) or the designs can be adjusted to produce a new release candidate.

Note that these policies apply only to designs in a package, not to its dependencies. As a matter of practice, however, a production release generally SHOULD NOT have any dependency on any pre-release packages.

Example:

Consider an ecoli-circuits package at version 1.0, to which a number of new designs have been added with the intention of producing a 1.1 release. When design work is complete, including a check that the designs are expected to be able to be synthesized, the package is tagged for pre-release version 1.1-rc1 and the new designs are sent for synthesis.

Several weeks later, most of the designs have been synthesized successfully, but a few have failed. The failing designs are updated to try to resolve their synthesis issues. The package is then tagged as pre-release version 1.1-rc2 and the updated designs are sent for synthesis. This time all of the designs succeed, and the package version is updated to a 1.1 release without any other modification of its contents.

Alternately, if some designs had still failed, the package maintainers might have chosen instead to drop them from the release. In this case, the failing designs would be removed from the package, leaving only designs that had been successfully build. The reduced package is still a backward compatible enhancement over the 1.0 release, however, so it can be safely designated as the 1.1 release.

Note also that it is entirely possible that a package may have release candidates for multiple versions in progress at the same time. For example, a version 1.2-rc1 release candidate might be created and the new designs not present in 1.1-rc2 might be ordered for synthesis even while 1.1-rc2 is still being synthesized.

Example or Use Case

Examples and use cases are embedded in the specification material given above.

Backwards Compatibility

This SEP does not modify any existing definitions, and is thus backward compatible with SBOL 3.0.1

Discussion

On the matter of versioning, an alternative to retagging as a release candidate would be to skip directly to the a release version when an order is made (e.g., 1.0 rather than 1.0-rc1). Problems with an order would then be corrected by a new release (e.g., upgrading to 1.0.1 for adjusted sequences). An important problem with this approach, however, is that if a sequence turns out to be not synthesizable and is dropped from the package, then this is not a backward-compatible change and would require a major version number change (e.g., upgrading from 1.0 to 2.0). Marking as a release-candidate, however, retains the pre-release "don't count on the changes to be stable yet" status until construction has been completed and the new sequences can be used with confidence.

The actual process of pysically building a package can be thought of as similar to compiling a binary and to running a tests. In the future, it would be good to utilize the mechanism of Implementation objects and/or build artifacts of some sort to track which parts have been synthesized at least once. That could also support things like badges saying things like "53% synthesized". It is not currently clear whether this information should be stored in the package or elsewhere, and decisions on how to run this process will be deferred for future implementation.

Competing SEPs

None at present

References

Copyright

To the extent possible under law, SBOL developers has waived all copyright and related or neighboring rights to SEP 054. This work is published from: United States.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sep_054.md

sep_054.md

SEP 054 -- Managing genetic design packages

Abstract

Table of Contents

1.1 Design Goals

2.1 Terminology

2.2 Representation

Dependency

Package

2.3 Management Practices with Packages

Defining a Package from a Document

Defining a Package as an Aggregate of Documents

Defining a Dissociated Package

Collating a Native Package Build Artifact

Building and Using a Package Catalog

Publishing Packages to a Shared Package Catalog

Resolving Dependencies

Resolving URIs of Package Contents

Dynamic vs. Static Package Contents

Example:

Storing Packages Within a Directory Tree

Generated Content

Build Artifacts

Example:

Versioning and Release of Packages

Example:

References

Copyright

Files

sep_054.md

Latest commit

History

sep_054.md

File metadata and controls

SEP 054 -- Managing genetic design packages

Abstract

Table of Contents

1.1 Design Goals

2.1 Terminology

2.2 Representation

Dependency

Package

2.3 Management Practices with Packages

Defining a Package from a Document

Defining a Package as an Aggregate of Documents

Defining a Dissociated Package

Collating a Native Package Build Artifact

Building and Using a Package Catalog

Publishing Packages to a Shared Package Catalog

Resolving Dependencies

Resolving URIs of Package Contents

Dynamic vs. Static Package Contents

Example:

Storing Packages Within a Directory Tree

Generated Content

Build Artifacts

Example:

Versioning and Release of Packages

Example:

References

Copyright