SEP | |
---|---|
Title | Managing genetic design packages |
Authors | Jacob Beal ([email protected]), Tom Mitchell, Bryan Bartley, Gonzalo Vidal, James Scott Brown, Helen Scott, Noah Sprent, Vinoo Selvarajah |
Editor | |
Type | Data Model |
SBOL Version | SBOL 3.1 |
Replaces | None |
Status | Draft |
Created | 17-Oct-2021 |
Last modified | 19-Feb-2022 |
Issue | #112 |
This SEP proposes a set of practices for managing collections of genetic designs (or other information encoded in SBOL). These practices approach the sharing of SBOL information in a similar way to software package managers. These practices are intended to support similar management of shared SBOL information across multiple platforms and repositories.
- 1. Rationale
- 2. Specification
- 2.1 Terminology
- 2.2 Representation
- 2.3 Management Practices with Packages
- Defining a Package from a Document
- Defining a Package as an Aggregate of Documents
- Defining a Dissociated Package
- Collating a Native Package Build Artifact
- Building and Using a Package Catalog
- Publishing Packages to a Shared Package Catalog
- Resolving Dependencies
- Resolving URIs of Package Contents
- Storing Packages Within a Directory Tree
- Versioning and Release of Packages
- 3. Example or Use Case
- 4. Backwards Compatibility
- 5. Discussion
- 6. Competing SEPs
- References
- Copyright
We are now building collections of durable genetic design information (and similar) in multiple contexts, and some of these genetic designs draw on information in other collections of genetic designs. In these circumstances, we do not want to end up with genetic design information either being repeatedly duplicated (and forked) or becoming inaccessible.
Software development has dealt with similar problems in the past, and converged on various approaches based on notions of libraries or packages, coupled with package managers for automating the collection of required packages and their dependencies, and public repositories for maintaining designs. Examples include pip for Python, npm for JavaScript, Maven for Java, and apt for Debian. All of these have generally similar architectures and interfaces, though specifics differ, in part due to the differences between the software environments being managed.
This document proposes a similar approach to package management for collections of information encoded in SBOL. Sharing of genetic design information is the initiating motivation, but the same approach can be applied to other types of information as well, including designs for experiments, sample inventories, and novel representations implemented as SBOL extensions.
All of the activities on the following non-exhaustive list are intended to be supported cleanly with this proposal:
- Import a design so that you don't have to define it locally
- Refer to a design from a "sibling" package within a single project
- Refer to a design from some other package in some other project
- Be able to refer to things that aren't in a package, but some other external non-package source (e.g. NCBI, UniProt)
- Refer to things that have more than one potential source, possibly in different formats (e.g., iGEM parts direct from the repository in FASTA format and from SynBioHub in SBOL2 format)
- Refer to non-SBOL objects that have more than one potential name (e.g., iGEM "BBa_E0040" vs. "E0040" and NCBI "L29345.1" vs. "L29345")
- Make a local cache of a set of designs, so that you don't have to keep downloading them (and can also avoid flaky server issues)
- Check whether materials in the cache are obsolete and need to be refreshed from their sources.
- Clean out cache materials and rebuild from scratch.
- Refer to a specific version of a collection, or to the latest version
- Know if a collection of package information is out of date
- Import a whole package into a document
- Import selected elements of a package into a document
- Handle at least materials in git, materials in SynBioHub, and materials at NCBI
- Avoid replicating or republishing materials imported from another package
- Build composite documents that include imported material from other packages
- House multiple packages in a single repository
- Rehome a repository and packages that are being migrated to a different location
- Run tests against updated packages or repositories without releasing them
- Make a "working document" by importing various packages
- Import dependencies of an import, as needed
This proposal is primarily intended to be a practice to follow, but does propose extension classes that may eventually be included in the specification.
This specification and set of practices for SBOL package management uses the following definitions:
-
SBOL document: a set of SBOL TopLevel objects aggregated together such that they can be identified and retrieved by means of a single URI. This can come in multiple forms, including serialized into a file in an RDF format such as sorted N-triples, access via an online API such as a SynBioHub collection, or as a database entry.
-
Package: a systematic aggregation of SBOL materials intended to be distributed and used as a coherent whole. A package is associated with a specific namespace, and all TopLevel objects in the package must share that namespace. When stored in a hierarchical structure (e.g., files in directories), the relationship between package namespaces MUST be identical to the relationship between package locations in the structure. Note that packages can have sub-packages.
-
Root Package: a package that is not a sub-package of any other package. Root packages are the objects directly managed by a package catalog, while sub-packages are managed indirectly as part of their containing root package.
-
Native Package: a package whose contents are all defined in SBOL3.
-
Conversion Package: a source of non-SBOL3 material with stable URIs for retrieving objects (e.g., NCBI GenBank, UniProt, the iGEM Parts Repository), being used as package source for SBOL parts. Note that this also includes material encoded in prior versions of SBOL.
-
Generated Content: a set of SBOL TopLevel objects created from other information, either in SBOL (e.g., a generated package description) or external to SBOL (e.g., a converted FASTA file).
-
Build Artifact: a document derived by assembling a set of SBOL TopLevel objects from various packages without modifying the content of the TopLevel objects therein.
-
Dependency: Statement that a package refers to TopLevel objects from other packages. Dependencies may be declared at various levels of granularity:
- Object: only an individual TopLevel object (along with all of its child objects)
- Object and references: a TopLevel object and all of the other TopLevel objects that it references in its definition, except for via provenance (PROV-O) relations.
- Package: all of the TopLevel objects in a package, implicitly including all of its dependencies
-
Direct Dependency: A dependency that is specified in a package.
-
Indirect Dependency: A dependency that is not direct, but is specified in some other recursive dependency of a package.
-
Atomic Package: A package that is small enough that its contents can be readily managed and transmitted between machines as a single document.
-
Dissociated Package: A non-atomic package, i.e., one that is too large to be used in toto as a dependency on the package level, and is thus always managed on the level of objects or objects and references instead. Every dissociated package SHOULD be a conversion package. For example, the conversion package for NCBI GenBank is a dissociated package, meaning that sequences in NCBI GenBank must be referenced and imported individually.
-
Cache: a store for local copies of packages (or fragments thereof) obtained from elsewhere. Caches may be shared or local to a package. Atomic packages will generally be stored in a shared cache, while dissociated packages will generally be stored in a local cache.
-
Package catalog: source for information about available packages and how they may be retrieved, based on their URIs.
-
Package manager: tool for using one or more package catalogs to retrieve, cache, and resolve dependencies in packages.
`Dependency` extends the `Identified` class with the following fields:
- `package` [1], URI: Root package that is the dependency or contains the dependency.
- `version` [0,1], string: Statement indicating the version of the package to use. Versions SHOULD use SemVer format. If this property is not provided, then any version is considered acceptable, with the highest version preferred. Future work: expand to include constraints as well, e.g., >=1.2, <2.
- `subpackage` [0,1], URI: If the dependency is or is contained within a specific sub-package within the `package`, this property specifies its URI.
- `object` [0,n], URI: If the dependency is on specific `TopLevel` objects rather than a complete package, each instance of this property indicates one of those objects.
- `includeObjectReferences` [0,1], boolean: By default, object dependencies are only to the specific object itself. If this value is set to true, then object dependencies also imply dependencies on all of the TopLevel objects recursively referred to by objects in the dependency set. For example, dependency on a plasmid implies depending on the insert it is carrying, which depends in turn on the functional units in the insert, which depend in turn on the basic parts (promoters, CDSs, terminators) that they contain.
A dependency list MAY contain redundant or overlapping entries. For example, a dependency list may contain both a package and a subpackage within that package, both a subpackage and specific objects in the subpackage, or even multiple entries for the same object. In such cases, the effective dependency list is the union of all of the individual dependencies in the list.
This permissive unioning behavior is intended to allow independently defined dependency lists to be safely merged, e.g., in a package that depends on two other packages. Accordingly, a tool SHOULD NOT consider it to be an issue if there are redundancies between `Dependency` objects contained by different `Package` objects.
A tool MAY, however, choose to warn users about redundancies within the dependencies of an individual `Package`.
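The unioning rule can be illustrated with a short Python sketch. The `Dependency` dataclass below is a simplified, hypothetical stand-in for the proposed class (it is not a type from any existing SBOL library), and `effective_dependencies` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Dependency:
    """Simplified, hypothetical stand-in for the proposed Dependency class."""
    package: str
    version: Optional[str] = None
    subpackage: Optional[str] = None
    objects: FrozenSet[str] = frozenset()
    include_object_references: bool = False


def effective_dependencies(*dependency_lists: Iterable[Dependency]) -> Set[Dependency]:
    """Union several dependency lists; exact duplicates collapse into one entry."""
    merged: Set[Dependency] = set()
    for dependencies in dependency_lists:
        merged.update(dependencies)
    return merged
```

Because the result is a plain set union, overlapping entries (for example, a package and one of its sub-packages) simply coexist, and merging never fails at this stage.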
`Package` extends the `Collection` class with the following fields:
- `version` [0,1], string: The value of this property SHOULD use SemVer to indicate the version of this package. This property SHOULD be set if and only if a package is a root package.
- `conversion` [1], boolean: Indicates if this is a native package (false) or conversion package (true).
- `dissociated` [0,1], string: If not set, this is an atomic package. If set, indicates this package is a dissociated package of the type identified by the string (see practices below).
- `hasDependency` [0,n], URI: `Dependency` child objects.
- `subPackage` [0,n], URI: Indicates the set of `Package` objects aggregated in the `Package`.
The `displayId` of a `Package` SHOULD be `package`.
To avoid potential confusion between packages with similar names, the identity URI of a package SHOULD be lower-case ASCII, and the path elements of the URI SHOULD be only alphanumeric or dash characters (e.g., `https://igem.org/fluorescent-proteins/package`).
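A tool could check this naming recommendation along the following lines. This is only a sketch: the function name `follows_naming_recommendation` is hypothetical, and it interprets "alphanumeric or dash" as the lower-case pattern `[a-z0-9-]+` applied to each non-empty path element.

```python
import re
from urllib.parse import urlsplit

# Assumed reading of the recommendation: lower-case letters, digits, and dashes.
_PATH_ELEMENT = re.compile(r"[a-z0-9-]+")


def follows_naming_recommendation(identity: str) -> bool:
    """Return True if a package identity URI follows the SHOULD-level naming rules."""
    if not identity.isascii() or identity != identity.lower():
        return False
    path_elements = [part for part in urlsplit(identity).path.split("/") if part]
    return all(_PATH_ELEMENT.fullmatch(part) for part in path_elements)
```

Since the rule is a SHOULD rather than a MUST, a failed check is better reported as a warning than as an error.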
Every `member` of a `Package` MUST have the same value for `hasNamespace` as the package itself.
A package SHOULD have `dissociated` set only if `conversion` is also true.
A package SHOULD only have values for `subPackage` if it is a native package (i.e., `conversion` is false).
For packages with `conversion` equal to true, the `version` property SHOULD match its major version number to the major version of the data source, where applicable. Other elements of the versioning SHOULD be used to indicate versioning of the converted material.
Note that any change to the sequence of a part SHOULD be a major version change, since there is no guarantee that the part will continue to have the same behavior. There are certain circumstances where it would not be, such as correcting a bug in which a sequence was not listed correctly to begin with. To add a sequence that is expected to have better performance, do not change the sequence: add a new part and deprecate the old one.
Every `Package` object pointed to by `subPackage` MUST have a `hasNamespace` that is equal to the `hasNamespace` of the containing `Package` object plus a local sub-path. For example, `https://example.org/MyPackage/` might have `subPackage` values `https://example.org/MyPackage/promoters` and `https://example.org/MyPackage/regulatory/repressors`.
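The namespace constraint on sub-packages amounts to a path-aware prefix test. A minimal sketch follows; the function name is hypothetical, and trailing slashes are ignored when comparing namespaces.

```python
def is_valid_subpackage_namespace(parent_namespace: str, child_namespace: str) -> bool:
    """Check that the child namespace is the parent namespace plus a non-empty sub-path."""
    parent = parent_namespace.rstrip("/")
    child = child_namespace.rstrip("/")
    # Requiring the "/" separator prevents "MyPackageX" from matching "MyPackage".
    return child.startswith(parent + "/")
```

With the example above, both `https://example.org/MyPackage/promoters` and `https://example.org/MyPackage/regulatory/repressors` pass, while a sibling such as `https://example.org/MyPackageX/promoters` does not.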
Given an SBOL document, a native `Package` can be computed by inspecting the collection of `TopLevel` objects in the document. If all `TopLevel` objects have the same `hasNamespace` value, then they form a well-defined `Package` with identity `[namespace]/package` and `member` values equal to the identities of the `TopLevel` objects.
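The computation described above can be sketched without any SBOL library by representing a document as a mapping from `TopLevel` identity to `hasNamespace` value. Both that representation and the name `infer_package` are assumptions made for illustration.

```python
from typing import Dict, Optional


def infer_package(top_levels: Dict[str, str]) -> Optional[dict]:
    """Infer a native package from a document.

    `top_levels` maps each TopLevel identity to its hasNamespace value.
    Returns None when the objects do not all share one namespace.
    """
    namespaces = set(top_levels.values())
    if len(namespaces) != 1:
        return None
    namespace = namespaces.pop().rstrip("/")
    return {"identity": f"{namespace}/package", "members": sorted(top_levels)}
```

A document whose objects span several namespaces yields `None` here, since it does not form a well-defined package under this rule.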
Object dependencies for a `Package` with `includeObjectReferences=true` can be computed by collecting references to `TopLevel` objects that are not contained within the package and resolving those against a catalog of packages.
Other granularities can only be explicitly defined.
Conversion packages cannot be computed and need to be explicitly defined.
A `Package` can be computed by inspecting any collection of SBOL documents, first converting each SBOL document to a `Package`, then aggregating the `Package` objects into another `Package` that contains them all.
For example, a `Package` can be computed from a directory of files.
In the case of building a `Package` from a directory, if all of the document's `subPackage` objects' namespaces share a prefix, then they form a well-defined `Package` with identity `[prefix]/package` and with `conversion=false` and `dissociated` not set. In this case, the `Package` will typically have no `member` values (since a directory contains documents, rather than being a document itself) and its `hasDependency` values will be the union of the `hasDependency` values of its sub-packages.
The `name` and `description` of a package cannot be computed. If they are desired, they need to be explicitly provided.
Root packages additionally need to have their `version` explicitly provided.
Any package catalog SHOULD contain conversion packages for common data sources, such as NCBI GenBank and the iGEM Parts Repository (see also defining dissociated packages below).
Dissociated packages are special cases. In order to successfully use materials from a dissociated package, the following information needs to be known:
- How to convert a UID for the source into its canonical form (e.g., iGEM "E0040" into "BBa_E0040", NCBI "L29345" into "L29345.1").
- How to map a canonical UID into a retrieval URI
- What format retrieved information is expected to be in.
- What namespace to prepend to items at conversion.
The namespace is set by the `Package` object for the dissociated package.
The other aspects are hard-coded based on the string value provided for the `dissociated` property.
At present, the following values are defined:
- `ncbi`: Information stored in NCBI
- `igem`: Information stored in the iGEM parts repository and/or the iGEM SBOL2 conversion in SynBioHub at `https://synbiohub.org/`
- `synbiohub`: SynBioHub instance located at `namespace`

Behavior with respect to any other value of `dissociated` is currently undefined, and thus any other value of `dissociated` SHOULD be treated as an error condition.
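A tool might handle the `dissociated` property roughly as sketched below. The function names are hypothetical. The iGEM canonicalization rule (prepend `BBa_` when it is missing) is an assumption inferred from the "E0040" example. NCBI canonicalization is not sketched, because finding the current version suffix would require a lookup against NCBI.

```python
from typing import Optional

# Values of `dissociated` defined in this proposal.
KNOWN_DISSOCIATED_TYPES = {"ncbi", "igem", "synbiohub"}


def check_dissociated(value: Optional[str]) -> str:
    """Classify a package by its `dissociated` value, rejecting undefined types."""
    if value is None:
        return "atomic"
    if value not in KNOWN_DISSOCIATED_TYPES:
        raise ValueError(f"Undefined dissociated package type: {value!r}")
    return "dissociated"


def canonical_igem_uid(uid: str) -> str:
    """Assumed rule: add the 'BBa_' prefix when it is missing."""
    return uid if uid.startswith("BBa_") else f"BBa_{uid}"
```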
The materials of a native `Package` can be collated into a single SBOL document. This is a build artifact that is useful for distributing materials in order to satisfy dependencies.
This document SHOULD include the `Package` object, all of its sub-`Package` objects recursively, and all of their member objects and children.
Which dependencies are included in the build artifact depends on the planned use of the build artifact:
-
For use in package cataloging and distribution, the build artifact SHOULD NOT include any materials from non-dissociated dependencies (except for dependencies between the sub-packages of the package itself). This is because these packages may be included in multiple dependency paths. Instead, they are assumed to be able to be fetched independently during dependency resolution.
The build artifact SHOULD, however, include all objects included as direct dependencies from dissociated package materials. This is because dissociated packages are too large to cache and SHOULD NOT be in SBOL format, meaning that repeating the fetch and conversion of a dissociated package object is not guaranteed to produce an identical object. Including them in the package guarantees stability of the package materials.
Indirect dissociated package dependencies SHOULD NOT be included, however, because they will be included in the package that has them as a direct dependency.
-
For use in building an "exported" collection of information to be used outside of the package management system (e.g., for preparation of a synthesis order), all dependencies of all types SHOULD be included, including indirect dependencies.
A catalog of packages can be built by collecting together a set of root `Package` objects.
To keep the catalog manageable in size, the catalog SHOULD contain only root `Package` information and SHOULD NOT contain sub-`Package` objects or the actual contents of packages.
A catalog also needs to be able to track multiple versions of the same package.
If the catalog is to be a single document, then this means that the `Package` objects for the different versions must not share the same URI.
To this end, each `Package` in the catalog MUST have its `version` property set. The identity of the `Package` object in the catalog is then rewritten from `[namespace]/[displayId]` to `[namespace]/[version]/[displayId]`, with a `prov:wasDerivedFrom` link connecting it to the original identity.
Every root `Package` in such a catalog MUST NOT have a namespace that is a prefix of any other `Package` in the catalog, except for other versions of itself.
Packages can then be looked up in the package catalog by prefix testing on identity URIs. This will identify the set of available package versions, which can then be used for dependency resolution.
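The identity rewriting and prefix lookup can be sketched as follows. The catalog is modeled as a plain dictionary from root package namespace to its list of versions; this structure and the function names are illustrative assumptions, not a proposed catalog format.

```python
from typing import Dict, List, Optional, Tuple


def catalog_identity(namespace: str, version: str, display_id: str = "package") -> str:
    """Rewrite [namespace]/[displayId] to [namespace]/[version]/[displayId]."""
    return f"{namespace.rstrip('/')}/{version}/{display_id}"


def lookup(catalog: Dict[str, List[str]], uri: str) -> Optional[Tuple[str, List[str]]]:
    """Find the root package whose namespace is a prefix of `uri`.

    Because no root namespace may be a prefix of another, at most one entry matches.
    """
    for namespace, versions in catalog.items():
        if uri.startswith(namespace.rstrip("/") + "/"):
            return namespace, versions
    return None
```

The list of versions returned by `lookup` is the input that dependency resolution would then filter against any `version` requirement.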
Information about where to retrieve the contents of a `Package` can be added via `Attachment` objects.
The `Attachment` object points to a specific URI from which a build artifact can be retrieved, that build artifact containing the `Package` and all of its contents for the identified release.
Each `Package` in the catalog MUST have at least one such `Attachment`.
Native libraries should use `format=http://sbols.org/v3#`.
TODO: this should use EDAM, but EDAM currently has no entry for SBOL3, and this is required for distinguishing between SBOL3 conversions from prior SBOL formats.
For conversion libraries, the library may have a build artifact that has already been converted into SBOL3, in which case it should also use the SBOL3 format.
Artifacts in other formats should use EDAM to indicate the format in order to allow conversion.
When there are multiple attachments in the same format, they represent mirror alternatives for retrieval and thus their contents MUST be identical.
Identical contents can be efficiently tested for by comparing attachment hash values.
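As a sketch of the hash comparison, attachments can be reduced to (format, hash) pairs and grouped by format. The use of SHA-256 and the function names are assumptions for illustration; the proposal does not prescribe a hash algorithm.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def content_hash(data: bytes) -> str:
    """Hex digest of a build artifact (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def inconsistent_formats(attachments: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the formats whose mirror attachments do not all share one hash."""
    hashes_by_format: Dict[str, Set[str]] = defaultdict(set)
    for fmt, digest in attachments:
        hashes_by_format[fmt].add(digest)
    return {fmt for fmt, digests in hashes_by_format.items() if len(digests) > 1}
```

An empty result means every set of same-format mirrors agrees; any format returned violates the identical-contents requirement.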
A package may then be retrieved for use by using the information in its `Attachment` objects in the catalog to download a build artifact containing its contents, which may or may not require conversion after download.
In the case of multiple attachments, which attachments are attempted in which order is left for tools to determine heuristically.
Packages downloaded with the aid of a catalog SHOULD be stored in a shared cache location, such that they do not need to be downloaded again if they are also used in another project. Note also that there is no reason that such a cache cannot include multiple versions of a package, in case different versions are being used by different projects.
Note that the `Attachment` system also helps support migration of package material, as new locations can be added as new `Attachment` objects, and obsolete locations can be deprecated and then eventually have their `Attachment` objects deleted.
TODO: It may also be useful to consider a system of nicknames, such as "ncbi" or "igem:promoters" that can be used as short-form aliases for the full package URLs. This is likely best deferred to a future SEP, however.
A public shared catalog of published packages will be valuable to maintain. The working title for this collection is the "SBOL index of packages", AKA "sip" (a deliberate reference to Python's pip).
Contributors SHOULD NOT submit packages that are arbitrary collections of designs without a clear engineering function. For example, a supplementary information file collecting all the genetic constructs in a scientific publication would not generally make a good `sip` package.
Likewise, forking of packages is discouraged: improvements to designs SHOULD result in submission of a new version of a package rather than a new package.
A public catalog SHOULD be maintained in a version control system such as git or a database or other system providing sufficiently similar capabilities.
If a version control system is used, standard methods such as merge requests can also be used to publish new packages and updates into the catalog.
A local `sip` installation can then either clone the catalog (if in version control) or copy the latest released snapshot in order to begin working with it, and can update the catalog by pulling updates or checking for new releases at the beginning of each `sip` command execution.
A `sip` catalog MAY also be maintained in SynBioHub as a published collection. As published collections in SynBioHub are not supposed to be modified, such a catalog may be downloaded once and assumed to remain current.
A `sip` implementation SHOULD be able to be configured to make use of multiple catalogs, which are accessed in a specified order.
A RECOMMENDED configuration would have the first catalog be local unpublished packages, in order to allow pre-release testing of packages, and the last catalog be the main public `sip` catalog.
Any additional catalogs would come between these in a user-specified order.
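Ordered catalog access can be sketched as a first-match search. Here each catalog is simplified to a dictionary from package identity URI to a retrieval location; the data model, the catalog names, and the function name are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Simplified catalog: package identity URI -> retrieval location.
Catalog = Dict[str, str]


def find_in_catalogs(
    catalogs: List[Tuple[str, Catalog]], package_uri: str
) -> Optional[Tuple[str, str]]:
    """Return (catalog name, location) from the first catalog listing the package."""
    for name, catalog in catalogs:
        if package_uri in catalog:
            return name, catalog[package_uri]
    return None
```

Placing the local unpublished catalog first means that a pre-release copy of a package shadows the published one during testing.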
When building, checking, or using a `Package`, it will often be necessary to check that a dependency can be satisfied and to take action if it cannot be satisfied.
Direct dependencies SHOULD be resolved following one of three methods, depending on the nature of the dependency.
- Any dependency whose `package` value begins with the namespace of the root `Package` for a project SHOULD be resolved locally, by confirming that the required `Package` and/or objects are available at the expected locations within the project.
- Any dependency whose `package` value precisely matches a dissociated package in the catalog SHOULD be resolved by attempting to download and convert material from its source, following the source specification for the dissociated package in the package catalog.
- Any dependency whose `package` value begins with the namespace of an atomic package in the catalog is resolved by locating the package in the catalog, confirming that a copy has been cached locally, and confirming that the required `Package` and/or objects are available from the specified package.
Any direct dependency that does not resolve following one of these methods SHOULD be considered an error condition.
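The choice among the three resolution methods can be sketched as a small classifier. The enum, the function names, and the way catalog contents are passed in are assumptions for illustration. The checks are applied in the order in which the methods are listed above.

```python
from enum import Enum
from typing import Iterable


class Resolution(Enum):
    LOCAL = "local"
    DISSOCIATED = "dissociated"
    ATOMIC = "atomic"


def _within(uri: str, namespace: str) -> bool:
    """Path-aware prefix test: `uri` lies inside `namespace`."""
    return uri.startswith(namespace.rstrip("/") + "/")


def resolution_method(
    package_uri: str,
    project_namespace: str,
    dissociated_packages: Iterable[str],
    atomic_namespaces: Iterable[str],
) -> Resolution:
    """Pick the resolution method for a direct dependency, or raise if none applies."""
    if _within(package_uri, project_namespace):
        return Resolution.LOCAL
    if package_uri in set(dissociated_packages):
        return Resolution.DISSOCIATED
    if any(_within(package_uri, ns) for ns in atomic_namespaces):
        return Resolution.ATOMIC
    raise LookupError(f"Cannot resolve dependency on {package_uri}")
```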
Indirect dependencies MAY be resolved by recursively resolving direct dependencies.
Conflicts between dependencies (direct or indirect) can arise in two ways:
- Incompatible version requirements on a dependency.
- Dissociated package objects with identical identifiers but non-identical property values.
In either case, such a conflict is an error condition and needs to be resolved manually.
In addition to retrieving package materials by means of a package catalog, it is also necessary for the materials to be retrievable directly from their URIs. This maintains compatibility with RDF expectations and allows tools that are not package-aware to use packaged SBOL data.
To this end, every package published in a particular package catalog server should have a namespace that resolves or appropriately redirects to that server. In response to an HTTP request for a URL within the namespace of a package in the catalog, the server MUST handle requests as follows:
- If the URL is the identity of a `TopLevel` object within the package's namespace, the server MUST return an SBOL document that contains the requested object. The server MAY determine the degree of associated materials to include (anywhere up to the entirety of the package).
- If the URL is within the package's namespace, but the requested object does not exist, the server MUST either return an error or return an SBOL document that does not contain the requested object.
- If the URL is the identity of a non-`TopLevel` object within the package's namespace, then the server MAY treat it as either a request for a non-existent object or a request for the `TopLevel` object that contains the identity.
Note that one legitimate implementation of this protocol is to simply return the complete package document for any URI request within the package namespace.
Note also that per W3C recommendation, content negotiation may be used to determine whether the document that is returned should be SBOL (RDF) or browsable HTML, with SBOL being the default for requests that do not include content negotiation.
Referencing the contents of a package by their normal URLs gives a dynamic view of package contents, in which the contents are retrieved from the most recent version of the package.
To retrieve a static snapshot of package contents instead, the namespace can be changed to include the version, as described above.
Retrieving such a URI MUST retrieve materials from the specific version of the package that have likewise been rewritten to change their namespace from `[namespace]` to `[namespace]/[version]`.
In addition, all references to dependencies must be rewritten to URIs with the version of the dependency at the time of publication.
This has the effect of providing a static view of a snapshot of the package and its dependencies.
As with packages, the rewritten version from the static snapshot should also have a `prov:wasDerivedFrom` link to the original URI.
Materials from dissociated packages pose an additional problem, because their contents are cached within the build artifact but have a different namespace.
For use in a static snapshot, these materials thus MUST have their URIs rewritten into the package namespace, again with a `prov:wasDerivedFrom` link to the original URI.
In this case, the namespace should be changed from `[namespace]` to `[package-versioned-namespace]/.build/[namespace minus protocol]` (e.g., a dissociated import `https://synbiohub.org/public/igem/pSB1C3` in package `https://example.com/MyPackage` would have its identity changed to `https://example.com/MyPackage/.build/synbiohub.org/public/igem/pSB1C3/`).
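The rewriting of dissociated imports can be sketched by dropping the protocol from the original URI and placing the rest under the package's `.build` path. The function name is hypothetical, and this sketch does not append a trailing slash to the result.

```python
from urllib.parse import urlsplit


def rewrite_dissociated_identity(package_namespace: str, original_uri: str) -> str:
    """Move a dissociated import under [package namespace]/.build/[URI minus protocol]."""
    parts = urlsplit(original_uri)
    return f"{package_namespace.rstrip('/')}/.build/{parts.netloc}{parts.path}"
```

A tool applying this rewrite would also need to record the original URI through a `prov:wasDerivedFrom` link, which is outside the scope of the sketch.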
A package catalog is set up at `https://example.com/`, and one of the packages published there is `https://example.com/FusionProteins/package`, with namespace `https://example.com/FusionProteins/`.
A request for `https://example.com/FusionProteins/gfp-rfp-fusion` retrieves an SBOL document containing the most recent version of a `Component` object with that identity (and, incidentally, the rest of the package as well).
This `Component` has `SubComponent` objects that reference GFP and RFP from another package, which it depends on.
A request for one of these, `https://example.com/FluorescentProteins/GFP`, retrieves an SBOL document containing the most recent version of that `Component` along with the rest of the contents of `https://example.com/FluorescentProteins/package`.
To retrieve the `Component` from a static view of version 1.3 of the fusion proteins package, the request would instead be for `https://example.com/FusionProteins/1.3/gfp-rfp-fusion`.
Let us assume that at the time when version 1.3 was published the fluorescent proteins package was on version 2.0. In this case, in the static document retrieved, `https://example.com/FusionProteins/1.3/gfp-rfp-fusion` will have the `SubComponent` link for GFP rewritten to `https://example.com/FluorescentProteins/2.0/GFP`.
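The URI rewriting in this example can be reproduced with a few lines of Python. The helper name `to_static_uri` is hypothetical, and the namespace and version of each package are assumed to be known from the catalog.

```python
def to_static_uri(uri: str, namespace: str, version: str) -> str:
    """Rewrite a dynamic URI in `namespace` to the static form [namespace]/[version]/..."""
    base = namespace.rstrip("/")
    if not uri.startswith(base + "/"):
        raise ValueError(f"{uri} is not within namespace {namespace}")
    local_part = uri[len(base) + 1:]
    return f"{base}/{version}/{local_part}"
```

Applying it to the fusion protein with version `1.3`, and to its GFP reference with version `2.0`, gives the two static URIs shown in the example.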
When package materials are stored in a hierarchical directory structure, it is useful to distinguish user-authored and tool-created content. User-authored content SHOULD be primary in the structure, while all tool-created content is potentially subject to deletion and reconstruction. It is thus RECOMMENDED that tool-created content be stored in a hidden subdirectory of the directory for which the content has been created.
Tool-created content falls into two categories, generated content and build artifacts, which SHOULD be stored separately from one another.
Generated content, while tool-created, consists of novel SBOL objects not stored elsewhere, and thus SHOULD be maintained persistently and managed with version control. To support human examination of version differences, generated content SHOULD be stored in sorted N-triples format.
When built with respect to a specific directory, generated content SHOULD be stored in a hidden subdirectory named `.sip`.
Each `Package` and its associated file-derived sub-`Package` objects (but not sub-directory derived sub-packages) SHOULD be stored in sorted N-triples format in a file named `.sip/package.nt`.
Any user-defined material for the package (e.g., name, version) SHOULD be provided in a file named `package.EXT` in the package directory, where `EXT` is any valid SBOL extension, and that contains precisely one `TopLevel` object: the user-defined `Package` object. If this file is present, then the `Package` object in the `.sip/package.nt` file should be generated by starting with the user-defined `Package` and adding `subPackage` and `hasDependency` properties based on the contents of the directory.
Imports from a dissociated package X SHOULD be stored in sorted N-triples format in a file named `.sip/X.nt`.
Genetic design documents stored in a format other than SBOL3 (e.g., Excel, SBOL2) SHOULD have their derivative SBOL3 documents stored in the `.sip` package as well. For example, if a package is defined in an Excel file named `my_package.xlsx`, an Excel-to-SBOL export of `my_package.nt` would be stored in the `.sip` package.
The `.sip` directory MUST contain nothing besides the `package.nt`, dissociated package files, and converted packages.
The directory MAY, of course, omit these files before they have been built.
The contents of each dissociated package file SHOULD contain precisely the set of dependencies indicated for that package in `package.nt`.
Since build artifacts are assemblages of copied TopLevel
objects, they are redundant for version control. They thus SHOULD NOT be stored in the version control of the project from which they were generated, as the redundant storage creates opportunities for forking between copies while adding no new information.
Build artifacts are, however, appropriate objects to use for purposes of distribution. Because build artifacts are not intended for version-control and inspection, they MAY use any serialization format.
When built with respect to a specific directory, a build artifact SHOULD be stored in a hidden subdirectory named `.build`.
For a package named `X`, the name of a build artifact for package distribution (i.e., without atomic package dependencies) SHOULD be `X-distribution-package.[EXTENSION]`, where `[EXTENSION]` is an appropriate extension for its format.
For a package named `X`, the name of a build artifact for stand-alone distribution (i.e., with all dependencies included in the artifact) SHOULD be `X-standalone-package.[EXTENSION]`, where `[EXTENSION]` is an appropriate extension for its format.
The `.build` directory MAY be used to store other transient artifacts, such as intermediate outputs used in production of a package or cached files that are not yet converted to their final format.
Consider a package named ecoli-circuits with the following files stored in one root directory and two subdirectories:
[root]/package.ttl (user-defined content for the package)
[root]/composites.nt
[root]/regulatory/constitutive.xlsx (Excel-to-SBOL sheet including iGEM material)
[root]/regulatory/repressors.nt (includes iGEM and NCBI material)
[root]/regulatory/inducers.ttl
[root]/actuators/chromoproteins.nt (includes NCBI material)
[root]/actuators/fluorescence.nt
Generated content files:
[root]/.sip/package.nt
[root]/regulatory/.sip/package.nt
[root]/regulatory/.sip/constitutive.nt
[root]/regulatory/.sip/igem.nt
[root]/regulatory/.sip/ncbi.nt
[root]/actuators/.sip/package.nt
[root]/actuators/.sip/ncbi.nt
Package build for distribution:
[root]/.build/ecoli-circuits-distribution-package.ttl
[root]/.build/ecoli-circuits-standalone-package.ttl
The .ttl extension indicates a file is in the Turtle (Terse RDF Triple Language) format.
The .nt extension indicates a file is in the RDF N-Triples format.
The time required to build physical instantiations of genetic designs poses a challenge for the versioning and release of packages. On the one hand, one would generally want to check for and correct "cannot be built" bugs before a package is released. On the other hand, with current methods, building designs is costly and often takes weeks or months. At the alpha or beta pre-release stages of package development, build issues are to be expected and need not block a release.
For production releases of a package, however, SemVer "release candidate" versioning SHOULD be used for any package that includes novel materials intended to be built as part of a release process. Specifically, a "release candidate" pre-release tag SHOULD be used to indicate the point where an intended release is sent into production.
If all new designs added in a release candidate are able to be built, then the release candidate SHOULD have its status upgraded from release candidate (pre-release) to release (e.g., from 1.0-rc1 to 1.0).
If some designs in a release candidate are unable to be built, however, then either the failing materials can be dropped from the package (leaving only known-good materials suitable for release) or the designs can be adjusted to produce a new release candidate.
Note that these policies apply only to designs in a package, not to its dependencies. As a matter of practice, however, a production release generally SHOULD NOT have any dependency on any pre-release packages.
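A minimal sketch of these policies follows. It assumes version strings in the style used in this document (e.g., 1.0-rc1), where anything after a hyphen is a pre-release tag. All function names and the dependency names in the usage example are hypothetical.

```python
def is_prerelease(version: str) -> bool:
    """A version with a hyphenated tag (e.g., 1.0-rc1) is treated as a pre-release."""
    return "-" in version


def promote(version: str) -> str:
    """Upgrade a release candidate to a release by dropping its rc tag."""
    core, _, tag = version.partition("-")
    if not tag.startswith("rc"):
        raise ValueError(f"{version!r} is not a release candidate")
    return core


def prerelease_dependencies(dependencies: dict[str, str]) -> list[str]:
    """Name the dependencies that a production release should not rely on."""
    return sorted(name for name, version in dependencies.items() if is_prerelease(version))


print(promote("1.0-rc1"))  # 1.0
print(prerelease_dependencies({"igem": "2.3", "ncbi": "0.9-beta"}))  # ['ncbi']
```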
Consider an ecoli-circuits package at version 1.0, to which a number of new designs have been added with the intention of producing a 1.1 release.
When design work is complete, including a check that the designs are expected to be synthesizable, the package is tagged as pre-release version 1.1-rc1 and the new designs are sent for synthesis.
Several weeks later, most of the designs have been synthesized successfully, but a few have failed.
The failing designs are updated to try to resolve their synthesis issues. The package is then tagged as pre-release version 1.1-rc2 and the updated designs are sent for synthesis.
This time all of the designs succeed, and the package version is updated to a 1.1 release without any other modification of its contents.
Alternatively, if some designs had still failed, the package maintainers might have chosen instead to drop them from the release.
In this case, the failing designs would be removed from the package, leaving only designs that had been successfully built.
The reduced package is still a backward-compatible enhancement over the 1.0 release, however, so it can be safely designated as the 1.1 release.
Note also that it is entirely possible that a package may have release candidates for multiple versions in progress at the same time.
For example, a version 1.2-rc1 release candidate might be created and the new designs not present in 1.1-rc2 might be ordered for synthesis even while 1.1-rc2 is still being synthesized.
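The precedence among the versions in this example can be illustrated with a simple sort key. This is a sketch that assumes only the version style used here (a dotted numeric core with an optional rcN tag), not the full SemVer grammar. The function name is hypothetical.

```python
import re


def version_key(version: str):
    """Sort key placing release candidates before the release they lead to."""
    core, _, tag = version.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    if not tag:
        return (numbers, 1, 0)  # a release sorts after its own candidates
    match = re.fullmatch(r"rc(\d+)", tag)
    if match is None:
        raise ValueError(f"unsupported pre-release tag in {version!r}")
    return (numbers, 0, int(match.group(1)))


versions = ["1.1", "1.2-rc1", "1.0", "1.1-rc2", "1.1-rc1"]
print(sorted(versions, key=version_key))
# ['1.0', '1.1-rc1', '1.1-rc2', '1.1', '1.2-rc1']
```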
Examples and use cases are embedded in the specification material given above.
This SEP does not modify any existing definitions, and is thus backward compatible with SBOL 3.0.1.
On the matter of versioning, an alternative to retagging as a release candidate would be to skip directly to a release version when an order is made (e.g., 1.0 rather than 1.0-rc1).
Problems with an order would then be corrected by a new release (e.g., upgrading to 1.0.1 for adjusted sequences).
An important problem with this approach, however, is that if a sequence turns out not to be synthesizable and is dropped from the package, then this is not a backward-compatible change and would require a major version number change (e.g., upgrading from 1.0 to 2.0).
Marking as a release candidate, however, retains the pre-release "don't count on the changes to be stable yet" status until construction has been completed and the new sequences can be used with confidence.
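The compatibility argument can be made concrete with a small sketch that compares the designs in the last true release against those in a proposed release. It assumes designs are tracked by identifier only, and the function name and identifiers are hypothetical.

```python
def required_bump(released: set[str], proposed: set[str]) -> str:
    """Classify the SemVer change needed to go from one release to the next."""
    if released - proposed:
        return "major"  # a released design was removed: not backward compatible
    if proposed - released:
        return "minor"  # designs were only added
    return "patch"      # same set of designs, e.g., adjusted sequences


# If 1.0 had been released before synthesis, dropping a failed design is breaking:
print(required_bump({"design_a", "design_b"}, {"design_a"}))  # major
# With release candidates, the baseline is still the previous true release:
print(required_bump({"design_a"}, {"design_a", "design_c"}))  # minor
```

Because a release candidate is never itself the baseline, designs dropped between candidates do not count as removals.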
The actual process of physically building a package can be thought of as similar to compiling a binary and running tests.
In the future, it would be good to use Implementation objects and/or build artifacts of some sort to track which parts have been synthesized at least once.
That could also support features such as badges reading "53% synthesized".
It is not currently clear whether this information should be stored in the package or elsewhere, and decisions on how to run this process are deferred to future work.
None at present
To the extent possible under law,
SBOL developers
has waived all copyright and related or neighboring rights to
SEP 054.
This work is published from:
United States.