% -----------------------------------------------------------------------------
\section{Conventions}
% -----------------------------------------------------------------------------
This section provides some preliminary information to aid in the understanding of the specification.
The SBOL data model is specified using Unified Modeling Language (UML) 2.0 diagrams \href{http://www.omg.org/spec/UML/2.0/}{(OMG 2005)}. This section reviews terminology conventions, the basics of UML diagrams, and our naming conventions.
\subsection{Terminology Conventions}
This document indicates requirement levels using the controlled vocabulary specified in \href{https://tools.ietf.org/html/rfc2119}{IETF RFC 2119}.
In particular, the key words ``MUST'', ``MUST NOT'', ``REQUIRED'', ``SHALL'', ``SHALL NOT'', ``SHOULD'', ``SHOULD NOT'', ``RECOMMENDED'', ``MAY'', and ``OPTIONAL'' in this document are to be interpreted as described in RFC 2119.
\begin{itemize}
\item The words ``MUST'', ``REQUIRED'', or ``SHALL'' mean that the item is an absolute requirement.
\item The phrases ``MUST NOT'' or ``SHALL NOT'' mean that the item is an absolute prohibition.
\item The word ``SHOULD'' or the adjective ``RECOMMENDED'' mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.
\item The phrases ``SHOULD NOT'' or ``NOT RECOMMENDED'' mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.
\item The word ``MAY'' or the adjective ``OPTIONAL'' mean that an item is truly optional.
\end{itemize}
\subsection{UML Diagram Conventions}
\label{sec:umldiagrams}
The types of biological design data modeled by SBOL are commonly referred to as {\em classes}, especially when discussing the details of software implementation. Each SBOL class can be instantiated by many SBOL objects. These objects MAY contain data that differ in content, but they MUST agree on the type and form of their data as dictated by their common class. Classes are represented in UML diagrams as rectangles labeled at the top with class names (see Figure~\ref{fig:uml_sampler} for examples).
\begin{figure}[ht]
\begin{center}
\includegraphics[width=\textwidth]{uml/uml_sampler.pdf}
\caption{Examples of UML diagram conventions used in this document}
\label{fig:uml_sampler}
\end{center}
\end{figure}
Classes can be connected to other classes by association properties, which are represented in UML diagrams as arrows. These arrows are labeled with data cardinalities in order to indicate how many values a given association property can possess (see below). The remaining (non-association) properties of a class are listed below its name. Each of the latter properties is labeled with its data type and cardinality.
In the case of an association property, the class from which the arrow originates is the owner of the association property. A diamond at the origin of the arrow indicates the type of association.
Open-faced diamonds indicate shared aggregation, also known as a reference, in which the owner of the association property exists independently of its value.
By contrast, filled diamonds indicate composite aggregation, also known as a part-whole relationship, in which the value of the association property MUST NOT exist independently of its owner.
In addition, in the SBOL data model, it is REQUIRED that the value of each composite aggregation property is a unique SBOL object (that is, not the value for more than one such property).
Note that in all cases, composite aggregation is used in such a way that there SHOULD NOT be duplication of such objects.
Such objects are also commonly referred to as ``child'' objects, and their owning objects as ``parent'' objects.
All SBOL properties are labeled with one of several restrictions on data cardinality. These are defined, per RDF, as:
\begin{itemize}
\item $1$ - EXACTLY ONE: the property is REQUIRED, and there MUST be exactly one value for this property.
\item $0 \ldots 1$ - ZERO OR ONE: the property is optional, such that there MAY be a single value for this property, or it MAY be absent.
\item $0 \ldots *$ - ZERO OR MORE: the property is unbounded, such that there MAY be any number of values for this property, including none.
\item $1 \ldots *$ - ONE OR MORE: the property is REQUIRED, such that there MAY be any number of values for this property, as long as there is at least one.
\end{itemize}
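As a minimal illustration of the cardinality restrictions above, the following Python sketch checks whether a collection of property values fits a given cardinality label. The names \texttt{CARDINALITY\_BOUNDS} and \texttt{satisfies\_cardinality}, and the ASCII labels such as \texttt{0..1}, are assumptions made for this example and are not defined by SBOL.

```python
# Cardinality labels mapped to (minimum, maximum) bounds.
# None stands for "unbounded". The ASCII labels are an assumption of this sketch.
CARDINALITY_BOUNDS = {
    "1": (1, 1),        # EXACTLY ONE
    "0..1": (0, 1),     # ZERO OR ONE
    "0..*": (0, None),  # ZERO OR MORE
    "1..*": (1, None),  # ONE OR MORE
}


def satisfies_cardinality(values, label):
    """Return True if the number of values fits the given cardinality label."""
    minimum, maximum = CARDINALITY_BOUNDS[label]
    count = len(values)
    if count < minimum:
        return False
    return maximum is None or count <= maximum
```

For instance, an empty list of values would pass the check for \texttt{0..1} and \texttt{0..*} but fail it for \texttt{1} and \texttt{1..*}.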
Finally, classes can inherit the properties of other classes. Inheritance relationships are represented in UML diagrams as open-faced, triangular arrows that point from the inheriting class to the inherited class. Some classes in the SBOL data model cannot be instantiated as objects and exist only to group common properties for inheritance.
These classes are known as abstract classes and are noted as such in their descriptions.
\subsection{Naming and Typographic Conventions}
\label{sec:nameconventions}
SBOL classes are named using upper ``camel case,'' meaning that each word is capitalized and all words are run together without spaces, e.g., \sbol{Identified}, \sbol{SequenceFeature}.
Properties, on the other hand, are named using lower camel case, meaning that they begin lowercase (e.g., \sbolmult{role:C}{role}) but if they consist of multiple words, all words after the first begin with an uppercase letter (e.g., \sbol{roleIntegration}).
SBOL properties are always given singular names irrespective of their cardinality, e.g., \sbolmult{role:C}{role} is used rather than \sbolmult{role:C}{roles} even though a component can have multiple roles.
This is because each relation can potentially stand on its own, irrespective of the existence of others in the set.
Two conventions are used for property names: {\tt name} and {\tt hasName}.
When a property shares its name with the class of the object it points to, it uses the {\tt hasName} convention (e.g., the \sbol{Component} class uses \sbol{hasFeature} to point to a \sbol{Feature} object).
When a property's name differs from the class of the object it points to, it uses the {\tt name} convention instead (e.g., the \sbol{Constraint} class uses \sbol{subject} to point to a \sbol{Feature} object).
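The naming conventions above can be approximated with a few simple checks. The following Python sketch is only a heuristic: the patterns accept any run of letters and digits and look only at the first character, and all function names are hypothetical helpers introduced for this example.

```python
import re

# Heuristic patterns: letters and digits only, distinguished by the first character.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")


def is_class_name(name):
    """True if name looks like upper camel case, as used for SBOL classes."""
    return UPPER_CAMEL.fullmatch(name) is not None


def is_property_name(name):
    """True if name looks like lower camel case, as used for SBOL properties."""
    return LOWER_CAMEL.fullmatch(name) is not None


def uses_has_name(property_name, class_name):
    """True if the property follows the hasName convention for the given class."""
    return property_name == "has" + class_name
```

Under these assumptions, \texttt{uses\_has\_name("hasFeature", "Feature")} holds, whereas \texttt{uses\_has\_name("subject", "Feature")} does not, matching the two examples given above.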
% -----------------------------------------------------------------------------
\section{Identifiers and Types}
% -----------------------------------------------------------------------------
\subsection{Internationalized Resource Identifiers}
\label{sec:IRIstructure}
\label{sec:identity}
As SBOL is built upon the Resource Description Framework (RDF), all class instances are identified by an Internationalized Resource Identifier (IRI), such as a URL or UUID.
In the SBOL data model, the value of an association property MUST be an \sbol{IRI} or set of \sbol{IRI}s that refer to SBOL objects belonging to the class at the tip of the arrow. Every \sbol{Identified} object's IRI MUST be globally unique among all other \sbol{Identified} object IRIs. It is also highly RECOMMENDED that the \sbol{IRI} structure follow the best practices for compliant \sbol{IRI}s specified in Section~\ref{sec:compliant}.
Whenever a \sbol{TopLevel} object's IRI is a URL (e.g., one following the conventions of HTTP(S), rather than a UUID), its structure MUST comply with the following rules:
\begin{itemize}
\item A \sbol{TopLevel} URL MUST use the following pattern:
\texttt{[namespace]/[local]/[displayId]}, where \texttt{namespace} and \sbol{displayId} are required fragments, and the \texttt{local} fragment is an optional relative path.
For example, a \sbol{Component} might have the URL~\path{https://synbiohub.org/public/igem/BBa_J23070}, where \texttt{namespace} is \path{https://synbiohub.org}, \texttt{local} is \path{public/igem}, and \texttt{displayId} is \path{BBa_J23070}.
\item A \sbol{TopLevel} object's URL MUST NOT be used as a prefix in the URL of any other \sbol{TopLevel} object.
For example, the \path{BBa_J23070_seq} \sbol{Sequence} object cannot have a URL of \path{https://synbiohub.org/public/igem/BBa_J23070/BBa_J23070_seq}, since the \path{https://synbiohub.org/public/igem/BBa_J23070} prefix is already used as a URL for the \path{BBa_J23070} \sbol{Component} object.
\item The URL of any child or nested object MUST use the following pattern: \texttt{[parent]/[displayId]}, where \texttt{parent} is the URL of its parent object.
Multiple layers of child objects are allowed using the same\\ \texttt{[parent]/[displayId]} pattern recursively.
For example, a \sbol{SequenceFeature} object owned by the \path{BBa_J23070} \sbol{Component} and having a \sbol{displayId} of \texttt{SequenceFeature1} will have a URL of \path{https://synbiohub.org/public/igem/BBa_J23070/SequenceFeature1}.
Similarly, if the \texttt{SequenceFeature1} object has a \sbol{Location} child object with a \sbol{displayId} of \texttt{Location1}, then that object will have the URL \path{https://synbiohub.org/public/igem/BBa_J23070/SequenceFeature1/Location1}.
\end{itemize}
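The URL rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the prefix rule is interpreted here as a path-segment prefix (one URL followed by \texttt{/}), which is an assumption drawn from the example above.

```python
def top_level_url(namespace, display_id, local=""):
    """Build [namespace]/[local]/[displayId]; the local path is optional."""
    parts = [namespace.rstrip("/")]
    if local:
        parts.append(local.strip("/"))
    parts.append(display_id)
    return "/".join(parts)


def child_url(parent, display_id):
    """Build [parent]/[displayId]; apply repeatedly for nested child objects."""
    return parent.rstrip("/") + "/" + display_id


def prefix_conflicts(top_level_urls):
    """Return pairs (a, b) where TopLevel URL a is a path prefix of TopLevel URL b."""
    urls = sorted(set(top_level_urls))
    return [(a, b) for a in urls for b in urls
            if a != b and b.startswith(a + "/")]


# Reconstructing the example URLs used in the rules above.
component = top_level_url("https://synbiohub.org", "BBa_J23070", "public/igem")
feature = child_url(component, "SequenceFeature1")
location = child_url(feature, "Location1")
```

With these definitions, a hypothetical \sbol{Sequence} URL formed by appending \texttt{/BBa\_J23070\_seq} to the \sbol{Component} URL would be reported by \texttt{prefix\_conflicts}.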
\subsection{SBOL URLs}
\label{sec:sbolURLs}
The SBOL namespace, which is \url{http://sbols.org/v3\#}, is used to indicate which entities and properties in an SBOL document are defined by SBOL.
For example, the URL of the type \sbol{Component} is \url{http://sbols.org/v3\#Component}.
This convention is assumed throughout the specification.
The SBOL namespace MUST NOT be used for any entities or properties not defined in this specification.
Other namespaces are also used by SBOL, however.
Where possible, we have re-used predicates from widely used terminologies (such as Dublin Core~\cite{dcmi2012}) to expose as much of the data as practical to standard RDF tooling.
Similarly, existing biological ontologies are used where applicable for specifying types, roles, etc.
Likewise, Section~\ref{sec:complementaryStandards} details complementary standards that are RECOMMENDED for use in combination with SBOL.
\subsection{Primitive Data Types}
\label{sec:datatypes}
\label{sec:String}
\label{sec:Integer}
\label{sec:Long}
\label{sec:Double}
\label{sec:Boolean}
\label{sec:IRI}
\label{sec:URL}
\label{sec:literal}
When SBOL uses simple ``primitive'' data types such as \sbol{String}s or \sbol{Integer}s, these are defined as the following specific formal types:
\begin{itemize}
\item \sbol{String}: \href{http://www.w3.org/2001/XMLSchema}{http://www.w3.org/2001/XMLSchema\#string}\\
{\em Example: ``LacI coding sequence''}
\item \sbol{Integer}: \href{http://www.w3.org/2001/XMLSchema}{http://www.w3.org/2001/XMLSchema\#integer}\\
{\em Example: 3}
\item \sbol{Long}: \href{http://www.w3.org/2001/XMLSchema}{http://www.w3.org/2001/XMLSchema\#long}\\
{\em Example: 9223372036854775806}
\item \sbol{Double}: \href{http://www.w3.org/2001/XMLSchema}{http://www.w3.org/2001/XMLSchema\#double}\\
{\em Example: 3.14159}
\item \sbol{Boolean}: \href{http://www.w3.org/2001/XMLSchema}{http://www.w3.org/2001/XMLSchema\#boolean}\\
{\em Example: \external{true}}
\end{itemize}
The term \sbol{literal} is used to denote an object that can be any of the five types listed above.
In addition to the simple types listed above, SBOL also uses objects of type \emph{Internationalized Resource Identifier} (\sbol{IRI}).
It is important to realize that in RDF, an \sbol{IRI} might or might not be a resolvable URL (web address).
An \sbol{IRI} is always a globally unique identifier within a structured namespace. In some cases, that name is also a reference to (or within) a document, and in some cases that document can also be retrieved (e.g., using a web browser).
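To make the mapping from the primitive types listed above to XML Schema datatypes concrete, the following Python sketch records the five datatype IRIs and guesses a type name for a native Python value. The helper \texttt{sbol\_primitive\_for} is a hypothetical name, and its choices are an assumption of this example. In particular, it never returns \texttt{Long}, because Python does not distinguish integer widths.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# SBOL primitive type name -> XSD datatype IRI, following the list above.
SBOL_PRIMITIVES = {
    "String": XSD + "string",
    "Integer": XSD + "integer",
    "Long": XSD + "long",
    "Double": XSD + "double",
    "Boolean": XSD + "boolean",
}


def sbol_primitive_for(value):
    """Guess the SBOL primitive type name for a Python value (illustrative only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"no SBOL primitive type for {type(value).__name__}")
```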
\subsection{SBOL Types}
\label{sec:sbolTypes}
All SBOL objects are given the most specific \external{rdfType} in the SBOL 3 namespace (\uri{http://sbols.org/v3\#}) that defines the type of the object.
Likewise, properties in the SBOL 3 namespace should only be used by objects with an SBOL 3 \external{rdfType}.
SBOL does not use multiple inheritance: all SBOL classes are disjoint except with respect to their abstract parent classes.
Accordingly, an object MUST NOT be given two \external{rdfType} properties referring to disjoint classes in the SBOL 3 namespace.
An object MAY have redundant \external{rdfType} properties for its parent types, but this is NOT RECOMMENDED.
For example, an object cannot have both the \external{rdfType} of \sbol{Collection} and \sbol{Component}.
A \sbol{Component} could also have an \external{rdfType} for \sbol{TopLevel} and \sbol{Identified}, but this is discouraged.
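The typing rules above can be illustrated with a small Python sketch that classifies a set of \external{rdfType} values. The \texttt{PARENT} table is a hypothetical excerpt of the class hierarchy assumed for this example (including the placement of \sbol{Collection} under \sbol{TopLevel}), and the function names are not part of SBOL.

```python
# Hypothetical excerpt of the class hierarchy: child class -> parent class.
PARENT = {
    "Component": "TopLevel",
    "Collection": "TopLevel",
    "TopLevel": "Identified",
    "Identified": None,
}


def ancestors(cls):
    """Return the set of all parent classes of cls (excluding cls itself)."""
    result = set()
    parent = PARENT[cls]
    while parent is not None:
        result.add(parent)
        parent = PARENT[parent]
    return result


def check_rdf_types(types):
    """Classify a set of SBOL type names as 'ok', 'redundant', or 'invalid'."""
    types = set(types)
    # The types are compatible only if they lie on a single inheritance chain,
    # i.e. some type in the set has every other type among its ancestors.
    for cls in types:
        if types - {cls} <= ancestors(cls):
            return "ok" if len(types) == 1 else "redundant"
    return "invalid"
```

Under these assumptions, the combination of \sbol{Collection} and \sbol{Component} is classified as invalid, while \sbol{Component} together with \sbol{TopLevel} and \sbol{Identified} is classified as redundant.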
\subsection{Object Closure and Document Composition}
In RDF, there is no requirement that all of the information about an object be stored in one location.
Instead, there is an ``open world'' assumption that additional triples describing the object may be acquired at any time.
Documents are allowed to be fragmented and composed in an arbitrary manner, down to their underlying atomic triples, with no consideration for object structure.
This limits the ability to reason about properties of objects and validate the correctness of a model.
For example, it would not be possible to validate that an \sbol{Identified} object has no more than one value for its \sbol{displayId} property, because it would not be possible to determine whether some other document somewhere in the world holds a second value for the property.
SBOL addresses this by adding an object closure assumption that allows stronger reasoning about individual objects and their children.
For any given SBOL document, if the document contains an \external{rdfType} statement regarding an \sbol{Identified} object $X$, then it is assumed that the document also contains all other property statements about object $X$ as well.
This enables strong validation rules, since any statement of the form ``$X$ {\it predicate} $Y$'' that is not present can be assumed to be false.
For example, if a document has one value for an object's \sbol{displayId}, then it is valid to conclude that there are no other \sbol{displayId} values, and thus its ``zero or one'' cardinality requirement is satisfied.
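A minimal Python sketch of this kind of validation is given below, assuming that a document is represented as a list of (subject, predicate, object) tuples and that \sbol{displayId} is identified by its IRI in the SBOL namespace. The function name and the tuple representation are assumptions of this example. Only subjects with an \external{rdfType} statement in the document are checked, since the closure assumption applies only to them.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DISPLAY_ID = "http://sbols.org/v3#displayId"


def display_id_violations(triples):
    """Return subjects typed in this document that have more than one displayId."""
    # Under object closure, a subject with an rdfType statement in the document
    # is assumed to have all of its property statements in the same document.
    typed = {s for s, p, o in triples if p == RDF_TYPE}
    values = {}
    for s, p, o in triples:
        if p == DISPLAY_ID and s in typed:
            values.setdefault(s, set()).add(o)
    return sorted(s for s, found in values.items() if len(found) > 1)
```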
We further assume that any document containing an object also contains all of its child objects.
In other words, the fundamental unit of SBOL documents is the \sbol{TopLevel} object, and any document containing a \sbol{TopLevel} also contains the complete set of information necessary to describe that \sbol{TopLevel}---but not necessarily any other \sbol{TopLevel} objects that it refers to.
For example, a document containing a \sbol{Component} describing a plasmid is guaranteed to contain every \sbol{Feature} of the plasmid as well as every \sbol{Interaction} and \sbol{Constraint} that relates those features, but the document might not contain the \sbol{Sequence} for the plasmid or the definitions for the \sbol{Component} objects linked from its \sbol{SubComponent} parts.
An SBOL document thus cleaves naturally along the boundaries of \sbol{TopLevel} objects, implying the following set of rules of fragmentation and composition of documents:
\begin{itemize}
\item Any subset of \sbol{TopLevel} objects in a valid SBOL document is also a valid SBOL document.
\item Any disjoint set of \sbol{TopLevel} objects from different SBOL documents MAY be composed to form a new SBOL document. The result is not guaranteed to be valid, however, since the composition may expose problems due to the relationships between \sbol{TopLevel} objects from different documents.
\item If two \sbol{TopLevel} objects in different SBOL documents have the same identity and both they and their child objects contain equivalent sets of property assertions, then they MAY be treated as identical and freely merged.
\item If two \sbol{TopLevel} objects in different SBOL documents have the same identity but different property values, then they MUST be considered different (possibly conflicting) versions, and any merger MUST be managed through some version control process.
\end{itemize}
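The composition rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed algorithm: each document is modeled as a dictionary from a \sbol{TopLevel} identity to a frozen set of property assertions that already includes those of its child objects, and \texttt{merge\_documents} is a hypothetical name.

```python
def merge_documents(doc_a, doc_b):
    """Merge two documents given as dicts: TopLevel IRI -> frozenset of assertions.

    Returns the merged document and the sorted list of conflicting identities.
    """
    merged = dict(doc_a)
    conflicts = []
    for iri, assertions in doc_b.items():
        if iri not in merged:
            merged[iri] = assertions   # disjoint TopLevel: compose freely
        elif merged[iri] != assertions:
            conflicts.append(iri)      # same identity, different content
        # Same identity and equivalent assertions: already present, nothing to do.
    return merged, sorted(conflicts)
```

In this sketch a conflicting \sbol{TopLevel} keeps the version from the first document and is reported in the returned list, leaving the actual resolution to whatever version control process is in use. The merged result is not checked for validity.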