Nigel's TTML and SMIL walkthrough

In this page I'll attempt to walk through some key questions about TTML time and how the answers are defined in TTML2 and SMIL 3, in order to create some logical flow that can be reviewed/critiqued/fixed etc. as needed. I may not be right first time! In fact, since I'm attempting to make it as simple as I can, I'm almost certain to be wrong...

TTML constraints on SMIL

TTML syntactically constrains the permitted options to generate a semantic subset of SMIL, which in that sense makes it simpler to understand some concepts:

TTML does not permit repeating elements except on animations and sets - repeatDur and repeatCount are not permitted on the content elements defined in timeContainer.
TTML does not permit min and max to be specified.
There is in general (except when ttp:markerMode="discontinuous") enough information in a TTML document to resolve the simple duration, since external events do not trigger duration ends. Do we ever want to allow references to xlinked media end events?

In SMIL there are two kinds of duration, the simple duration and active duration, both of which are actually the periods between their begin and their end. Paraphrasing dangerously, the simple duration is the duration as defined by the begin and dur attributes on the element and the active duration is the duration that the element is active for, defined by the resolved begin and end times, taking into account the ancestor's active duration.

This means that the SMIL concepts of simple durations and active durations, while still being useful, are somewhat less differentiated in TTML. For example, in SMIL a single element with one simple duration can be repeated many times and have multiple active durations. In TTML an element cannot be repeated so this does not apply. However the active end time of an element can be cut off by the simple duration of its parent (or other ancestor), and conversely an unresolved simple duration of an element can be resolved by waiting for all of its children to end.

How does SMIL talk about document begin and end times and "root temporal extent" etc?

SMIL defines:

document begin:

The start of the interval in which the document is presented is referred to as the document begin.

document end

The end of the interval in which the document is presented is referred to as the document end.

document duration

The difference between the end and the begin is referred to as the document duration.

These concepts have no clear equivalent in TTML - one view is that this is what Root Temporal Extent encapsulates; another is that Root Temporal Extent defines the coordinate space for time expressions, i.e. the zero time coordinate relative to which all times are defined. This part of SMIL uses the expression "parent simple duration" which may be coincident with this coordinate space, i.e. the beginning of the parent simple duration is equal to time zero on the TTML timeline. (Note this may be a bit different for clock times vs media times - to be checked)

[GA] Start Comments

It is useful to consider the text under Converting wall-clock values in SMIL 3 §5.4.5:

When the document begins, the current wall-clock time is noted and saved as twallclock-begin. To convert a wall-clock value twc to an element active simple time ts, first convert twc to a document global time tra (i.e. an element active time for the root time container):

tra = twc - twallclock-begin

This may yield a negative time if the wallclock value is a time before the document began. Nevertheless, this is a legal value.

The time tra is then converted normally to element active time or element local time as needed.

This text, especially the text about negative time, makes it clear that the document begin time is always zero (0), and that this zero coordinate is the origin of the document global time[line].
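
As a rough illustration of the conversion quoted above, here is a minimal Python sketch. The function name and the example dates are made up for illustration; only the subtraction tra = twc - twallclock-begin comes from SMIL.

```python
from datetime import datetime, timezone


def wallclock_to_document_time(t_wc: datetime, t_wallclock_begin: datetime) -> float:
    """Return tra = twc - twallclock-begin, in seconds.

    The result is negative when t_wc is earlier than the document begin,
    which SMIL explicitly allows.
    """
    return (t_wc - t_wallclock_begin).total_seconds()


# Hypothetical wall-clock time noted when the document began.
doc_begin = datetime(2016, 8, 15, 15, 0, 0, tzinfo=timezone.utc)

before = datetime(2016, 8, 15, 14, 59, 30, tzinfo=timezone.utc)
after = datetime(2016, 8, 15, 15, 0, 10, tzinfo=timezone.utc)

print(wallclock_to_document_time(before, doc_begin))  # -30.0
print(wallclock_to_document_time(after, doc_begin))   # 10.0
```

The wall-clock value equal to the noted begin maps to 0, which is the sense in which the document begin sits at the origin of the document time line.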

[NM] comment on GA's comment begin

I'm struggling to follow this argument - the text is saying that the offset between some coordinate at the beginning of the document and the wallclock time is negative, but it doesn't define "document begin time" - document begin is defined above and relates to the start of presentation. In your terminology below I take that to be the "presentation artifact begin" not the origin of the time coordinates in the document, i.e. 0.

[GA] more comments

I believe that presentation in this context is not intended to be restricted to the time when artifacts actually appear; but, rather, the time period when artifacts may appear. So, for example, body in SMIL3 may have begin="10h". Now this 10h has to be resolved with respect to a parent time container, but the only candidate for that is the document (global) time(line). Further, it might have been begin="0" or just not specified, in which case begin would default to 0. And this 0 is in what time line? The document timeline. And the document timeline doesn't coincide with the document body's timeline: they represent two distinct timelines (syncbases).

You seem to be thinking that document begin corresponds to body begin, which would be the first opportunity for a presentation artifact to appear; however, I am saying that document begin is the zero point of the document time line, which is the implicit time container for body. In other words, the root element operates like an implicit par time container with at most one timed body child.

[NM] comment on GA's more comments

Actually I didn't suppose that document begin corresponds to body begin (or perhaps more precisely region begin); conversely I also did not consider that the tt element itself is a time container. Also I agree that the origin coordinate or the unstated begin is 0 - that's clear also in my text on how to calculate begin and end times below, which so far appears to be uncontentious, thankfully. "Document syncbase" is not a defined term - synthetic document syncbase is, though it does not clarify what the document is: it is new to me in this discussion that in this case "document" corresponds to the tt element. It is possibly a moot point though whether the syncbase for the top level element that defines times is an external context provided one or one defined by the root element, but still relative to a further syncbase provided externally.

If we want to make the root element an implicit 'par' time container it might help to state it - I think it may help, or it may just move the problem upstream.

The other point here is that the document processing context does not only provide a syncbase that relates coordinate 0 to some real time (that may be in the past), and an effective playRate, but also in general provides a time coordinate at which presentation should commence - this may be an implicit 'now' or an explicit point, e.g. on a media timeline. It can by definition be mapped into the document timeline.

Correct me if I'm wrong but I think the only place where origin vs content begin (perhaps aka document begin) really makes a practical difference is in the calculation of dropped frames, where it's vital to know which frame coordinates are dropped and which are not. Are there any other places where it makes a difference?

[NM further comments 2016-08-15] I've just added a note to #76: unfortunately there are two other distinct concepts that we need to consider, which are 1. the playback entry time may not be zero, and may be defined arbitrarily by the processing context and 2. there is at least one use case for deferring resolution of the document begin time until processing begins. [NM end of further comments]

[NM] end of comment on GA's more comments

I would also note the following text under SMIL3 §5.5.1:

A typical example for "presenting a document" is displaying it on a screen. Possible definitions for the document begin are that the document begins when the complete document has been received by a client over a network, or that the document begins when certain document parts have been received. A typical example of the document end is when the associated application exits or switches context to another document.

The first sentence doesn't mean that something must appear on the screen at the beginning of the presentation, only that the presentation (whether anything is rendered or not) has started and is ongoing; in other words, its clock (timeline) is running (active). Also, it is clear that here, document begin and end are referring to points in an outer (external) timeline, i.e., the document processing context's timeline.

It is instructive to review the relevant changes between TTML1 1st LCWD and TTML1 2nd LCWD.

The former (LCWD1) had:

If begin and (or) end attributes are specified on the tt element, then they specify the beginning and (or) ending points of a time interval for a document instance in relationship with some external application or presentation context. The temporal begin and end points determined by an external application or presentation are referred to subsequently as the external time interval.

Note:

For example, if a document instance represents a timed text media object to be presented in synchronization with a video media object, then the begin and end attributes may be used by an author to express the time into the external video media object's time line at which the timed text is intended to become active.

If the dur attribute is specified on the tt element, then it specifies the temporal duration of the active time interval for a document instance.

While the latter (LCWD2) was rewritten to:

The temporal beginning and ending of a document instance represented by a tt element is defined in relationship with some external application or presentation context. The temporal interval defined by these points is referred to subsequently as the external time interval.

A document instance has an implicit duration that is equal to the implicit duration of the body element of the document, if present, or zero, if not present.

Note also the following excerpts from the description of changes between these two last call working drafts in TTML1 Change Summary:

Remove begin, dur, and end attributes and descriptions thereof from <tt/> and <layout/>;

Change description of implied begin/end on <tt/> to refer to external context temporal interval;

Change description of implied begin/end on <body/> to refer to external time interval;

These excerpts from earlier states of TTML1 give us a better context to consider the questions raised recently. Reviewing this earlier text and the current text, it is my contention that:

let TL be a timeline (time coordinate space), then ORIGIN(TL(*)) is ZERO with respect to its own TL, but may be any real value with respect to its immediate containing (parent) TL; [NM: +1]
there is a document external timeline TL(EXT), defined by the document processing context; [NM: +1]
there is a document internal timeline TL(INT); [NM: +1]
TL(INT) is contained in TL(EXT) on the half-open interval [ORIGIN(TL(EXT)) + BEGIN(DOC), ORIGIN(TL(EXT)) + END(DOC)), which can be simplified to [BEGIN(DOC), END(DOC)), where {BEGIN,END}(DOC) are expressed in TL(EXT); [NM: there are two variables missing here - the document processing context also provides an implicit or explicit begin coordinate and end coordinate. This would be the equivalent of 'play document X from real time 15:00 beginning at document time 00:11:00 ending at document time 00:22:00' rather than what we have here so far, which is 'play document X from real time 15:00, implying 15:00 is document time 0'. ]
{BEGIN,END}(DOC) are effectively determined by the document processing context, a process which may take into account the implicit duration of the document (see more below), or may make use of document external parameters to determine the actual active duration of the document; [NM: +1 - this has added in the extra variable of a truncated end time that I mentioned above]
the tt element's implicit timeline is coincident with TL(INT), i.e., TL(TT) := TL(INT); [NM: okay, we can assert that]
the body element's timeline TL(BODY) is contained in TL(INT) on the half-open interval [ORIGIN(TL(INT)) + BEGIN(BODY), ORIGIN(TL(INT)) + END(BODY)), which can be simplified to [BEGIN(BODY), END(BODY)), where {BEGIN,END}(BODY) are expressed in TL(INT); [NM: +1, assuming this is before any processing context has intervened with a 'seek' point]
the phrase root temporal extent is synonymous with TL(TT) (the timeline of the root tt element) which is synonymous with TL(INT), in other words, it is exactly as presently defined in the current TTML2 glossary:

The temporal extent (interval) defined by the temporal beginning and ending of a document instance in relationship with some external application or presentation context.

[NM: Not "exactly" as defined - yes, it means the temporal beginning and ending of a document instance relative to the presentation context's begin time, but it omits any 'seek point' into the document that the presentation context defines, and any truncation of the end time applied by the presentation context. I'm happy to define the root temporal extent as only what can be computed directly from the document, but we should add some words to clarify that.]

in SMIL terminology, TL(INT) is identical to global time or document time, and root temporal extent is the interval between document begin and document end, i.e., is the same as document duration.

Now, given the above, it is clear that TL(TT) (:=TL(INT)) is a superset of TL(BODY), which is a superset of the interval between the appearance of the first presentation artifact and the last presentation artifact, what I refer to below as the {artifact} segment. On the other hand TL(TT) is what I refer to below as the set {pre-artifact, artifact, post-artifact}.
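
The containment relationships listed above can be sketched as a chain of timelines, each with its own zero placed somewhere on its parent. This is only an illustration: the class and field names are hypothetical, the numbers are invented, and it assumes a play rate of 1 and ignores any seek point or end truncation applied by the processing context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeline:
    name: str
    # Position of this timeline's zero on its parent timeline, in seconds.
    begin_in_parent: float = 0.0
    parent: Optional["Timeline"] = None

    def to_external(self, t: float) -> float:
        """Map a local time t onto the outermost enclosing timeline."""
        timeline = self
        while timeline.parent is not None:
            t += timeline.begin_in_parent
            timeline = timeline.parent
        return t


tl_ext = Timeline("TL(EXT)")
# BEGIN(DOC) and BEGIN(BODY) below are hypothetical values.
tl_int = Timeline("TL(INT)", begin_in_parent=900.0, parent=tl_ext)
tl_body = Timeline("TL(BODY)", begin_in_parent=10.0, parent=tl_int)

print(tl_int.to_external(0.0))   # 900.0
print(tl_body.to_external(0.0))  # 910.0
print(tl_body.to_external(5.0))  # 915.0
```

Each origin is zero in its own timeline and only acquires a non-zero value when expressed in a containing timeline.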

In reviewing the current TTML2 text, the main problem language appears under §8.1.1:

The root temporal extent, i.e., the time interval over which a document instance is active, has an implicit duration that is equal to the implicit duration of the body element of the document, if the body element is present, or zero, if the body element is absent.

Here, the problem (as I perceive it) is that the way this is written may be read by some to mean that the root temporal extent, i.e., the document duration, is equal to its implicit duration as defined here (i.e., as the implicit duration of a body element). However, this reading is overly restrictive. The document duration is an active duration, and an implicit duration is merely one parameter in the determination of an active duration (see SMIL timing semantics). In fact, an active duration may be shorter or longer than its implicit duration. The purpose of an implicit duration is to serve as a minimum interval in the absence of other information. Since we are talking about TL(TT) (:=TL(INT)) here, there are other important parameters to consider, namely document begin and document end, which are effectively determined externally to the document by the document processing context.

[NM: Agreed - the use of the word "extent" is suggestive of a size, i.e. in temporal terms a duration. Also, you could argue that the text is incorrect in another way: an absent body logically has no end attribute, and an omitted end attribute implies "undefined", so I would interpret the root temporal extent of a document without a body as being from the origin onwards with no defined end point. This is much more useful for defining an unlimited period during which there is no presentation, which is a practice used in for example EBU Tech3381, which deliberately uses an empty document to signal no content for the duration of an ISOBMFF sample. Further, the text does not take into account the reparenting of body as a child of region in the ISD generation process - arguably that could be refactored to be a problem with the ISD generation process!]

[GA] end

[NM] end of comment

Now, we should also consider SMIL3 §5.5.1:

The host language designer must define the document begin.

The host language designer must define the document end.

I take the first of these statements to mean that the host language must define where the document begins with respect to some external time line, and not, e.g., that the host language may redefine document begin to mean some non-zero value (with respect to the document time line). The same logic applies to document end.

We also have in SMIL3 §5.7.1:

Global time is defined relative to the common reference for all elements, the document root. This is sometimes also referred to as document time.

So we know the following hold (in SMIL3 terminology):

document time and global time lines are synonymous (in SMIL3 terminology);
the origin of the document time line is zero;
the origin of the document time line is referred to as document begin;
the document begin is the point where presentation begins (though this does not mean that a presentation artifact appears at that time);
the document end is the point where presentation ends (though this does not mean that a presentation artifact appears immediately prior to that time);

Using these statements, we can divide the document (aka global) timeline into five segments (intervals):

(-∞,db): db: document begin, db = 0
[db,pab): db ≤ pab, pab: presentation artifact begin
[pab,pae): db ≤ pab ≤ pae, pae: presentation artifact end
[pae,de): pae ≤ de, de: document end
[de,+∞)

We may refer to these five segments (intervals), respectively, using the following labels:

pre-document
pre-artifact
artifact
post-artifact
post-document

Now, using these labels, document active interval (duration) is clearly the concatenation of the 2nd through 4th segments, namely {pre-artifact, artifact, post-artifact}, while presentation artifact active interval (duration) is only the 3rd of these segments {artifact}.
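
A minimal sketch of the five-segment division, assuming db = 0 and the half-open intervals exactly as listed above. The function name and the example boundary values are hypothetical.

```python
import math


def classify(t: float, pab: float, pae: float, de: float, db: float = 0.0) -> str:
    """Return the label of the segment of the document timeline containing t."""
    if not (db <= pab <= pae <= de):
        raise ValueError("expected db <= pab <= pae <= de")
    if t < db:
        return "pre-document"
    if t < pab:
        return "pre-artifact"
    if t < pae:
        return "artifact"
    if t < de:
        return "post-artifact"
    return "post-document"


# Hypothetical boundaries: first artifact at 5s, last artifact ends at 20s,
# document ends at 30s.
for t in (-1.0, 0.0, 5.0, 20.0, 30.0):
    print(t, classify(t, pab=5.0, pae=20.0, de=30.0))
```

Passing `math.inf` as `de` models a document end that is not yet resolved, in which case the post-document segment is never reached.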

Given the above, I propose we affirm that the term root temporal extent means document active interval (duration), i.e., {pre-artifact, artifact, post-artifact}, and further, that, in the document time line, this interval has zero as its maximum lower bound in all cases.

We may also wish to define additional normative terminology, such as:

external timeline - a time space defined by the processing environment
external (timeline) [related] media [object] begin - begin time of media on external timeline
external (timeline) document begin - begin time of document on external timeline

We may also wish to consider using temporal coordinate space or time coordinate space instead of timeline.

Given the above, we may assume that, by default, external timeline media begin and external timeline document begin are (or should be) coincident. However, we may further generalize, and remove this assumption by making use of the proposed @ttp:mediaOffset (media offset), which can take any real value (negative, zero, or positive) on the external timeline as a parameter that allows expressing non-coincidence of these begin points, i.e., external document begin = external media begin + media offset (or - media offset depending upon how the latter is defined). Of course, we can only do this if we assume a normative play rate on the external time line, say 1, which means that if it isn't 1, then the processing context would have to perform appropriate scaling on this offset.

[GA] End Comments

Which elements have timing?

Any element that is in the set of timeContainers (region, body, div, p, span and br) can have any of the begin, end and dur timing attributes.

Those are the timed elements.

[GA] This isn't quite correct. Any element that allows @begin "has timing". In particular, an element is not required to be a time container to "have timing". For example, in TTML1, the set element has timing, but is not a time container. In a wider sense, every element, even those that do not allow @begin, has timing to the extent that timing is defined by context.

[NM] I agree, Glenn, I haven't considered the set and animate elements yet. I've also assumed that elements that are not permitted to have timing attributes are co-timed with their parent elements. I'm not sure if that is explicitly stated anywhere.

When we have a continuous time coordinate system (`ttp:markerMode="continuous"`)

SMIL3 has this to say, informatively: evaluation of begin and end time lists. TTML should simplify this given that some options like multiple begin or end times, and repeating elements are not possible.

There is a semantic distinction between time expressions that are offset times and those that are wallclock times, which in TTML is dependent on the ttp:timeBase. In "media" time base, all times are offset times relative to 0, so the beginning of the presentation is by default at media time 0. All time expressions do resolve to a value that is a non-negative real number of seconds. However the interpretation of a time expression in a SMIL context may be as an offset time or a wallclock time. In "clock" time base, the outermost specified times are wallclock times. If ancestors without a specified time exist, they by default begin at '0' which is by definition earlier than or equal to any wallclock time, so this presents no conflict.

How is the begin time of a timed element calculated?

In TTML terms:

The time of an element is relative to its parent timeContainer, if one exists. There are two kinds of timeContainer, a par and a seq.

Children of a par:

begin by default at '0' (in the parent time container's local timeline)
and all times are relative to (i.e. added to) the begin of the parent element.
Multiple children of a par can be active simultaneously;
each child's begin time is relative to the parent's begin.

Children of a seq:

begin by default at the end of the previous (temporally preceding sibling) element.
Any begin attribute acts as an offset relative to the calculated end of the previous element.
The first child of a seq uses the parent element's begin as the "syncbase", i.e. its begin time is relative to its parent.
Only one child of a seq is permitted to be active at any time.
Children of seq elements must have a non-negative offset time expression for the begin, and cannot be event based markers.
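
The par and seq rules above can be sketched as follows. This is a simplification under stated assumptions: every child is given a definite dur so that the end of the previous seq sibling is known, and end attributes and truncation by ancestors are ignored. The function name and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# (begin offset in seconds, or None when no begin attribute is present; dur in seconds)
Child = Tuple[Optional[float], float]


def resolve_children(time_container: str, parent_begin: float,
                     children: List[Child]) -> List[Tuple[float, float]]:
    """Return the (begin, end) of each child on the parent's own syncbase timeline."""
    if time_container not in ("par", "seq"):
        raise ValueError("time_container must be 'par' or 'seq'")
    resolved = []
    syncbase = parent_begin  # par: always the parent begin; seq: moves forward
    for offset, dur in children:
        offset = 0.0 if offset is None else offset  # default begin is 0
        if offset < 0:
            raise ValueError("negative begin offsets are not permitted")
        begin = syncbase + offset
        end = begin + dur
        resolved.append((begin, end))
        if time_container == "seq":
            syncbase = end  # the next sibling is relative to this child's end
    return resolved


children = [(3.0, 2.0), (10.0, 5.0)]
print(resolve_children("par", 100.0, children))  # [(103.0, 105.0), (110.0, 115.0)]
print(resolve_children("seq", 100.0, children))  # [(103.0, 105.0), (115.0, 120.0)]
```

With the same children, the only difference between the two containers is the syncbase used for every child after the first.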

The default begin for the document is not explicitly stated in TTML. However only two possibilities exist, and they both effectively resolve to the same thing:

ttp:timeBase="clock" (regardless of ttp:clockMode). The outermost specified times are wallclock times. If ancestors without a specified time exist, they by default begin at '0' which is by definition earlier than or equal to any wallclock time.
ttp:timeBase="media|smpte" (noting that we are assuming here that ttp:markerMode="continuous"). The outermost specified times are relative to the media time; media that has no smpte time indicators begins at time 0; media with smpte time indicators is implicitly positioned on the smpte timeline (i.e. there's a coordinate space that the media and the TTML both relate to). So a TTML body element that begins at "10:00:00:00" begins 10 hours in; the related media may also begin 10 hours in: in that case they are synchronised. Likewise in media time, there may be other related media that are playing on the same implied media timeline; this in turn may be provided by some wrapper format such as ISOBMFF. It is important that any such wrapper defines whether it is creating an implied timeContainer with an offset time on the media timeline or whether the implied parent is simply the single media timeline. If the former, TTML document instance times should be relative to the wrapper 'sample' (in ISOBMFF terminology); if the latter, they should be relative to the media timeline, sometimes known as the 'track'.

Note that the resolution of a TTML document into ISDs places region above body in the hierarchy, so the outermost element on which it is possible to place a begin attribute is region: it is possible for multiple regions to define content that is displayed simultaneously. Therefore implicitly the unstated parent time container must have something like par timeContainer semantics.

Time containers that only contain other time containers may begin before any content begins. For example:

<div begin="100s" ...>
   <p begin="3s" ...>Some content</p>
   <p begin="10s" ...>Some other content</p>
</div>

In this fragment, the div begins at 100s (relative to its parent's begin time) but no content is displayed until 3s after that when "Some content" is displayed.

To answer the question about how this example is resolved:

<body dur="6s">
<div end="10s">
...
</div>
</body>

In this case the body begins at the default begin, 0, and lasts 6s. The div also begins at the default begin, 0, and has a simple duration of 10s but an active duration (italicised SMIL terms) of 6s because it is cut off by the body's dur attribute. There is nothing in TTML or SMIL that suggests that the begin of the div is somehow set back 10s relative to the end of the body; undefined begin times are resolved as their default value. Explicit negative begin times are syntactically prohibited in TTML.

Note that in this specific example there's no difference in behaviour between par and seq because we can only see one child element of the body.

How is the end time of an element calculated?

These things affect the end time of an element:

whether the parent element is a par or a seq time container
the begin of the parent element
the resolved begin time
the end time (may be implicit or explicit)
the end time of the parent (may be implicit or explicit)
the duration of the element (may be implicit or explicit)
the dur of the parent (may be implicit or explicit)

The implicit duration of a par with endsync="last" (which TTML specifies) ends at the last active end of the child elements.

seq containers may define an end with reference to an event marker or be indefinite (achieved in TTML by not specifying an end attribute). The implicit duration of a seq ends with the active end of the last child of the seq. If any child of a seq has an indefinite active duration, the implicit duration of the seq is also indefinite.

This means that in TTML both permitted kinds of time container have the same end calculation semantics, helpfully, with the subtle exception that explicit end times of children of seq containers are relative to the end of the previous sibling (or the begin of the parent for the first one), whereas explicit end times of children of par containers are relative to the begin of the parent.
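
A minimal sketch of the implicit end rule for the two container kinds, given the already resolved active ends of the children. Indefinite is modelled as infinity, and the function name is hypothetical. The text above does not cover containers with no children, so the sketch rejects that case rather than guess.

```python
import math
from typing import List

INDEFINITE = math.inf


def implicit_end(time_container: str, child_ends: List[float]) -> float:
    """Return the time at which the container's implicit duration ends."""
    if not child_ends:
        raise ValueError("childless containers are not covered by this sketch")
    if time_container == "par":
        # endsync="last": the last active end among the children.
        return max(child_ends)
    if time_container == "seq":
        # Any indefinite child makes the whole seq indefinite.
        if any(end == INDEFINITE for end in child_ends):
            return INDEFINITE
        # Otherwise the active end of the last child.
        return child_ends[-1]
    raise ValueError("time_container must be 'par' or 'seq'")


print(implicit_end("par", [115.0, 112.0]))       # 115.0
print(implicit_end("seq", [105.0, 120.0]))       # 120.0
print(implicit_end("seq", [INDEFINITE, 120.0]))  # inf
```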

The dur attribute specifies the simple duration; the end attribute controls the active duration. In practice in TTML this means that whichever combination of 'begin+dur' and 'end' is earlier defines the end time. However a simple duration of a parent will constrain the active duration of its children, so for example:

<body dur="10s">
   <div end="15s">
...

results in the active duration of the div being cut off at 10s by the simple duration of the body.

An absent dur attribute implies 'indefinite', so the simple duration is unconstrained. Similarly an absent end attribute implies no end to the active duration. It is therefore possible to create an element that is displayed 'forever'.

In TTML neither lists of times nor negative offsets nor complex event markers are permitted values for end attributes.

Given a resolved begin time for an element and a parent end that is 'undefined':

if the dur is specified but the end is not, the end is resolved begin + dur, regardless of any of the child elements.
if the end is specified but the dur is not, the end is end relative to the same syncbase as the begin (see above), regardless of any of the child elements.
if the end and the dur are specified the end is the earlier of resolved begin + dur and end relative to the same syncbase as the begin, regardless of the child elements.
if neither the end nor the dur are specified then the end is 'whenever all the children have ended'.

If the parent end is defined and is earlier than the end time resolved as above, then the resolved end is the parent end. If the parent end is defined and is later than or equal to the end time resolved as above, then the resolved end is as above.

Examples:

<div begin="100s" end="120s">
   <p begin="10s" end="15s">Some content</p>
</div>
...

Here, the p begins at 110s and ends at 115s. The div begins at 100s and ends at 120s.

<div begin="100s" end="120s">
   <p begin="10s" end="25s">Some content</p>
</div>
...

Here, the p begins at 110s and would end at 125s but is cut off by the div, so ends at 120s. The div begins at 100s and ends at 120s.

<div begin="100s">
   <p begin="10s" end="15s">Some content</p>
</div>
...

Here, the p begins at 110s and ends at 115s. The div begins at 100s and has an indefinite end - its active duration ends at 115s, which is when its children have all ended.

<div begin="100s" end="120s" dur="30s">
   <p begin="10s" end="25s">Some content</p>
</div>
...

Here, the p begins at 110s and would end at 125s but is cut off by the parent div's active duration at 120s. The div begins at 100s and ends at 120s. The div's simple duration of 30s is overridden by the earlier end implied by its end attribute.
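
The end-time rules and the four examples above can be checked with a small recursive sketch. It assumes par semantics throughout, and it assumes that a leaf element with neither dur nor end is indefinite until cut off by an ancestor. The class, field and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    begin: float = 0.0            # offset from the parent's begin
    end: Optional[float] = None   # offset from the parent's begin
    dur: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def resolve(node: Node, syncbase: float = 0.0,
            parent_end: float = math.inf) -> List[Tuple[str, float, float]]:
    """Return (name, begin, end) rows for node and all of its descendants."""
    begin = syncbase + node.begin
    explicit = []
    if node.dur is not None:
        explicit.append(begin + node.dur)
    if node.end is not None:
        explicit.append(syncbase + node.end)  # same syncbase as the begin
    # The earlier of begin+dur and end, cut off by the parent end if that is earlier.
    limit = min(explicit + [parent_end])
    rows, child_ends = [], []
    for child in node.children:
        sub = resolve(child, begin, limit)
        rows.extend(sub)
        child_ends.append(sub[0][2])
    if explicit:
        end = limit
    elif child_ends:
        end = min(max(child_ends), parent_end)  # when all the children have ended
    else:
        end = parent_end  # assumption: a leaf with no dur or end is indefinite
    return [(node.name, begin, end)] + rows


example = Node("div", begin=100, end=120, dur=30,
               children=[Node("p", begin=10, end=25)])
print(resolve(example))  # [('div', 100.0, 120.0), ('p', 110.0, 120.0)]
```

Swapping in the attribute values of the other three examples gives the begin and end times stated in the text.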

How do media marker based timings work when `ttp:markerMode="discontinuous"`?

Media markers are a bit like events; in other words they don't refer to an elapsed time since some syncbase, but to a media marker being observed. If there is no source of media markers against which to compare, it is effectively impossible to present the document.

To see how this works we have to look at the element lifecycle. An element starts up as follows:

An element life cycle begins with the beginning of the simple duration for the element's parent time container.

This means that no element can begin until its parent has begun. This also means that the set of events that an implementation needs to watch for is restricted to the begin markers of all the children of the currently active elements.

However, what happens if an element has no begin attribute? In this case the default timing depends on the container:

Children of a par container begin when the par container begins.
Children of a seq container begin when the previous child ends. The first child of a seq container begins when its parent begins. Problem: the default begin value for a child of a seq is '0' and media-marker-value begin values are not permitted. Therefore in TTML we should make clear that markerMode="discontinuous" and timeContainer="seq" is not a permitted combination. There's an Editorial Note in timeContainer for this already.

Therefore, in a recursive top down algorithm:

each element begins at its begin marker value if present, or at the beginning of the parent element if not (with the top level element beginning at the beginning of the presentation).
each element that has begun ends at its end marker value if present.
if an element ends while any of its children are active then those children also end.
an element with no end marker value ends when all its children have ended.
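
The four steps above can be sketched as a small state machine driven by observed markers. This is an illustration only: it assumes par containers throughout, treats marker values as opaque strings compared for equality, and leaves a childless element with no end marker active until an ancestor ends. All names and marker values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str
    begin: Optional[str] = None  # begin marker value, if any
    end: Optional[str] = None    # end marker value, if any
    children: List["Element"] = field(default_factory=list)
    state: str = "waiting"       # waiting -> active -> ended


def start(el: Element, marker: Optional[str]) -> None:
    """Begin el if it has no begin marker or its begin marker was just observed."""
    if el.state == "waiting" and (el.begin is None or el.begin == marker):
        el.state = "active"
        for child in el.children:
            start(child, marker)


def stop(el: Element) -> None:
    """End el together with all of its descendants."""
    el.state = "ended"
    for child in el.children:
        stop(child)


def observe(el: Element, marker: str) -> None:
    """Apply one observed marker, top down; only active elements are descended into."""
    if el.state == "waiting":
        start(el, marker)
    elif el.state == "active":
        if el.end == marker:
            stop(el)
            return
        for child in el.children:
            observe(child, marker)
        if el.end is None and el.children and all(c.state == "ended" for c in el.children):
            el.state = "ended"


def active(el: Element) -> List[str]:
    """List the names of the currently active elements in document order."""
    names = [el.name] if el.state == "active" else []
    for child in el.children:
        names += active(child)
    return names


def run(markers: List[str]) -> List[List[str]]:
    """Build a small example tree and return the active set after each marker."""
    p1 = Element("p1", begin="10:00:05:00", end="10:00:08:00")
    p2 = Element("p2", begin="10:00:20:00")
    div = Element("div", begin="10:00:00:00", end="10:00:30:00", children=[p1, p2])
    body = Element("body", children=[div])
    start(body, None)  # beginning of the presentation
    snapshots = []
    for marker in markers:
        observe(body, marker)
        snapshots.append(active(body))
    return snapshots


for snapshot in run(["10:00:05:00", "10:00:00:00", "10:00:05:00",
                     "10:00:08:00", "10:00:30:00"]):
    print(snapshot)
```

In this run the first "10:00:05:00" marker is ignored because the div has not yet begun, which is the restriction on watched events described earlier; p2 never becomes active because the div ends before its begin marker is observed.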

In TTML repeating elements are not permitted; nor are multiple markers in a list. dur attributes are explicitly prohibited when ttp:markerMode="discontinuous" and ttp:timeBase="smpte".
