Add a recommendation to including a TZ offset in time units. #584

Open
ChrisBarker-NOAA opened this issue Jan 8, 2025 · 24 comments · May be fixed by #586
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@ChrisBarker-NOAA
Contributor

ChrisBarker-NOAA commented Jan 8, 2025

Title

Add a recommendation to including a TZ offset in time units

Moderator

TBA (if needed)

Moderator Status Review [last updated: YYYY-MM-DD]

TBA

Requirement Summary

The current doc, for time coordinates, says:

"""
The reference datetime string (appearing after the identifier since) is required. It may include date alone, or date and time, or date, time and time zone offset. Its format is y-m-d [H:M:S [Z]], where [...] indicates an optional element,
"""

So the timezone offset (Z) is optional.

and:

"""
The default time zone offset is zero.
"""

So leaving out the offset means that the timestamp is the prime meridian time.

It's far too late to change that default, but it is unfortunate that there is no way to express "naive" datetime, or "localtime", etc.

Granted, it's a bad practice to do so -- presumably data providers should know what offset their data are in, and let us know. But it's also, I think, bad practice to simply leave it off.

Note that according to Wikipedia, ISO 8601 specifies that:

"If no UTC relation information is given with a time representation, the time is assumed to be in local time"

Rather than UTC or meridian zero, or ...

So it's not unrealistic to think someone out there might expect that to be the case for CF. (or frankly, they simply aren't thinking about it).

Technical Proposal Summary

Proposal 1:

I propose that we add language to the effect of:

"""
The default time zone offset is zero, but it is recommended that an offset always be provided -- "+0" can be specified for reference datetime strings at the prime meridian.
"""

Proposal 2:

ISO 8601 allows "Z" to be used to mean 0 offset. In fact, according to Wikipedia, it recommends it.

But it seems CF does not currently allow the "Z".

But I'm pretty sure I've seen otherwise conforming files use the "Z", and all the software I've tested accepts it.

So I think we should allow it in CF.

Conformance question:

not specific to this proposal, but is there a standard for flagging non-recommended practices in conformance checkers?

I thought there was a "conformance document" somewhere, but I can't find it -- am I imagining things? -- found it: https://github.com/cf-convention/Conformance/blob/master/conformance.adoc
Never mind.

Benefits

Hopefully, future datasets will be a tiny bit less ambiguous.

Status Quo

Status Quo is that a number of datasets in the wild don't have a TZ offset explicitly -- not a killer, but hopefully there might be fewer in the future if this is added.

Associated pull request

I'll do a PR, if folks think this is a good idea.

@ChrisBarker-NOAA ChrisBarker-NOAA added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Jan 8, 2025
@JonathanGregory
Contributor

Dear Chris

These are interesting points. Thanks.

Proposal 1

If you have a gridded field (which is the kind of data CF was invented for and is very commonly used for) it doesn't really make sense to use a non-zero timezone offset anyway, because the same time-coordinate applies everywhere in the spatial field, often the whole world. I don't think a recommendation to include a timezone offset is useful in that case.

Is this a concern more for data in discrete sampling geometries, such as station timeseries? If so, perhaps we could make the recommendation for that kind of data only?

Proposal 2

CF uses UDUNITS for its units syntax, as you know, and I find that UDUNITS supports Z to mean zero timezone offset. In fact, UDUNITS is more accepting than I wrote when drafting the new text in Sect 4.4.1. It allows you to specify the timezone offset or Z even when you omit the time. We could change the text to say

Its format is y-m-d [H:M:S] [Z], where [...] indicates an optional element

as well as allowing Z to be Z, if people think it's useful. You can put seconds since 1972-1-1Z, for instance.

Cheers

Jonathan

@ChrisBarker-NOAA
Contributor Author

If you have a gridded field (which is the kind of data CF was invented for and is very commonly used for) it doesn't really make sense to use a non-zero timezone offset anyway, because the same time-coordinate applies everywhere in the spatial field, often the whole world. I don't think a recommendation to include a timezone offset is useful in that case.

well, there are smaller scale region models that are all in one timezone, and, as you point out, point data is sometimes stored in some version of local time.

but whether you want to include a non-zero offset or not -- it's still better for us to be consistent, and having "0" or "Z" there is better than leaving it blank and hoping for the best.

In any case, there's never a reason not to do it, which is why I think we should suggest that it always be done.

as well as allowing Z to be Z, if people think it's useful. You can put seconds since 1972-1-1Z, for instance.

Let's do that -- I'll draft a PR soon, unless someone beats me to it :-)

@taylor13

Suggesting, I suppose, is o.k., but I'm pretty sure nearly everyone that's time-stamping 0Z will simply omit it. I know I would.

@JonathanGregory
Contributor

I don't think it's worthwhile recommending it for gridded fields. Do you know examples of gridded fields (from models or obs), contained all within one timezone, for which there is a risk, or there are actual cases, of the data-writer intending local time but forgetting to record the timezone offset?

I do think it's more likely to be forgotten for discrete sampling geometries containing observed data, but I don't know how likely, because I lack experience of these data. Do you work with such data? Does anyone else have experience of whether people forget to record timezones - Luke @lhmarsden, for instance?

Suggesting is harmless, but recommending has the cost of possibly annoying and useless warnings being produced by the CF checker.

@ChrisBarker-NOAA
Contributor Author

I don't think it's worthwhile recommending it for gridded fields

well, I'm not so sure I agree, but the way CF is written, a time coordinate is a time coordinate, so we can only make one recommendation.

Does anyone else have experience of whether people forget to record timezones

I certainly have -- though I can't say for sure whether the files in question claimed to be CF compliant.

But anyway, there's nothing we can do about that now -- CF has stated (forever?) that no offset provided means time at prime meridian (what most people call UTC or GMT, but I know why we're not using those terms).

All I'm suggesting is that it's better to put:

2025-01-10T12:12:30Z

than

2025-01-10T12:12:30

Because then there is no chance whatsoever for a misunderstanding.

explicit is better than implicit, and all that.
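
To make the difference concrete, here is a minimal Python sketch (assuming Python 3.7 or later; `as_cf_instant` is a hypothetical helper name, not part of any CF or UDUNITS library). The string with an explicit offset parses to a timezone-aware value on its own, while the string without one parses to a naive value, and the reader has to know the CF default to interpret it:

```python
from datetime import datetime, timezone

def as_cf_instant(text: str) -> datetime:
    """Parse an ISO-style timestamp, applying the CF default of zero offset.

    Hypothetical helper, for illustration only.
    """
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit numeric offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset in the string: CF says the default offset is zero.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

explicit = as_cf_instant("2025-01-10T12:12:30Z")
implicit = as_cf_instant("2025-01-10T12:12:30")

# Without the helper, the offset-free string carries no time zone at all.
naive = datetime.fromisoformat("2025-01-10T12:12:30")
```

Both calls to the helper give the same instant, but only because the helper hard-codes the CF rule; software that follows a different convention for offset-free strings would disagree.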

NOTE: This came up for me because I'm working on some software where we are trying to make it more timezone smart -- and my team was uncomfortable with using UTC when an offset is not specified -- even though it's the CF standard. I think we've been bitten far too often with not-quite compliant files, even if not for this particular reason.

So I thought -- "wouldn't it be better if people simply put the Z (or zero) on there?"

Side note: There was also a painful misfeature in the initial numpy datetime implementation -- following standards, it interpreted no offset as UTC, and then applied the offset that the computer it was running on to make it UTC -- that was a really ugly mess! I know this is not the same thing, but it makes me wary -- again, I really prefer being explicit!

@taylor13 wrote:
" I'm pretty sure nearly everyone that's time-stamping 0Z will simply omit it. I know I would."

Why is that (for you anyway)? It seems simple and clear to me; is it that much of a burden?

Anyway, off to write a PR -- at least for the "Z" part, still not clear there's any consensus on the recommendation.

@ChrisBarker-NOAA ChrisBarker-NOAA linked a pull request Jan 10, 2025 that will close this issue
@lhmarsden

lhmarsden commented Jan 10, 2025 via email

@taylor13

@taylor13 wrote:

" I'm pretty sure nearly everyone that's time-stamping 0Z will simply omit it. I know I would."

Why is that (for you anyway)? It seems simple and clear to me; is it that much of a burden?

I expect lots of users to want to display on their graphs the time and units and it's less cluttered (easier to read) if there is no trailing "Z" or "+0". Any global model output I've ever seen has indicated time at 0Z so the suffix is unnecessary in my community.

For the same reason I would leave off the time entirely when the time is invariably 0:0:0.

@ChrisBarker-NOAA
Contributor Author

As long as there are no constraints preventing people from forgetting the time zone, some people will forget the time zone. I have seen this rarely, but it happens.

Indeed it does. In fact, a lot of software has no concept of timezone (Python calls it "naive time") -- so it's all too easy for anyone using "local time" to not have an offset in output.
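
A small sketch of what "naive" means in practice in Python (standard library only; `can_compare` is a hypothetical helper for illustration). A naive value carries no offset, and Python refuses to order it against an aware one, so nothing in the value itself says whether it was meant as local time or zero offset:

```python
from datetime import datetime, timezone

naive = datetime(2025, 1, 10, 12, 12, 30)                       # no offset attached
aware = datetime(2025, 1, 10, 12, 12, 30, tzinfo=timezone.utc)  # explicit zero offset

def can_compare(a: datetime, b: datetime) -> bool:
    """Return True if Python will order the two values (hypothetical helper)."""
    try:
        a < b
        return True
    except TypeError:
        # Python raises TypeError when ordering a naive and an aware datetime.
        return False

mixed_ok = can_compare(naive, aware)
same_kind_ok = can_compare(aware, aware)
```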

But it's way too late to require a TZ offset -- so here we are.

"I expect lots of users to want to display on their graphs the time and units and it's less cluttered (easier to read) "

sure -- though this is for the Time coord units -- I don't think that's what would end up on anyone's graphics anyway.

Does anyone put "hours since 2025-01-10T12:12:30" on a graph ??

Anyway, I won't die on this hill -- if anyone wants to veto adding a recommendation, I'll forget it.

Comment here or on the PR: #586

@JonathanGregory
Contributor

Dear Chris

Thanks for the PR.

As I remarked before, we should also change the description of the format that occurs earlier in 4.4.1, to read

The reference datetime string (appearing after the identifier since) is required. It must include the date, which may optionally be followed by time or time zone offset or both. Its format is y-m-d [H:M:S] [Z], where [...] indicates an optional element, ...

In the definition of Z, I think we should delete the phrase "with respect to UTC", because that's only true in the real world! There's no need for this phrase, because it's explained just below what zero offset means, without referring to UTC.

I think it might be clearer to add Z as another possibility in the bulleted list of formats for the time zone offset Z, because then you wouldn't have to mention it both before and after the list:

The time zone offset Z must be in one of the following formats, where any of the numeric formats may be prefixed with a sign:

  • The letter Z indicates a zero offset, sometimes referred to as "Zulu time"; the space between the Z and the time or date may be omitted.
  • H, the hour alone, ....

UDUNITS allows the space between time and time zone offset to be omitted if the latter begins with + or -, as well as before Z. Do you think we should allow that too?

After the list, you remark that the default is zero. This isn't needed, because it's already been said, just after Z is defined.

As you know, I disagree with recommending that the time zone should be included, which you've included in the PR. If we say it's a recommendation, we have to include it in the conformance document, and a CF checker will report a warning every time it's absent. That would occur with all time coordinates in CMIP files, and with all the examples in the CF document, for instance. I appreciate your point of view, but as you correctly say, that decision was made long ago. It was natural, because (a) CF started with climate and forecast data, which are usually global gridded fields and always use UTC or model equivalent, (b) zero time zone offset is the default of UDUNITS, and CF follows COARDS in using UDUNITS syntax for units.

You have: days since 2013-1-13 1800
You want: days since 2013-1-13 1800Z  
    1 days since 2013-1-13 1800 = 1 (days since 2013-1-13 1800Z)
    x/(days since 2013-1-13 1800Z) = (x/(days since 2013-1-13 1800))
You have: hours since 2013-1-13 1800
You want: hours since 2013-1-13 1800 -5
    1 hours since 2013-1-13 1800 = -4 (hours since 2013-1-13 1800 -5)
    x/(hours since 2013-1-13 1800 -5) = (x/(hours since 2013-1-13 1800)) - 5
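
The two conversions above can be cross-checked with a short Python sketch. This is only an illustration using the standard library's real-world calendar, not the UDUNITS implementation, and `convert` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

def convert(value, unit, ref_from, ref_to):
    """Re-express `value` (counted in `unit` since ref_from) as a count since ref_to."""
    instant = ref_from + value * unit
    return (instant - ref_to) / unit

hour = timedelta(hours=1)
day = timedelta(days=1)

# "2013-1-13 1800": no offset given, so zero is assumed.
ref_default = datetime(2013, 1, 13, 18, 0, tzinfo=timezone.utc)
# "2013-1-13 1800Z": zero offset stated explicitly.
ref_z = datetime(2013, 1, 13, 18, 0, tzinfo=timezone.utc)
# "2013-1-13 1800 -5": same clock reading, five hours behind.
ref_minus5 = datetime(2013, 1, 13, 18, 0, tzinfo=timezone(timedelta(hours=-5)))

same = convert(1, day, ref_default, ref_z)            # 1.0: Z changes nothing
shifted = convert(1, hour, ref_default, ref_minus5)   # -4.0: i.e. x - 5
```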

I think it would be OK to include something weaker than a recommendation at the end of the section, such as "We suggest that the time zone offset be explicitly specified in any situation where omitting it might be misunderstood as indicating local time."

Best wishes

Jonathan

@ChrisBarker-NOAA
Contributor Author

In the definition of Z, I think we should delete the phrase "with respect to UTC"

Done

I think it might be clearer to add Z as another possibility in the bulleted list of formats for the time zone offset Z, because then you wouldn't have to mention it both before and after the list:

yeah, I was trying to figure out how best to do that -- 'cause 'Z' is not "value in one of the following four formats"

but it kinda is -- let's try:

"""
The time zone offset Z must be in one of the following five formats, any of which may be prefixed with a sign:

** The letter Z indicating zero offset, sometimes referred to as "Zulu Time".

** H, the hour alone, of one or two digits e.g. -6, 2, +11, which is sufficient for many time zones.

** H:M, where H is hour and M minute, each of one or two digits, e.g. 5:30.

** four digits, of which the first pair are the hours and the second the minutes e.g. 0530.

** three digits, of which the first is the hour (0--9) e.g. 530.

"""

Will folks think that the 'Z' can be prefixed with a sign? I hope not.
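
For anyone writing a checker, the five formats in the draft text above could be recognised with something like the following Python sketch. The regular expression and the name `offset_minutes` are assumptions made for illustration; this is not the UDUNITS grammar, and it deliberately rejects a signed "Z":

```python
import re

# Hypothetical pattern covering the five formats in the draft text.
_OFFSET = re.compile(
    r"""^(?:
        (?P<zulu>Z)                              # the letter Z: zero offset
      | (?P<sign>[+-])?                          # optional sign, numeric forms only
        (?:
            (?P<h1>\d{1,2}):(?P<m1>\d{1,2})      # H:M
          | (?P<h2>\d{2})(?P<m2>\d{2})           # four digits: hours then minutes
          | (?P<h3>\d)(?P<m3>\d{2})              # three digits: one-digit hour
          | (?P<h4>\d{1,2})                      # hour alone
        )
    )$""",
    re.VERBOSE,
)

def offset_minutes(text: str) -> int:
    """Return the offset in minutes for one of the five formats, else raise ValueError."""
    match = _OFFSET.match(text)
    if match is None:
        raise ValueError(f"unrecognised time zone offset: {text!r}")
    if match.group("zulu"):
        return 0
    groups = match.groupdict()
    hours = next(groups[k] for k in ("h1", "h2", "h3", "h4") if groups[k] is not None)
    minutes = next((groups[k] for k in ("m1", "m2", "m3") if groups[k] is not None), "0")
    total = int(hours) * 60 + int(minutes)
    return -total if groups["sign"] == "-" else total
```

With this sketch, `offset_minutes("-6")` gives -360, `offset_minutes("5:30")`, `offset_minutes("0530")` and `offset_minutes("530")` all give 330, and `offset_minutes("+Z")` raises `ValueError`.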

UDUNITS allows the space between time and time zone offset to be omitted if the latter begins with + or -, as well as before Z. Do you think we should allow that too?

Yes, we should follow UDUNITS unless there's good reason not to -- however, do we have to lay all that out here? elsewhere we simply say "follow UDUNITS".

UDUNITS sure looks a lot like ISO 8601 -- are there differences? In practice, I'll bet most folks are using ISO 8601 format -- so I hope it's not too different. But if it is the same, we could say so -- it's a lot easier to find docs for ISO 8601.

... a CF checker will report a warning every time it's absent.

That was the goal, yes.

...That would occur with all time coordinates in CMIP files, and with all the examples in the CF document, for instance.

Fair enough, it wouldn't "break" anything, but would add a lot of noise, so I'll retract the idea.

Though I think it would be good to update at least some of the examples in the CF with a "Z".

Many folks tend to learn by following examples, rather than reading the docs -- so best practices should be used in the docs as much as possible. I may make a few updates in the PR, but haven't yet.

I think it would be OK to include something weaker than a recommendation at the end of the section, such as "We suggest that the time zone offset be explicitly specified in any situation where omitting it might be misunderstood as indicating local time."

OK -- Done in the PR, with this language:

"While the default (unspecified) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time."

I like explicitly saying we know zero is the default, but it's still good to specify it.

But more wordsmithing is welcome.

@JonathanGregory
Contributor

Dear Chris

Thanks for being willing to compromise about the recommendation. I'm happy with your text:

While the default (unspecified) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time.

I also agree it would be a good idea to add a numerical timezone offset or Z in a few examples. If I'm correct that people are more likely to use local time for observations at points than they are for gridded datasets, then it would be most helpful to modify some examples of timeseries, profile and trajectory DSGs.

I don't know what ISO 8601 says, since it's hidden behind a rather high paywall in Switzerland.

The UDUNITS syntax is precisely described in its documentation but users may not find this easy to interpret. Hence we can help by describing it simply. Yes, unfortunately it looks like your words imply that Z could be signed, although that's a silly interpretation. I would say:

The time zone offset Z must be in one of the following five formats, where numeric hours may optionally be prefixed with a + or - sign:

and after the list

If the time zone offset is the letter Z or begins with a sign, the space before it may be omitted.

Further up, as I mentioned before, we need:

The reference datetime string (appearing after the identifier since) is required. It must include the date, which may optionally be followed by time or time zone offset or both. Its format is y-m-d [H:M:S] [Z], where [...] indicates an optional element, ...

Actually UDUNITS is even more flexible than that. Its format is y[-m[-d]] [H:[M:[S]]] [Z] i.e. only the year is mandatory, and only the hour is mandatory if you include the time. But I believe that in the examples we only use y-m-d or y-m-d H:M:S, so perhaps we don't need to mention this further flexibility.

Best wishes

Jonathan

@ChrisBarker-NOAA
Contributor Author

ChrisBarker-NOAA commented Jan 16, 2025

I don't know what ISO 8601 says, since it's hidden behind a rather high paywall in Switzerland.

Indeed -- I rely on the Wikipedia interpretation :-)

The UDUNITS syntax is precisely described in its documentation but users may not find this easy to interpret.

Indeed, I have had no luck at all figuring out what unit strings are legal from the docs -- I've had to rely on experimenting with the command line utility (which is actually what they suggest in the docs).

I've been meaning to suggest some improvements to the CF docs (or the UDUNITS docs) to address that, but haven't had the time yet.

Example: UDUNITS will accept all of "m", "meter" and "meters" -- but I haven't seen that documented anywhere :-(.

I think that back in the UDUNITS1 days, there was a Unit database that CF pointed to that made it pretty clear. But it's now all in an XML file that is, to say the least, not very human readable.

So yes -- we need that description in the CF docs -- but it can be a subset of what UDUNITS allows.

I'll make a few more changes to the PR based on your suggestions, and then take the WIP off.

@davidhassell
Contributor

davidhassell commented Jan 16, 2025

Edit: Apologies, Chris, I had indeed mis-read what you wrote. All good!

I agree that we should not make an explicit recommendation that an offset should be provided, but also like Jonathan's suggestion of (something like) "We suggest that the time zone offset be explicitly specified in any situation where omitting it might be misunderstood as indicating local time."

Chris - it sounded like you agreed with not making it a recommendation, but then went on to write text in the PR making it a recommendation (both in #584 (comment)). Just checking what your position is (apologies if I've misunderstood :)).

@davidhassell
Contributor

On the new line of text:

"While the default (unspecified) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time."

I find "(unspecified)" ambiguous. How about:

"While the default (of omitting the Z component) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time."

@ChrisBarker-NOAA
Contributor Author

OK -- updated the PR with Jonathan's and David's suggestions.

I've added "Z" to a couple examples.

Question: we are generally using "0:0:0" to indicate the zero time -- I think I've seen "00:00:00" more often, which is more consistent with the two digits for everything format.

Either is legal, but I prefer: "00:00:00".

I started to change that in my PR, but discovered that there are a LOT of occurrences in ch04! So doing it in a PR if I'll need to roll that back is silly.

What do you all think? Make that change or leave it as is?

NOTE: if you have wordsmithing suggestions, it would be easier for me for you to put those in comments in the PR: #586

@JonathanGregory
Contributor

Thanks, Chris. Actually #586 looks unchanged to me. I am puzzled by this.

You're right, both 0:0:0 and 00:00:00 are OK. I don't think we should change all existing ones, but it would be fine to change the ones you're adding Z to.

@ChrisBarker-NOAA
Contributor Author

Actually #586 looks unchanged to me

oops -- I forgot to push at the end of the day yesterday.

there's a couple changes now.

I didn't end up changing any of the 0:0:0 entries -- let sleeping docs lie.

I think it's ready for review ....

@JonathanGregory
Contributor

Thanks, Chris. I have added comments in the PR.

@ChrisBarker-NOAA
Contributor Author

I've addressed the comments in the PR -- getting close!

@davidhassell
Contributor

Just a presentational question:

I was wondering why we're using lower case designators for years, months, days (ymd) and upper case ones for hours, minutes, seconds (HMS), whereas ISO 8601 is the other way round. Is it because that's how it was done in Chapter 7.4?

We're currently internally consistent in the CF conventions document (good), but should we change to also be externally consistent? Changing to match ISO would involve also trivial changes to ~4 lines in chapter 7.

@JonathanGregory
Contributor

David asked,

I was wondering why we're using lower case designators for years, months, days (ymd) and upper case ones for hours, minutes, seconds (HMS), whereas ISO 8601 is the other way round. Is it because that's how it was done in Chapter 7.4 (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#climatological-statistics)?

Yes, when I drafted this part, I followed chapter 7 for consistency. The present convention is consistent with Linux strftime(3), date(1) and Python datetime.datetime.strftime, so it has some pedigree, but I think it would be fine to change both of them to agree with ISO.

Apart from this, if we prefer to change it, I think the PR is fine. Thanks, Chris.

@davidhassell
Contributor

I don't really mind about the upper/lower case thing, but I would probably lean on the side of ISO if asked to vote. Does anyone have a stronger opinion than me or Jonathan?

Likewise, this formatting question aside, I think the PR is fine.

@ChrisBarker-NOAA
Contributor Author

IIUC, UDUNITS is a bit more flexible than ISO 8601. But it's a bit hard to know, because I can't find it clearly documented anywhere.

That's why we are laying it out clearly in the CF doc. But we're not documenting everything that's allowed, and we probably don't need to -- rather, here's something that will work. And in practice, folks follow examples more than they carefully parse out definitions (unless they are writing compliance software...).

And the ISO standard is well known (if not any better understood).

Given that, if UDUNITS is ISO compatible, it would probably be best to use ISO-compliant explanations and examples.

Hmm -- if, in fact, I am correct and ISO 8601 strings are UDUNITS compatible, maybe we should say that in the doc?

Practical application: ISO formatting software is common, and I expect many folks use it to write CF files (I know I do: Python's datetime has a .isoformat method.)
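
As a rough sketch of that workflow (standard library only; `time_units` is a hypothetical helper, and whether every string it produces is accepted by UDUNITS is not checked here): `.isoformat()` writes an offset only when the datetime is timezone-aware, so a naive reference silently yields a units string with no offset.

```python
from datetime import datetime, timezone

def time_units(unit: str, reference: datetime) -> str:
    """Build a "<unit> since <reference>" string (hypothetical helper)."""
    return f"{unit} since {reference.isoformat()}"

aware_ref = datetime(2025, 1, 10, 12, 12, 30, tzinfo=timezone.utc)
naive_ref = datetime(2025, 1, 10, 12, 12, 30)

with_offset = time_units("hours", aware_ref)
# -> "hours since 2025-01-10T12:12:30+00:00"
without_offset = time_units("hours", naive_ref)
# -> "hours since 2025-01-10T12:12:30"
```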

Anyway, I need to put this PR to bed, so I'll leave that to another PR, if someone wants to write it.

@JonathanGregory
Contributor

if, in fact, I am correct and ISO 8601 strings are UDUNITS compatible ...

You are correct. Thanks for telling us! Was this true in UDUNITS version 1, I wonder? I for one didn't know it, and none of the examples in the CF document use the ISO timestamp format. However, the UDUNITS documentation says it does, in a symbolic form, and the command line utility agrees:

You have: seconds since 20250121T220713Z
You want: seconds since 2025-1-21 22:07:00
    1 seconds since 20250121T220713Z = 14 (seconds since 2025-1-21 22:07:00)
    x/(seconds since 2025-1-21 22:07:00) = (x/(seconds since 20250121T220713Z)) + 13
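
The 13-second shift in that output can be reproduced in Python as a sanity check (a sketch using the standard library, assuming Python 3.7 or later, where `%z` accepts a literal "Z"; it says nothing about how UDUNITS itself parses the string):

```python
from datetime import datetime, timezone

# ISO 8601 basic format with a trailing Z.
iso_ref = datetime.strptime("20250121T220713Z", "%Y%m%dT%H%M%S%z")
# The CF-style reference "2025-1-21 22:07:00"; no offset given, so zero is assumed.
cf_ref = datetime(2025, 1, 21, 22, 7, 0, tzinfo=timezone.utc)

shift = (iso_ref - cf_ref).total_seconds()   # seconds between the two references
one_second_later = 1 + shift                 # "1 seconds since" the ISO reference
```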

I agree that we should not include that in this issue. It's another matter.

@JonathanGregory JonathanGregory linked a pull request Jan 22, 2025 that will close this issue