Add a recommendation to including a TZ offset in time units. #584
Comments
Dear Chris

These are interesting points. Thanks.

Proposal 1

If you have a gridded field (which is the kind of data CF was invented for and is very commonly used for), it doesn't really make sense to use a non-zero timezone offset anyway, because the same time coordinate applies everywhere in the spatial field, often the whole world. I don't think a recommendation to include a timezone offset is useful in that case. Is this more of a concern for data in discrete sampling geometries, such as station timeseries? If so, perhaps we could make the recommendation for that kind of data only?

Proposal 2

CF uses UDUNITS for its units syntax, as you know, and I find that UDUNITS supports
as well as allowing Z to be

Cheers
Jonathan |
well, there are smaller-scale regional models that are all in one timezone, and, as you point out, point data is sometimes stored in some version of local time. But whether you want to include a non-zero offset or not -- it's still better for us to be consistent, and having "0" or "Z" there is better than leaving it blank and hoping for the best. In any case, there's never a reason not to do it, which is why I think we should suggest that it always be done.
Let's do that -- I'll draft a PR soon, unless someone beats me to it :-) |
Suggesting, I suppose, is o.k., but I'm pretty sure nearly everyone that's time-stamping 0Z will simply omit it. I know I would. |
I don't think it's worthwhile recommending it for gridded fields. Do you know examples of gridded fields (from models or obs), contained all within one timezone, for which there is a risk, or actual cases, where the data-writer intended local time but forgot to record the timezone offset? I do think it's more likely to be forgotten for discrete sampling geometries containing observed data, but I don't know how likely, because I lack experience of these data. Do you work with such data? Does anyone else have experience of whether people forget to record timezones - Luke @lhmarsden, for instance? Suggesting is harmless, but recommending has the cost of possibly annoying and useless warnings being produced by the CF checker. |
"I don't think it's worthwhile recommending it for gridded fields"

well, I'm not so sure I agree, but the way CF is written, a time coordinate is a time coordinate, so we can only make one recommendation.

"Does anyone else have experience of whether people forget to record timezones"

I certainly have -- though I can't say for sure whether the files in question claimed to be CF compliant.

But anyway, there's nothing we can do about that now -- CF has stated (forever?) that no offset provided means time at the prime meridian (what most people call UTC or GMT, but I know why we're not using those terms).

All I'm suggesting is that it's better to put:

2025-01-10T12:12:30Z

than

2025-01-10T12:12:30

because then there is no chance whatsoever of a misunderstanding. Explicit is better than implicit, and all that.

NOTE: This came up for me because I'm working on some software where we are trying to make it more timezone-smart -- and my team was uncomfortable with assuming UTC when an offset is not specified -- even though it's the CF standard. I think we've been bitten far too often by not-quite-compliant files, even if not for this particular reason.

So I thought: "wouldn't it be better if people simply put the Z (or zero) on there?"

Side note: There was also a painful misfeature in the initial numpy datetime implementation -- following standards, it interpreted no offset as UTC, and then applied the offset of the computer it was running on to make it UTC -- that was a really ugly mess! I know this is not the same thing, but it makes me wary -- again, I really prefer being explicit!

@taylor13 wrote:

"I'm pretty sure nearly everyone that's time-stamping 0Z will simply omit it. I know I would."

Why is that (for you anyway)? It seems simple and clear to me; is that much of a burden?

Anyway, off to write a PR -- at least for the "Z" part; it's still not clear there's any consensus on the recommendation. |
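To make the ambiguity concrete, here is a minimal Python sketch using only the standard library. `datetime.fromisoformat` returns a *naive* object for a string without an offset, so the CF default (zero offset) has to be applied explicitly; otherwise methods such as `astimezone()` treat the value as the local time of whatever machine runs the code. The helper name `parse_cf_reference` is hypothetical, and the manual `Z` handling is there because `fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.

```python
from datetime import datetime, timezone

def parse_cf_reference(text: str) -> datetime:
    """Parse an ISO-style reference datetime, applying the CF default of zero offset.

    Hypothetical helper: handles strings datetime.fromisoformat understands, plus 'Z'.
    """
    if text.endswith("Z"):
        # Pythons older than 3.11 reject 'Z', so spell it as an explicit zero offset.
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # No offset given: CF says this means zero offset, not local time.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

explicit = parse_cf_reference("2025-01-10T12:12:30Z")
implicit = parse_cf_reference("2025-01-10T12:12:30")
print(explicit == implicit)  # True: the same instant under the CF default

# Without the CF rule, Python leaves the second string naive (tzinfo is None),
# and astimezone() would then interpret it in the system's local time zone.
naive = datetime.fromisoformat("2025-01-10T12:12:30")
print(naive.tzinfo)
```

The explicit `Z` form needs no special rule at all, which is the point being argued: the reader's software cannot guess wrong.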
As long as there are no constraints preventing people from forgetting the time zone, some people will forget the time zone. I have seen this rarely, but it happens.
|
I expect lots of users to want to display on their graphs the time and units and it's less cluttered (easier to read) if there is no trailing "Z" or "+0". Any global model output I've ever seen has indicated time at 0Z so the suffix is unnecessary in my community. For the same reason I would leave off the time entirely when the time is invariably 0:0:0. |
Indeed it does. In fact, a lot of software has no concept of timezone (Python calls it "naive time") -- so it's all too easy for anyone using "local time" to not have an offset in output. But it's way too late to require a TZ offset -- so here we are.

"I expect lots of users to want to display on their graphs the time and units and it's less cluttered (easier to read)"

sure -- though this is for the time coord units -- I don't think that's what would end up on anyone's graphics anyway. Does anyone put "hours since 2025-01-10T12:12:30" on a graph??

Anyway, I won't die on this hill -- if anyone wants to veto adding a recommendation, I'll forget it. Comment here or on the PR: #586 |
Dear Chris

Thanks for the PR. As I remarked before, we should also change the description of the format that occurs earlier in 4.4.1, to read

In the definition of Z, I think we should delete the phrase "with respect to UTC", because that's only true in the real world! There's no need for this phrase, because it's explained just below what zero offset means, without referring to UTC. I think it might be clearer to add

UDUNITS allows the space between time and time zone offset to be omitted if the latter begins with

After the list, you remark that the default is zero. This isn't needed, because it's already been said, just after Z is defined.

As you know, I disagree with recommending that the time zone should be included, which you've included in the PR. If we say it's a recommendation, we have to include it in the conformance document, and a CF checker will report a warning every time it's absent. That would occur with all time coordinates in CMIP files, and with all the examples in the CF document, for instance.

I appreciate your point of view, but as you correctly say, that decision was made long ago. It was natural, because (a) CF started with climate and forecast data, which are usually global gridded fields and always use UTC or model equivalent, (b) zero time zone offset is the default of UDUNITS, and CF follows COARDS in using UDUNITS syntax for

I think it would be OK to include something weaker than a recommendation at the end of the section, such as "We suggest that the time zone offset be explicitly specified in any situation where omitting it might be misunderstood as indicating local time."

Best wishes
Jonathan |
Done
yeah, I was trying to figure out how best to do that -- 'cause 'Z' is not "value in one of the following four formats" but it kinda is -- let's try:

"""
** The letter
** H, the hour alone, of one or two digits, e.g.
** H:M, where H is hour and M minute, each of one or two digits, e.g.
** four digits, of which the first pair are the hours and the second the minutes, e.g.
** three digits, of which the first is the hour (0--9), e.g.
"""

Will folks think that the 'Z' can be prefixed with a sign? I hope not.
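The five offset forms in the list above can be turned into a small parser. This is a hedged Python sketch, not UDUNITS' actual grammar: the function name `parse_tz_offset` is hypothetical, the optional leading `+`/`-` on numeric forms is an assumption based on the signed offsets discussed in this thread, and rejecting hours above 23 or minutes above 59 is this sketch's own choice. In line with the question above, a sign in front of `Z` is rejected.

```python
import re
from datetime import timedelta

# One alternative per form in the list; a sign is allowed only before numeric forms.
_OFFSET_RE = re.compile(
    r"""^(?:
        (?P<z>Z)                                  # the letter Z: zero offset
      | (?P<sign>[+-])?(?:
            (?P<h>\d{1,2}):(?P<m>\d{1,2})         # H:M, one or two digits each
          | (?P<hhmm>\d{4})                       # four digits: HHMM
          | (?P<hmm>\d{3})                        # three digits: H (0-9) then MM
          | (?P<hh>\d{1,2})                       # hour alone, one or two digits
        )
    )$""",
    re.VERBOSE,
)

def parse_tz_offset(text: str) -> timedelta:
    """Convert a time zone offset string into a timedelta (hypothetical helper)."""
    match = _OFFSET_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised time zone offset: {text!r}")
    if match["z"]:
        return timedelta(0)
    if match["h"] is not None:
        hours, minutes = int(match["h"]), int(match["m"])
    elif match["hhmm"] is not None:
        hours, minutes = int(match["hhmm"][:2]), int(match["hhmm"][2:])
    elif match["hmm"] is not None:
        hours, minutes = int(match["hmm"][0]), int(match["hmm"][1:])
    else:
        hours, minutes = int(match["hh"]), 0
    if hours > 23 or minutes > 59:
        raise ValueError(f"offset out of range: {text!r}")
    sign = -1 if match["sign"] == "-" else 1
    return sign * timedelta(hours=hours, minutes=minutes)
```

For example, `"0530"`, `"530"` and `"5:30"` all map to five and a half hours, which is exactly why the three- and four-digit forms need the separate descriptions in the list.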
Yes, we should follow UDUNITS unless there's good reason not to -- however, do we have to lay all that out here? Elsewhere we simply say "follow UDUNITS". UDUNITS sure looks a lot like ISO 8601 -- are there differences? In practice, I'll bet most folks are using ISO 8601 format -- so I hope it's not too different. But if it is the same, we could say so -- it's a lot easier to find docs for ISO 8601.
That was the goal, yes.
Fair enough, it wouldn't "break" anything, but it would add a lot of noise, so I'll retract the idea. Though I think it would be good to update at least some of the examples in the CF document with a "Z". Many folks tend to learn by following examples rather than reading the docs -- so best practices should be used in the docs as much as possible. I may make a few updates in the PR, but haven't yet.
OK -- Done in the PR, with this language: "While the default (unspecified) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time." I like explicitly saying we know zero is the default, but that it's still good to specify it. But more wordsmithing is welcome. |
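Because the agreed text is a suggestion rather than a recommendation, a checker would not need to raise a warning, which was Jonathan's concern. The sketch below shows one way a tool could report it as advisory output instead. It is hypothetical: `offset_suggestion` is not part of any real CF checker, and its regex is deliberately rough. Following the `y-m-d [H:M:S [Z]]` format quoted in this issue, it only looks for an offset after an `H:M[:S]` time, accepting `Z`, a signed offset, or a whitespace-separated unsigned one.

```python
import re
from typing import Optional

# Rough check: an offset can only follow a time, so look for H:M[:S] followed by
# 'Z', a signed offset, or a whitespace-separated unsigned offset at the end.
_OFFSET_AFTER_TIME = re.compile(
    r"\d{1,2}:\d{1,2}(?::\d{1,2}(?:\.\d*)?)?"
    r"(?:\s*Z|\s*[+-]\d{1,2}(?::?\d{2})?|\s+\d{1,4}(?::\d{1,2})?)\s*$"
)

def offset_suggestion(units: str) -> Optional[str]:
    """Return an advisory message if a '... since ...' units string has no explicit
    offset, else None (hypothetical rule, not taken from any real checker)."""
    if " since " not in units:
        return None  # not a time coordinate units string
    if _OFFSET_AFTER_TIME.search(units):
        return None
    return ("suggestion: no explicit time zone offset; CF treats this as zero offset, "
            "but writing one (e.g. 'Z') avoids it being mistaken for local time")

for u in ["hours since 2025-01-10T12:12:30Z",
          "hours since 2025-01-10 12:12:30",
          "days since 1970-01-01"]:
    print(u, "->", offset_suggestion(u))
```

Returning a message instead of raising keeps the severity decision with the calling tool.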
Dear Chris

Thanks for being willing to compromise about the recommendation. I'm happy with your text:
I also agree it would be a good idea to add a numerical timezone offset or

I don't know what ISO 8601 says, since it's hidden behind a rather high paywall in Switzerland. The UDUNITS syntax is precisely described in its documentation, but users may not find this easy to interpret. Hence we can help by describing it simply.

Yes, unfortunately it looks like your words imply that
and after the list
Further up, as I mentioned before, we need:
Actually UDUNITS is even more flexible than that. Its format is

y[-m[-d]] [H:[M:[S]]] [Z]

i.e. only the year is mandatory, and only the hour is mandatory if you include the time. But I believe that in the examples we only use y-m-d or y-m-d H:M:S, so perhaps we don't need to mention this further flexibility.

Best wishes
Jonathan |
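As a rough illustration of that flexible format, here is a Python sketch that fills in missing parts with their minimum values and applies the zero-offset default. It makes several assumptions that are not UDUNITS' documented grammar: the time is read loosely as `H[:M[:S]]`, seconds are whole numbers, only `Z` or a signed `H[:MM]` offset is accepted, and a `T` separator is allowed because, as noted later in this thread, UDUNITS also accepts ISO-style timestamps. The name `parse_udunits_reference` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Loose reading of y[-m[-d]] [H[:M[:S]]] [Z]: only the year is mandatory, and only
# the hour is mandatory once a time is given. Missing parts default to their minimum.
_REF_RE = re.compile(
    r"^(?P<y>\d{1,4})(?:-(?P<mo>\d{1,2})(?:-(?P<d>\d{1,2}))?)?"
    r"(?:[ T](?P<H>\d{1,2})(?::(?P<M>\d{1,2})(?::(?P<S>\d{1,2}))?)?)?"
    r"(?:\s*(?P<tz>Z|[+-]\d{1,2}(?::\d{2})?))?$"
)

def parse_udunits_reference(text: str) -> datetime:
    """Parse a reference datetime into an aware datetime (hypothetical helper)."""
    m = _REF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised reference datetime: {text!r}")
    tz = m["tz"]
    if tz is None or tz == "Z":
        offset = timedelta(0)  # CF default when the offset is omitted
    else:
        sign = -1 if tz[0] == "-" else 1
        hours, _, minutes = tz[1:].partition(":")
        offset = sign * timedelta(hours=int(hours), minutes=int(minutes or 0))
    return datetime(
        int(m["y"]), int(m["mo"] or 1), int(m["d"] or 1),
        int(m["H"] or 0), int(m["M"] or 0), int(m["S"] or 0),
        tzinfo=timezone(offset),
    )
```

Under these assumptions `"1970"`, `"1970-01-01 0:0:0"` and `"1970-01-01T00:00:00Z"` all denote the same instant, which is one reason the examples can stick to `y-m-d` and `y-m-d H:M:S`.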
Indeed -- I rely on the Wikipedia interpretation :-)
Indeed, I have had no luck at all figuring out what unit strings are legal from the docs -- I've had to rely on experimenting with the command line utility (which is actually what they suggest in the docs). I've been meaning to suggest some improvements to the CF docs (or the UDUNITS docs) to address that, but haven't had the time yet.

Example: UDUNITS will accept all of "m", "meter" and "meters" -- but I haven't seen that documented anywhere :-(. I think that back in the UDUNITS1 days, there was a unit database that CF pointed to that made it pretty clear. But it's now all in an XML file that is, to say the least, not very human-readable.

So yes -- we need that description in the CF docs -- but it can be a subset of what UDUNITS allows. I'll make a few more changes to the PR based on your suggestions, and then take the WIP off. |
Edit: Apologies, Chris, I had indeed mis-read what you wrote. All good!
|
On the new line of text:
I find "(unspecified)" ambiguous. How about: "While the default (of omitting the Z component) is an offset of zero, we suggest that a zero offset be specified to avoid any confusion where omitting it might be misunderstood as indicating local time." |
OK -- updated the PR with Jonathan's and David's suggestions. I've added "Z" to a couple of examples.

Question: we are generally using "0:0:0" to indicate the zero time -- I think I've seen "00:00:00" more often, which is more consistent with the two-digits-for-everything format. Either is legal, but I prefer "00:00:00". I started to change that in my PR, but discovered that there are a LOT of occurrences in ch04! So doing it in a PR, if I'll need to roll that back, is silly. What do you all think? Make that change or leave it as is?

NOTE: if you have wordsmithing suggestions, it would be easier for me if you put those in comments in the PR: #586 |
Thanks, Chris. Actually #586 looks unchanged to me. I am puzzled by this. You're right, both |
oops -- I forgot to push at the end of the day yesterday. There are a couple of changes now. I didn't end up changing any of the 0:0:0 entries -- let sleeping docs lie. I think it's ready for review... |
Thanks, Chris. I have added comments in the PR. |
I've addressed the comments in the PR -- getting close! |
Just a presentational question: I was wondering why we're using lower case designators for years, months, days ( We're currently internally consistent in the CF conventions document (good), but should we change to also be externally consistent? Changing to match ISO would involve also trivial changes to ~4 lines in chapter 7. |
David asked,
Yes, when I drafted this part, I followed chapter 7 for consistency. The present convention is consistent with Linux Apart from this, if we prefer to change it, I think the PR is fine. Thanks, Chris. |
I don't really mind about the upper/lower case thing, but I would probably lean on the side of ISO if asked to vote. Does anyone have a stronger opinion than me or Jonathan? Likewise, this formatting question aside, I think the PR is fine. |
IIUC, UDUNITS is a bit more flexible than ISO 8601. But it's a bit hard to know, because I can't find it clearly documented anywhere. That's why we are laying it out clearly in the CF doc. But we're not documenting everything that's allowed, and we probably don't need to -- rather, here's something that will work. And in practice, folks follow examples more than they carefully parse out definitions (unless they are writing compliance software...). And the ISO standard is well known (if not any better understood).

Given that, if UDUNITS is ISO compatible, it would probably be best to use ISO-compliant explanations and examples. Hmm -- if, in fact, I am correct and ISO 8601 strings are UDUNITS compatible, maybe we should say that in the doc?

Practical application: ISO formatting software is common, and I expect many folks use it to write CF files (I know I do: python's datetime has a

Anyway, I need to put this PR to bed, so I'll leave that to another PR, if someone wants to write it. |
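As an example of using standard ISO formatting to write a units string, here is a Python sketch built on `datetime.isoformat()`, which emits an explicit `+00:00` for an aware UTC datetime. The helper `cf_time_units` is hypothetical. That UDUNITS accepts the `+00:00` spelling is an assumption based on the `H:M` offset form discussed above, and strings with fractional seconds are not covered by anything in this thread, so checking the output with the UDUNITS command-line utility is advisable.

```python
from datetime import datetime, timezone

def cf_time_units(unit: str, reference: datetime) -> str:
    """Build a CF-style units string with an explicit offset (hypothetical helper)."""
    if reference.tzinfo is None:
        # Refuse naive datetimes rather than silently guessing an offset.
        raise ValueError("reference datetime must carry an explicit time zone")
    return f"{unit} since {reference.isoformat()}"

ref = datetime(2025, 1, 10, 12, 12, 30, tzinfo=timezone.utc)
print(cf_time_units("hours", ref))
# hours since 2025-01-10T12:12:30+00:00

# If the 'Z' spelling is preferred, convert to zero offset and format it explicitly.
print("hours since " + ref.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
# hours since 2025-01-10T12:12:30Z
```

Rejecting naive input is the writer-side counterpart of the explicit-offset suggestion: the offset is decided where the data are produced, not guessed where they are read.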
You are correct. Thanks for telling us! Was this true in UDUNITS version 1, I wonder? I for one didn't know it, and none of the examples in the CF document use the ISO timestamp format. However, the UDUNITS documentation says it does, in a symbolic form, and the command line utility agrees:
I agree that we should not include that in this issue. It's another matter. |
Title
Add a recommendation to including a TZ offset in time units
Moderator
TBA (if needed)
Moderator Status Review [last updated: YYYY-MM-DD]
TBA
Requirement Summary
The current doc, for time coordinates says:
"""
The reference datetime string (appearing after the identifier since) is required. It may include date alone, or date and time, or date, time and time zone offset. Its format is y-m-d [H:M:S [Z]], where […] indicates an optional element,
"""
So the timezone offset (Z) is optional.
and:
"""
The default time zone offset is zero.
"""
So leaving out the offset means that the timestamp is the prime meridian time.
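To show what the default means in practice, here is a minimal Python sketch that decodes numeric time values against a units string, treating an omitted offset as zero. The assumptions are labeled: `decode_times` is a hypothetical helper; it supports only days, hours, minutes and seconds (months and years are calendar dependent); it uses Python's proleptic Gregorian calendar rather than CF's other calendars; and it relies on `datetime.fromisoformat`, so reference fields must be zero-padded (for example `00:00:00`, not `0:0:0`).

```python
from datetime import datetime, timedelta, timezone

# Seconds per unit for an assumed subset of CF time units.
_UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}

def decode_times(values, units: str):
    """Turn numeric CF time values into aware datetimes (hypothetical helper)."""
    unit, sep, reference = units.partition(" since ")
    if not sep or unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported units: {units!r}")
    reference = reference.strip()
    if reference.endswith("Z"):
        reference = reference[:-1] + "+00:00"  # accepted by Pythons before 3.11 too
    ref = datetime.fromisoformat(reference)
    if ref.tzinfo is None:
        # Omitted offset: CF's default is zero, i.e. prime-meridian time.
        ref = ref.replace(tzinfo=timezone.utc)
    step = _UNIT_SECONDS[unit]
    return [ref + timedelta(seconds=v * step) for v in values]

for t in decode_times([0, 6, 24.5], "hours since 2025-01-10 12:12:30"):
    print(t.isoformat())
# 2025-01-10T12:12:30+00:00
# 2025-01-10T18:12:30+00:00
# 2025-01-11T12:42:30+00:00
```

A reader following the ISO 8601 "local time" convention instead would shift every one of these values by the local offset, which is the misunderstanding this issue is about.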
It's far too late to change that default, but it is unfortunate that there is no way to express "naive" datetime, or "localtime", etc.
Granted, it's a bad practice to do so -- presumably data providers should know what offset their data are in, and let us know. But it's also, I think, bad practice to simply leave it off.
Note that according to Wikipedia, ISO 8601 specifies that:
"If no UTC relation information is given with a time representation, the time is assumed to be in local time"
Rather than UTC or meridian zero, or ...
So it's not unrealistic to think someone out there might expect that to be the case for CF. (or frankly, they simply aren't thinking about it).
Technical Proposal Summary
Proposal 1:
I propose that we add language to the effect of:
"""
The default time zone offset is zero, but it is recommended that an offset always be provided -- "+0" can be specified for reference datetime strings at the prime meridian.
"""
Proposal 2:
ISO 8601 allows "Z" to be used to mean 0 offset. In fact, according to Wikipedia, it recommends it.
But it seems CF does not currently allow the "Z".
But I'm pretty sure I've seen otherwise conforming files use the "Z", and all the software I've tested accepts it.
So I think we should allow it in CF.
Conformance question:
not specific to this proposal, but is there a standard for flagging non-recommended practices in conformance checkers?
I thought there was a "conformance document" somewhere, but I can't find it -- am I imagining things? -- found it: https://github.com/cf-convention/Conformance/blob/master/conformance.adoc
Never mind.
Benefits
Hopefully, future datasets will be a tiny bit less ambiguous.
Status Quo
Status Quo is that a number of datasets in the wild don't have a TZ offset explicitly -- not a killer, but hopefully there might be fewer in the future if this is added.
Associated pull request
I'll do a PR, if folks think this is a good idea.