-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add performance focused APIs #17
Comments
I think Timezones only really come into play in the front-end (which Rust is not typically used for...) or for some specific kinds of applications such as calendars, where events need to be scheduled at a "civil time" in the future. Overall this crate seems very well designed, but if I have one criticism it's that I think it over-emphasizes the use of If you intend this crate to displace
|
I think (3) is #24. I think (2) is probably #21. I am unsure about (1). I'm not sure if I agree that |
Heh, my experience is the opposite - in both JS and Python I've experienced significant extra complexity and bugs from their datetime types coming with a "packaged" timezone being used in places where timezones were not relevant. In Python the main issue has been use of datetime where a simple timestamp would be more appropriate, since 99% of cases I'm dealing with a point in time, and the timezone is only relevant in so far as displaying that instant to the user in their preferred timezone. Storage should (in this case) be done in UTC and that's what the database expects. In JS, only a couple of days ago I encountered a bug at work where a In contrast, I haven't had any issues with It's honestly quite hard to think of use-cases where I'd want to store a datetime + timezone. Storing a future event in a calendar is the go-to example, but even then the only motiviation for that (vs storing a timestamp) is to deal with the edge-case where timezone data changes between the event being scheduled and it actually occurring, and doing that comes at a cost: it becomes impossible to schedule events at "ambiguous" times (at least with the default serialization implementation of If I want to build a best-possible calendar app, what's more likely? That the timezone data changes in a meaningful way, or that someone wants to schedule an event during a DST change... Both seem similarly likely. (As an aside: maybe there's a case for changing the serialization to disambiguate these cases... It doesn't help if the timezone data changes and the event lies on a DST change, but that's a legitimate ambiguity in that case.)
Yeah that also works, I only didn't suggest it because I assumed you'd want to make the timezone functionality be default-enabled, and I really hate default features in Rust, since nobody remembers to turn them off. |
It is had to disentangle your examples of bugs with the underlying datetime library. Both of the datetime libraries you mention have very serious deficiencies.
I don't see why this would be true. You can't schedule an event at 2am on March 10, 2024 in New York because that clock time didn't exist. That seems right to me?
Yes, it would be enabled by default. I think we overall need to collect more data. That will take some time and require users. I don't know that you are wrong, but I think you are. |
My very high level idea is that datetime libraries have sucked for so long, and this may have been inhibiting use cases or causing other problems that are falsely attributed. This is just an opinion though. So basically, I am excited to see what, if anything, Jiff unlocks for folks. IMO, its killer feature is not just |
That's a non-existent time, not an ambiguous time. If I schedule an event at a time that occurs twice due to clocks being put backwards, then jiff's Zoned type appears to be able to correctly disambiguate the actual time, but if I serialize/deserialize it, then that information will be lost. I would expect the default serialization format to faithfully round-trip the value.
It's nice, but it's definitely not new. The interval type in PostgreSQL has always worked this way. |
Ah yeah, both Temporal and Jiff refer to it as ambiguous. The thing that is ambiguous is the offset.
No, it is not! This is a key feature of Jiff: use jiff::{civil::date, tz::TimeZone};
fn main() -> anyhow::Result<()> {
// The ambiguous civil datetime:
let dt = date(2024, 11, 3).at(1, 30, 0, 0);
// By default, the "compatible" disambiguation strategy is used. Which
// selects the later time for a gap and the earlier time for a fold:
let zdt = date(2024, 11, 3).at(1, 30, 0, 0).intz("America/New_York")?;
assert_eq!(zdt.to_string(), "2024-11-03T01:30:00-04:00[America/New_York]");
// One can override and pick whichever strategy they want:
let tz = TimeZone::get("America/New_York")?;
let zdt = tz.to_ambiguous_zoned(dt).later()?;
// Notice the offset change from above:
assert_eq!(zdt.to_string(), "2024-11-03T01:30:00-05:00[America/New_York]");
Ok(())
} See The fact that Jiff losslessly roundtrips
It doesn't though? I'm not intimately familiar with PostgreSQL's interval type, but from a quick glance I can immediately see several differences:
So for example, on that last point, this is what PostgreSQL does:
Compare this with Jiff: use jiff::{civil::date, ToSpan};
fn main() -> anyhow::Result<()> {
let span = 1.month().days(27);
// The span has non-uniform units (months), so a
// reference date is required to do addition. Otherwise,
// you get an error:
assert!(span.checked_add(3.days()).is_err());
// So pick a date:
assert_eq!(
span.checked_add((3.days(), date(2024, 6, 1)))?,
1.month().days(30)
);
// So pick a different date and notice that the span
// returned reflects the actual duration of the
// corresponding months:
assert_eq!(
span.checked_add((3.days(), date(2024, 1, 1)))?,
2.months().days(1)
);
Ok(())
} This all comes from Temporal and RFC 5545. |
Ah, my mistake - I didn't realise it would use the timezone offset as a disambiguator like that, nice!
It keeps track of units with variable ratios separately:
Because 120 minutes is always 2 hours, so it makes sense for them to be equal?
I can't say I've taken exhaustive stock of all the PostgreSQL date/time related functions, so perhaps this doesn't exist anywhere? I would be surprised though.
I'm not quite sure I understand this? Months are added separately to days, which are added separately to microseconds, so it doesn't matter what start date you have, |
I didn't say that it didn't make sense. But the fact that Jiff can distinguish them (but also compare them as equal via
The example I showed above demonstrates the issue. PostgreSQL can't balance days up into months. It does balance months up into years, because as you say, there's always 12 months in a year. But it can't do it for days. So you end up with things like You're focused on the correctness of the result. A My understanding of the Temporal design goal here was to 1) satisfy RFC 5545 (iCalendar) and 2) put all the hard logic of dealing with durations into the datetime library such that localizing a duration becomes a much simpler task. I asked more about it here: tc39/proposal-temporal#2915 |
But I think we can agree that:
|
Yes. And we haven't even gotten to the time zone aware aspects of
|
In building out Jiff, my focus has generally not been on performance. There are some Criterion benchmarks, but they are by no means exhaustive. I would love to have more added.
While I think there's always room to optimize the implementation (and if you have ideas that require major refactoring, please file an issue), this issue is more about API changes or additions that are needed to improve performance. For example, I'm thinking about things like "datetime arithmetic requires the use of
Span
, andSpan
is a very bulky type." At time of writing, its size is 64 bytes. I presume that this means that there is some non-trivial amount of time being spent in various operations onSpan
memcpying around this enormous bag of bytes.Getting rid of
Span
or splitting it up isn't really an option as far as I'm concerned. So in order to make things go faster, I think we really just need to be able to provide APIs that operate on more primitive data types. Like, for example, ani64
of nanoseconds. This would come with various trade-offs, so I'd expect these APIs to be a bit more verbosely named.Here's a probably non-exhaustive list of API design choices that likely have an impact on performance:
Timestamp
is always a 96-bit integer.Span
is a whopping 64 bytes. ASpan
isCopy
and most APIs accept aSpan
instead of a&Span
.Span
to aTimestamp
, it's not just a simple matter of ~one integer addition. There's work needed to be done to collapse all of the clock units onSpan
down to a single integer number of nanoseconds, and theni128
math is used. Perhaps there is a way to optimize aSpan
for the "single unit" case, although doing this without indirection will make aSpan
even bigger, and doing it with indirection will remove theCopy
bound. Another alternative might be leavingSpan
as-is, but adding non-Span
APIs to operate on durations. But that has some design work needed as well. See add more integration withstd::time::Duration
? #21.Zoned
embeds aTimeZone
which means aZoned
is notCopy
and cloning/dropping aZoned
is, while cheap, not as cheap as a smallCopy
type. (This also means most things take a&Zoned
instead of aZoned
.)TimeZoneDatabase
uses a thread safe shared cache internally. This means time zone lookups by name require some kind of synchronization. (Usually this overhead can be worked around by asking for all the time zones you need up-front, but this is difficult if you're deserializing zoned datetime strings.)debug_assertions
are enabled. So if you drop the ranged integer type to do bit-packing and then re-create the ranged integer after unpacking, you will have lost the min/max values that were originally attached. Anyway, see consider ripping out ranged integer types #11 for more on this.Anyway, before we get too focused on API design, I would like to see concrete use cases first in the form of executable code and/or benchmarks.
The text was updated successfully, but these errors were encountered: