One of the nice things about C++'s Chrono library is that, AIUI, it provides a way to specify your own representation for various data types. So for example, if you want to represent a date as a single signed 16-bit integer as the number of days since the Unix epoch, then that's something that is natively supported. See #3 also about this.
I'm not sure that Rust's type system is powerful enough to do what C++'s Chrono does nicely, and I am in general pretty averse to highly generic code anyway. But I do wonder if there are some important use cases that we end up missing because of this.
One example that I've been able to find is in Polars:

https://github.com/pola-rs/polars/blob/106e23936bd8496210c08df76f6bf7b679bac86c/crates/polars-arrow/src/datatypes/mod.rs#L60-L86

Specifically, these primitive data types:

```rust
/// A [`i64`] representing a timestamp measured in [`TimeUnit`] with an optional timezone.
///
/// Time is measured as a Unix epoch, counting the seconds from
/// 00:00:00.000 on 1 January 1970, excluding leap seconds,
/// as a 64-bit signed integer.
///
/// The time zone is a string indicating the name of a time zone, one of:
///
/// * As used in the Olson time zone database (the "tz database" or
///   "tzdata"), such as "America/New_York"
/// * An absolute time zone offset of the form +XX:XX or -XX:XX, such as +07:30
///
/// When the timezone is not specified, the timestamp is considered to have no timezone
/// and is represented _as is_
Timestamp(TimeUnit, Option<PlSmallStr>),
/// An [`i32`] representing the elapsed time since UNIX epoch (1970-01-01)
/// in days.
Date32,
/// An [`i64`] representing the elapsed time since UNIX epoch (1970-01-01)
/// in milliseconds. Values are evenly divisible by 86400000.
Date64,
/// A 32-bit time representing the elapsed time since midnight in the unit of `TimeUnit`.
/// Only [`TimeUnit::Second`] and [`TimeUnit::Millisecond`] are supported on this variant.
Time32(TimeUnit),
/// A 64-bit time representing the elapsed time since midnight in the unit of `TimeUnit`.
/// Only [`TimeUnit::Microsecond`] and [`TimeUnit::Nanosecond`] are supported on this variant.
Time64(TimeUnit),
```
I looked into this because I've seen more than one person complain about Chrono (that is, the Rust `chrono` crate) being "slow" in the context of Polars. For example. I spent a little bit of time looking into what Polars is doing, and honestly, I didn't make much headway. Polars is pretty complex and it's doing a lot of stuff with datetimes. So it's pretty hard for me to distill it down and figure out whether there is a better way to do what they're doing.
However, I did notice the above code. That is, Polars isn't actually using `chrono`'s native types. They're using their own. It seems to represent dates via "Unix epoch days." One wonders whether this is the source of `chrono`'s perceived slowness. That is, if Polars is always converting Unix epoch days to `chrono`'s datetime type in order to do things like "get the weekday of this day," then yeah, I would expect that to be massively slow because `chrono` (like Jiff) doesn't use Unix epoch days as an internal representation. There is a fair bit of math involved in converting between year/month/days (what Jiff uses) and Unix epoch days. And indeed, some calculations are faster when using one representation versus the other. Finding the weekday, for example, is only a tiny bit of simple math if you have Unix epoch days. But if you have year/month/days, then the typical thing to do is convert to Unix epoch days (expensive) and then find the weekday from there (cheap).
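To make the cost difference concrete, here is a minimal sketch of both directions. It is not what `chrono` or Jiff actually do internally. The conversion is the well-known "days from civil" algorithm described by Howard Hinnant, and the function names are made up for illustration. The weekday numbering (0 = Sunday) is also just an assumption of this sketch.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
/// `month` is 1..=12 and `day` is 1..=31. No validation is done here.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day lands at the end of the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year Gregorian cycle
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = if month > 2 { month - 3 } else { month + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of the shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Weekday from Unix epoch days: 0 = Sunday, ..., 6 = Saturday.
/// This is the "cheap" case: one add and one remainder.
fn weekday_from_epoch_days(days: i64) -> i64 {
    // 1970-01-01 was a Thursday (4).
    (days + 4).rem_euclid(7)
}

/// Weekday from year/month/day: pay for the conversion first,
/// then do the cheap step.
fn weekday_from_civil(year: i64, month: i64, day: i64) -> i64 {
    weekday_from_epoch_days(days_from_civil(year, month, day))
}
```

So a caller that already holds epoch days only needs `weekday_from_epoch_days`, while a caller holding year/month/day has to go through all of `days_from_civil` first.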
So the representation you want to use, in part, depends on what you want to do with the data. All three of `jiff`, `chrono` and `time` use representations (with some differences) closer to the year/month/day style than the single Unix epoch day integer style.
I wonder, though, if it makes sense to add more types targeting use cases like Polars. So for example, maybe a new `jiff::civil::UnixEpochDate` type or something. And a `jiff::civil::NanosecondTime` type that represents the clock time, within a single day, by a single integer nanosecond. Although, even that wouldn't be flexible enough for Polars, which wants to use 32-bit integers for smaller precision needs.
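As a rough sketch of what such types could look like (these do not exist in Jiff today, and every method name, integer width and the weekday numbering below is an assumption made purely for illustration):

```rust
/// Hypothetical: a civil date stored as days since 1970-01-01.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct UnixEpochDate(i32);

impl UnixEpochDate {
    fn from_epoch_days(days: i32) -> Self {
        UnixEpochDate(days)
    }

    /// 0 = Sunday, ..., 6 = Saturday. 1970-01-01 was a Thursday (4).
    fn weekday(self) -> u8 {
        (i64::from(self.0) + 4).rem_euclid(7) as u8
    }

    /// The next day, or `None` if the integer would overflow.
    fn tomorrow(self) -> Option<Self> {
        self.0.checked_add(1).map(UnixEpochDate)
    }
}

const NANOS_PER_SEC: i64 = 1_000_000_000;
const NANOS_PER_DAY: i64 = 86_400 * NANOS_PER_SEC;

/// Hypothetical: a clock time stored as nanoseconds since midnight.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct NanosecondTime(i64);

impl NanosecondTime {
    /// Returns `None` unless `nanos` falls within a single day.
    fn new(nanos: i64) -> Option<Self> {
        if (0..NANOS_PER_DAY).contains(&nanos) {
            Some(NanosecondTime(nanos))
        } else {
            None
        }
    }

    fn hour(self) -> i64 {
        self.0 / (3_600 * NANOS_PER_SEC)
    }

    fn minute(self) -> i64 {
        self.0 / (60 * NANOS_PER_SEC) % 60
    }

    fn second(self) -> i64 {
        self.0 / NANOS_PER_SEC % 60
    }

    fn subsec_nanosecond(self) -> i64 {
        self.0 % NANOS_PER_SEC
    }
}
```

Note that the field accessors on `NanosecondTime` each need a division, which is the mirror image of the weekday trade-off: the single-integer form makes some operations cheap and others more expensive.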
The downside here is that this adds some new data types for pretty specialized use cases, and folks would still need to explicitly convert to them. And they can't really be used for higher level data types. e.g., You wouldn't be able to use a `Zoned` with a `civil::UnixEpochDate` instead of a `civil::Date`. Doing that would require generics, and I really don't want to go down that road with `Zoned`. (The complexities of generics were a big reason why I nixed leap second support for Jiff.)
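For a sense of what "going down that road" might mean, here is a deliberately tiny sketch. None of this is Jiff's API: the `CivilDateRepr` trait, the `GenericZoned` type and its fields are all hypothetical stand-ins, and a real zoned datetime carries far more than a date and an offset.

```rust
/// Hypothetical trait abstracting over how a civil date is stored.
trait CivilDateRepr: Copy {
    fn to_epoch_days(self) -> i64;
}

/// Single-integer representation.
#[derive(Clone, Copy)]
struct EpochDays(i32);

/// Field-based representation.
#[derive(Clone, Copy)]
struct YearMonthDay {
    year: i16,
    month: i8,
    day: i8,
}

impl CivilDateRepr for EpochDays {
    fn to_epoch_days(self) -> i64 {
        i64::from(self.0)
    }
}

impl CivilDateRepr for YearMonthDay {
    fn to_epoch_days(self) -> i64 {
        // Same "days from civil" arithmetic as described by Howard Hinnant.
        let (y, m, d) = (i64::from(self.year), i64::from(self.month), i64::from(self.day));
        let y = if m <= 2 { y - 1 } else { y };
        let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
        let mp = if m > 2 { m - 3 } else { m + 9 };
        let doy = (153 * mp + 2) / 5 + d - 1;
        era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
    }
}

/// Hypothetical stand-in for a zoned datetime that is generic over
/// its date representation.
struct GenericZoned<D: CivilDateRepr> {
    date: D,
    offset_seconds: i32,
}

impl<D: CivilDateRepr> GenericZoned<D> {
    /// 0 = Sunday, ..., 6 = Saturday.
    fn weekday(&self) -> i64 {
        (self.date.to_epoch_days() + 4).rem_euclid(7)
    }
}
```

Even in this toy version, the type parameter `D` shows up in every signature that mentions `GenericZoned`, and every method has to be written against the trait rather than a concrete representation.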
Another possibility, like what I did with `regex`, is to split lower level but more complex APIs out into separately versioned crates. So for example, maybe we have a `jiff-civil` crate that defines the civil datetime types in a generic way (assuming that's even possible), but then `jiff` hides all of that. Honestly, I'm not sure.
Anyway, I wanted to write this down because this is something I'm thinking about, and I'm curious whether use cases like Polars can be more effectively served than what they're doing now. My instinct is that there exists a world in which Polars could have a fair bit of its datetime handling lifted out and done by a datetime library, but I think the competing pressure here is performance requirements. If anyone knows folks from the Polars project who can say more about their datetime handling and what kinds of things they want from a datetime library, I'd love to hear from them. (And that answer might be, "we're happy doing things ourselves so that we can own that code.")