Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of creating NanoDates from strings #8

Merged
merged 1 commit into from
Jul 27, 2022

Conversation

anowacki
Copy link
Collaborator

The existing implementation uses Meta.parse to parse strings
as Ints, but this is type-unstable.

By replacing uses of Meta.parse with Base.parse, we can
improve the speed of constructing NanoDates with Strings
several times over, reducing dynamic dispatch and allocations.

Before this commit:

julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  41.872 μs   4.764 ms  ┊ GC (min  max): 0.00%  97.44%
 Time  (median):     43.145 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   44.887 μs ± 52.951 μs  ┊ GC (mean ± σ):  1.03% ±  0.97%

     ▃██▃
  ▂▃▆████▇▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  41.9 μs         Histogram: frequency by time        55.8 μs <

 Memory estimate: 5.69 KiB, allocs estimate: 81.

After this commit:

julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):   9.516 μs   5.185 ms  ┊ GC (min  max): 0.00%  98.95%
 Time  (median):     10.324 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.229 μs ± 51.838 μs  ┊ GC (mean ± σ):  4.57% ±  0.99%

     ▆█▆▃
  ▁▄██████▆▄▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  9.52 μs         Histogram: frequency by time        18.5 μs <

 Memory estimate: 5.42 KiB, allocs estimate: 71.

The current speed of parsing strings to NanoDates is still vastly
slower than constructing Dates.DateTimes from strings (which takes
nanoseconds and does not allocate), but hopefully for now this small
change is welcome. Longer-term, it would be great to get string
parsing as fast as DateTimes as this could be the bottleneck in
some applications.

By replacing uses of `Meta.parse` with `Base.parse`, improve the
speed of constructing `NanoDate`s with `String`s by several times.

Before this commit:

```julia
julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  41.872 μs …  4.764 ms  ┊ GC (min … max): 0.00% … 97.44%
 Time  (median):     43.145 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   44.887 μs ± 52.951 μs  ┊ GC (mean ± σ):  1.03% ±  0.97%

     ▃██▃
  ▂▃▆████▇▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  41.9 μs         Histogram: frequency by time        55.8 μs <

 Memory estimate: 5.69 KiB, allocs estimate: 81.
```

After this commit:
```julia
julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.516 μs …  5.185 ms  ┊ GC (min … max): 0.00% … 98.95%
 Time  (median):     10.324 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.229 μs ± 51.838 μs  ┊ GC (mean ± σ):  4.57% ±  0.99%

     ▆█▆▃
  ▁▄██████▆▄▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  9.52 μs         Histogram: frequency by time        18.5 μs <

 Memory estimate: 5.42 KiB, allocs estimate: 71.
```
@coveralls
Copy link

coveralls commented Jul 27, 2022

Pull Request Test Coverage Report for Build 2742520393

  • 4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 40.922%

Totals Coverage Status
Change from base Build 2741567885: 0.0%
Covered Lines: 444
Relevant Lines: 1085

💛 - Coveralls

@JeffreySarnoff
Copy link
Member

Yes and Thank you. Nowadays speedier parsing for simpler types is provided with Parsers.jl, -- which is already incorporated in Project.toml and using_ed in NanoDates.jl; the functions are not yet called within our srcfiles.
I intend for their Integer and their Date parsers to be as a subfloor supporting NanoDates' parsing wherever it makes sense.

from their README, the section on Date/DateTime parsing

using Dates
a = Parsers.parse(Date, "2018-01-01")

# custom dateformat
b = Parsers.parse(Date, "01/20/2018",
           Parsers.Options(dateformat="mm/dd/yyyy"))

I will ask for JuliaData's guidance in extending their parsers for types exported by Dates.jl and paste the link to that thread here.

@JeffreySarnoff
Copy link
Member

For our purposes, there are a few issues with the approach[es] Dates.jl takes -- I never saw any problem until working to clear the last of some parsing test cases [given by others from their actual needs]. After running into strong algorithmic stop signs I had to decide (a) copy a number of large sections from the Dates.jl source files and reprogram away the corner cases or (b) toss it in favor of a different approach.
(a) was disheartening and hidden piracy (b) is where "make it work, then make if fast." applies.

On the other side of the mirror, where dates-with-times are pushed out as the same, now quite slowly, parsed strings, this happens and is made faster with attention, re-simplification and benchmarking.
The general dateformat strings that are intended to provide a soft and consistent place for temporal measures to land and reside,, transparently are designed from the DatePeriod and TimePeriod types and the substructural algebra that applies to some constituents. ---

NanoDates sees "yyyy-mm-ddTHH:MM:SS.ssssssZ" like this
" - - T : : . Z" with adjacent spaces as typed digit hammocks
This is very much more flexible and can be processed with periods handled in parallel.

@anowacki
Copy link
Collaborator Author

Okay, thanks for explaining and good to hear there is a Parsers.jl-backed implementation planned which I guess would be somewhat quicker. I think I had misunderstood the approach based on my incorrect reading of #1.

@JeffreySarnoff
Copy link
Member

mmm - I am merging this (a) it is ready now (b) you did the work
we will move this now and that then! Thank you.

@JeffreySarnoff JeffreySarnoff merged commit 0085f94 into JuliaTime:main Jul 27, 2022
@anowacki
Copy link
Collaborator Author

Okay, glad to have this short-term speedup for now.

I just looked at Parsers.jl. The DateTime parsing is only about twice as slow as Dates's own when using the standard format, and only three times worse with a custom DateFormat, though it does allocate:

julia> using Dates, Parsers, BenchmarkTools

julia> @benchmark parse(DateTime, "2000-01-01T00:00:00.001")
BenchmarkTools.Trial: 10000 samples with 973 evaluations.
 Range (min  max):  73.181 ns  153.527 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     77.379 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.134 ns ±   7.609 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▅▆▆▆█▆▇▅▅▄▃▃▁▂▁▁▁▁▁ ▂▂▂▂▃▁▂▃▁▁▂                             ▂
  ██████████████████████████████████▇█▇▇█▇█▇█▇█████▇▇▃▃▄▅▅▅▅▅▆ █
  73.2 ns       Histogram: log(frequency) by time       111 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Parsers.parse(DateTime, "2000-01-01T00:00:00.001")
BenchmarkTools.Trial: 10000 samples with 896 evaluations.
 Range (min  max):  124.990 ns  252.307 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     128.374 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   132.139 ns ±   9.320 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃█▇▇▆▁▄▄▂▁ ▁▁        ▆▃ ▁ ▂      ▁                           ▂
  ▂██████████▅████▅▄██▆▇██▇████▇▇████▇▆▆▇▆▇▇▆▅▆▆▆▇▇▆▅▄▇▇▇▆▇▆▆▆▆ █
  125 ns        Histogram: log(frequency) by time        169 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark parse(DateTime, "2000-01-01T00:00:00.001", dateformat"yyyy-mm-ddTHH:MM:SS.sss")
BenchmarkTools.Trial: 10000 samples with 905 evaluations.
 Range (min  max):  120.779 ns  244.760 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     127.523 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   132.245 ns ±  12.301 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█  ▄▅ ▇█▂▁▃▅ ▃▁▁▂ ▃▂▂▃ ▄▅▂▃ ▁  ▃▅         ▁▁                 ▂
  ██▆██████████▇████▅████████████▇█████▇██████████▇▇█▇▇▆▆▇▆▇▇▇█ █
  121 ns        Histogram: log(frequency) by time        172 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Parsers.parse(DateTime, "2000-01-01T00:00:00.001", Parsers.Options(dateformat=dateformat"yyyy-mm-ddTHH:MM:SS.sss"))
BenchmarkTools.Trial: 10000 samples with 206 evaluations.
 Range (min  max):  355.388 ns   15.810 μs  ┊ GC (min  max):  0.00%  97.41%
 Time  (median):     362.529 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   409.231 ns ± 779.923 ns  ┊ GC (mean ± σ):  10.08% ±  5.15%

  ▁▅▇██▆▅▄▃▃▃▂▁▁▂▂▃▂▁▁▂▁▁                                       ▂
  █████████████████████████▇▆▆▆▄▆▃▅▃▁▅▄▄▅▅▅▅▆▇▇▇██▇▇▇▇▆▆▆▅▅▆▄▅▅ █
  355 ns        Histogram: log(frequency) by time        461 ns <

 Memory estimate: 624 bytes, allocs estimate: 16.

Thought I would post that since I had run the benchmarks but you are probably well aware already.

@anowacki anowacki deleted the an/remove-meta-parse branch July 28, 2022 09:06
@coveralls
Copy link

coveralls commented Nov 6, 2024

Pull Request Test Coverage Report for Build 2742520393

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.5%) to 41.411%

Totals Coverage Status
Change from base Build 2741567885: 0.5%
Covered Lines: 446
Relevant Lines: 1077

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants