-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve speed of creating NanoDates from strings #8
Conversation
By replacing uses of `Meta.parse` with `Base.parse`, improve the speed of constructing `NanoDate`s with `String`s by several times. Before this commit: ```julia julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789") BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 41.872 μs … 4.764 ms ┊ GC (min … max): 0.00% … 97.44% Time (median): 43.145 μs ┊ GC (median): 0.00% Time (mean ± σ): 44.887 μs ± 52.951 μs ┊ GC (mean ± σ): 1.03% ± 0.97% ▃██▃ ▂▃▆████▇▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃ 41.9 μs Histogram: frequency by time 55.8 μs < Memory estimate: 5.69 KiB, allocs estimate: 81. ``` After this commit: ```julia julia> @benchmark NanoDate("2000-01-01T00:00:00.123456789") BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 9.516 μs … 5.185 ms ┊ GC (min … max): 0.00% … 98.95% Time (median): 10.324 μs ┊ GC (median): 0.00% Time (mean ± σ): 11.229 μs ± 51.838 μs ┊ GC (mean ± σ): 4.57% ± 0.99% ▆█▆▃ ▁▄██████▆▄▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂ 9.52 μs Histogram: frequency by time 18.5 μs < Memory estimate: 5.42 KiB, allocs estimate: 71. ```
Pull Request Test Coverage Report for Build 2742520393
💛 - Coveralls |
Yes and Thank you. Nowadays speedier parsing for simpler types is provided with Parsers.jl, -- which is already incorporated in Project.toml and using_ed in NanoDates.jl; the functions are not yet called within our srcfiles. from their README, the section on Date/DateTime parsing
I will ask for JuliaData's guidance in extending their parsers for types exported by Dates.jl and paste the link to that thread here. |
For our purposes, there are a few issues with the approach[es] Dates.jl takes -- I never saw any problem until working to clear the last of some parsing test cases [given by others from their actual needs]. After running into strong algorithmic stop signs I had to decide (a) copy a number of large sections from the Dates.jl source files and reprogram away the corner cases or (b) toss it in favor of a different approach. On the other side of the mirror, where dates-with-times are pushed out as the same, now quite slowly, parsed strings, this happens and is made faster with attention, re-simplification and benchmarking. NanoDates sees "yyyy-mm-ddTHH:MM:SS.ssssssZ" like this |
Okay, thanks for explaining and good to hear there is a Parsers.jl-backed implementation planned which I guess would be somewhat quicker. I think I had misunderstood the approach based on my incorrect reading of #1. |
mmm - I am merging this (a) it is ready now (b) you did the work |
Okay, glad to have this short-term speedup for now. I just looked at Parsers.jl. The julia> using Dates, Parsers, BenchmarkTools
julia> @benchmark parse(DateTime, "2000-01-01T00:00:00.001")
BenchmarkTools.Trial: 10000 samples with 973 evaluations.
Range (min … max): 73.181 ns … 153.527 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 77.379 ns ┊ GC (median): 0.00%
Time (mean ± σ): 80.134 ns ± 7.609 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▅▆▆▆█▆▇▅▅▄▃▃▁▂▁▁▁▁▁ ▂▂▂▂▃▁▂▃▁▁▂ ▂
██████████████████████████████████▇█▇▇█▇█▇█▇█████▇▇▃▃▄▅▅▅▅▅▆ █
73.2 ns Histogram: log(frequency) by time 111 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark Parsers.parse(DateTime, "2000-01-01T00:00:00.001")
BenchmarkTools.Trial: 10000 samples with 896 evaluations.
Range (min … max): 124.990 ns … 252.307 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 128.374 ns ┊ GC (median): 0.00%
Time (mean ± σ): 132.139 ns ± 9.320 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃█▇▇▆▁▄▄▂▁ ▁▁ ▆▃ ▁ ▂ ▁ ▂
▂██████████▅████▅▄██▆▇██▇████▇▇████▇▆▆▇▆▇▇▆▅▆▆▆▇▇▆▅▄▇▇▇▆▇▆▆▆▆ █
125 ns Histogram: log(frequency) by time 169 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark parse(DateTime, "2000-01-01T00:00:00.001", dateformat"yyyy-mm-ddTHH:MM:SS.sss")
BenchmarkTools.Trial: 10000 samples with 905 evaluations.
Range (min … max): 120.779 ns … 244.760 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 127.523 ns ┊ GC (median): 0.00%
Time (mean ± σ): 132.245 ns ± 12.301 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇█ ▄▅ ▇█▂▁▃▅ ▃▁▁▂ ▃▂▂▃ ▄▅▂▃ ▁ ▃▅ ▁▁ ▂
██▆██████████▇████▅████████████▇█████▇██████████▇▇█▇▇▆▆▇▆▇▇▇█ █
121 ns Histogram: log(frequency) by time 172 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark Parsers.parse(DateTime, "2000-01-01T00:00:00.001", Parsers.Options(dateformat=dateformat"yyyy-mm-ddTHH:MM:SS.sss"))
BenchmarkTools.Trial: 10000 samples with 206 evaluations.
Range (min … max): 355.388 ns … 15.810 μs ┊ GC (min … max): 0.00% … 97.41%
Time (median): 362.529 ns ┊ GC (median): 0.00%
Time (mean ± σ): 409.231 ns ± 779.923 ns ┊ GC (mean ± σ): 10.08% ± 5.15%
▁▅▇██▆▅▄▃▃▃▂▁▁▂▂▃▂▁▁▂▁▁ ▂
█████████████████████████▇▆▆▆▄▆▃▅▃▁▅▄▄▅▅▅▅▆▇▇▇██▇▇▇▇▆▆▆▅▅▆▄▅▅ █
355 ns Histogram: log(frequency) by time 461 ns <
Memory estimate: 624 bytes, allocs estimate: 16. Thought I would post that since I had run the benchmarks but you are probably well aware already. |
Pull Request Test Coverage Report for Build 2742520393Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
The existing implementation uses
Meta.parse
to parse stringsas
Int
s, but this is type-unstable.By replacing uses of
Meta.parse
withBase.parse
, we canimprove the speed of constructing
NanoDate
s withString
sseveral times over, reducing dynamic dispatch and allocations.
Before this commit:
After this commit:
The current speed of parsing strings to
NanoDate
s is still vastlyslower than constructing
Dates.DateTime
s from strings (which takesnanoseconds and does not allocate), but hopefully for now this small
change is welcome. Longer-term, it would be great to get string
parsing as fast as
DateTime
s as this could be the bottleneck insome applications.