Time series filtering #456

Open
imbrem opened this issue Jun 4, 2020 · 7 comments

@imbrem

imbrem commented Jun 4, 2020

I realized that, while there's an efficient way to map timeseries, I can't find an efficient way to filter them. I think adding a filter function would be pretty helpful, and of course I'd be willing to make a PR myself.

Another feature I'd like to see would be, just like we can merge time series, to have a merge-like operation which takes in a time series and a (sorted) list of dates (or perhaps add a keyword argument to merge for this and make merge with just one item be the identity) and returns only the elements of that time series which have a date in the sorted list of dates. I'd also be willing to write a PR for this one, but would like some advice as to whether I should make a separate function or add a keyword argument to merge. In the latter case, we should also support passing in multiple sorted date vectors. Perhaps we could also simply overload merge to accept sorted date vectors as well as TimeArrays.
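To make the request concrete, here is a minimal sketch of the merge-like selection described above, written against plain sorted vectors rather than a TimeArray. The function name `select_dates` and the toy data are assumptions made for illustration; none of this is TimeSeries.jl API.

```julia
using Dates

# Hypothetical helper (not part of TimeSeries.jl): keep the rows whose
# timestamp also appears in `wanted`. Both `ts` and `wanted` are assumed
# to be sorted in ascending order.
function select_dates(ts::AbstractVector{<:TimeType},
                      vals::AbstractVector,
                      wanted::AbstractVector{<:TimeType})
    keep = Int[]
    i, j = 1, 1
    # Walk both sorted sequences in lockstep, as in a merge.
    while i <= length(ts) && j <= length(wanted)
        if ts[i] < wanted[j]
            i += 1              # timestamp not requested, skip it
        elseif ts[i] > wanted[j]
            j += 1              # requested date absent from the series, skip it
        else
            push!(keep, i)      # match: remember the row
            i += 1              # advance only i so repeated timestamps still match
        end
    end
    return ts[keep], vals[keep]
end

# Toy data, invented for this example.
ts     = [Date(2001, 5, 1), Date(2001, 6, 1), Date(2001, 10, 1)]
vals   = [1.0, 2.0, 3.0]
wanted = [Date(2001, 5, 1), Date(2001, 7, 1), Date(2001, 10, 1)]

sel_ts, sel_vals = select_dates(ts, vals, wanted)
```

The walk touches each element of either input at most once, and dates missing from the series (2001-07-01 here) are silently skipped.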

@iblislin
Collaborator

iblislin commented Jun 7, 2020

> Another feature I'd like to see would be, just like we can merge time series, to have a merge-like operation which takes in a time series and a (sorted) list of dates (or perhaps add a keyword argument to merge for this and make merge with just one item be the identity) and returns only the elements of that time series which have a date in the sorted list of dates. I'd also be willing to write a PR for this one,

I guess getindex is the one you want.

julia> cl[[Date(2001, 5, 1), Date(2001, 10, 1)]]
2×1 TimeArray{Float64,1,Date,Array{Float64,1}} 2001-05-01 to 2001-10-01
│            │ Close │
├────────────┼───────┤
│ 2001-05-01 │ 25.93 │
│ 2001-10-01 │ 15.54 │

@imbrem
Author

imbrem commented Jun 7, 2020

Well that was dumb haha! Still a pretty new user so I didn't see it. Thanks! What about the other ones?

EDIT: Also does getindex just skip missing indices (like merge) and does it assume the indices are sorted?

@iblislin
Collaborator

iblislin commented Jun 7, 2020

> What about the other ones?

If you mean filter!, let's discuss it in #436.

> Also does getindex just skip missing indices (like merge) and does it assume the indices are sorted?

It skips missing indices, and if the indices are not sorted, getindex sorts them first.

And... I just found another issue while testing it. Since we are starting to move toward accepting duplicate time indices, cl[[Date(2001, 10, 1), Date(2001, 5, 1), Date(2011, 5, 1)]] should output a duplicated time index. I'm going to file a separate issue for that.

@findmyway

julia> cl[[Date(2001, 5, 1), Date(2001, 10, 1)]]
2×1 TimeArray{Float64,1,Date,Array{Float64,1}} 2001-05-01 to 2001-10-01
│            │ Close │
├────────────┼───────┤
│ 2001-05-01 │ 25.93 │
│ 2001-10-01 │ 15.54 │

One thing I find unintuitive: the display 2001-05-01 to 2001-10-01 can be misleading when the TimeArray is large and its timestamps are not consecutive.

Another one is findall; see this:

julia> findall(>(100), ohlc[:Close])
ERROR: MethodError: no method matching keys(::TimeArray{Float64, 1, Date, Vector{Float64}})

Stacktrace:
 [1] pairs(collection::TimeArray{Float64, 1, Date, Vector{Float64}})
   @ Base ./abstractdict.jl:138
 [2] findall(testf::Base.Fix2{typeof(>), Int64}, A::TimeArray{Float64, 1, Date, Vector{Float64}})
   @ Base ./array.jl:2153
 [3] top-level scope
   @ REPL[30]:1

Not sure if this feature is needed here.

@iblislin
Collaborator

> One thing I find unintuitive: the display 2001-05-01 to 2001-10-01 can be misleading when the TimeArray is large and its timestamps are not consecutive.

Hmm, but it also shows the number of rows and columns. I don't have any idea how to improve the printing at the moment.


Well, I think most uses of findall can be handled by broadcasting.

findall(ohlc[:Close] .> 100)

Maybe I can make the method findall(f, ::TimeArray) simply forward to the broadcasting form.
Are there any cases that broadcasting cannot cover?

@findmyway

> Well, I think most uses of findall can be handled by broadcasting.
> findall(ohlc[:Close] .> 100)

Yeah, but it will create a temporary bool vector here I think.

@iblislin
Collaborator

> Yeah, but it will create a temporary bool vector here I think.

Ah, great point. I can make a PR for findall(f, ::TimeArray).
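For readers following along, the two call styles under discussion can be compared on a toy type. This is only a sketch: `ToySeries` is a hypothetical stand-in and not the TimeSeries.jl `TimeArray`, and the forwarding method below is an assumption about what such a PR might look like, not the actual implementation.

```julia
# Toy stand-in for a single-column time series; NOT the TimeSeries.jl type.
struct ToySeries{T}
    timestamp::Vector{Int}
    values::Vector{T}
end

# Hypothetical forwarding method: hand the predicate to Base's findall
# on the stored values, so the caller never builds a mask explicitly.
Base.findall(f::Function, s::ToySeries) = findall(f, s.values)

# Invented data for the example.
s = ToySeries([1, 2, 3, 4], [99.0, 101.5, 100.0, 120.0])

# Broadcasting style: the comparison materializes a BitVector first.
mask = s.values .> 100
idx_broadcast = findall(mask)

# Predicate style: same indices, no mask written by the caller.
idx_predicate = findall(>(100), s)
```

Both calls return the same integer indices. Whether the predicate form really avoids the intermediate mask depends on how Base implements `findall(f, A)` for the underlying array type in a given Julia version, so any saving is worth measuring rather than assuming.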
