readtimearray on duplicate timestamps behaviour #451

Open
klangner opened this issue Apr 19, 2020 · 9 comments

Comments

@klangner

Currently, when trying to read data from a CSV file with duplicate timestamps, the function crashes.

Maybe it would be better to add a parameter to this function so that it reads as many rows as possible and returns a partial result instead of crashing?
Or maybe just skip duplicate or out-of-order items?

BTW, does Julia have some kind of optional type, like Haskell's Maybe? Then the function could at least return that instead of crashing the program.

@iblislin
Collaborator

iblislin commented May 5, 2020

Hi @klangner

Maybe it would be better to add a parameter to this function so that it reads as many rows as possible and returns a partial result instead of crashing?
Or maybe just skip duplicate or out-of-order items?

Well, in this case, I think you can load the CSV into a DataFrame first, remove the duplicated rows, and then build it with TimeArray(df, timestamp = :MyTimeColumn).
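Something like this, for example (a sketch; the file name and column name are just placeholders):

using CSV, DataFrames, TimeSeries

# Load the CSV into a DataFrame; "prices.csv" and :MyTimeColumn are placeholders.
df = DataFrame(CSV.File("prices.csv"))

# Drop rows with a duplicated timestamp (keeps the first occurrence) and sort by time.
unique!(df, :MyTimeColumn)
sort!(df, :MyTimeColumn)

# The constructor should now accept the cleaned time index.
ta = TimeArray(df, timestamp = :MyTimeColumn)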

BTW, does Julia have some kind of optional type, like Haskell's Maybe? Then the function could at least return that instead of crashing the program.

I guess it's Missing?
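For instance (a rough illustration, not code from this package):

# Julia's "optional" is usually spelled as a Union with Missing.
x = missing                        # the single value of type Missing
v = Union{Missing, Float64}[1.0, missing, 3.0]

ismissing(x)                       # true
coalesce(x, 0.0)                   # 0.0, falls back to a default when missing
sum(skipmissing(v))                # 4.0, ignores the missing entries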

@imbrem

imbrem commented Jun 3, 2020

I currently implemented this with a very dirty hack, namely passing in open(`uniq FILE_NAME`), but I would appreciate a flag to just ignore out-of-order entries.
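Roughly what I mean (a sketch; the file name and format string are made up, and note that uniq only removes adjacent duplicate lines):

using TimeSeries

# open on a Cmd runs `uniq` and hands readtimearray an IO stream with
# adjacent duplicate lines already removed. "data.csv" is a placeholder.
ta = readtimearray(open(`uniq data.csv`); format = "yyyy-mm-dd HH:MM:SS")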

@iblislin
Collaborator

iblislin commented Jun 4, 2020

Hi @imbrem
Could you show an example case that contains duplicated timestamps?
I'm also wondering:

  1. If there is a time index in ascending order, 2011/1/1, 2011/1/2, 2011/1/2, 2011/1/2, 2011/1/3, with three duplicated timestamps, which ones do you expect to be skipped?
  2. If there is a time index in descending order, which ones do you expect to be skipped?

About out-of-order cases: I'm also curious whether there is an algorithm that can determine which entries are out of order.

@klangner
Author

klangner commented Jun 4, 2020

Hi @iblis17,
I would say that you can find duplicate timestamps when dealing with daylight saving time.
Quite often in the data you will see one hour missing and, half a year later, one hour of duplicated data.
It can also happen when the data is not added in increasing time order, e.g. you get the data from multiple sensors but in batch mode, so you end up with batches that can have overlapping timestamps.
IMHO, when you work with real data, anything can happen :-)

@iblislin
Collaborator

iblislin commented Jun 4, 2020

I would say that you can find duplicate timestamps when dealing with daylight saving time.
Quite often in the data you will see one hour missing and, half a year later, one hour of duplicated data.

Oh, so in this case the data is still in the proper order; only the time index is not ideal.
I think applying lag, lead, or other time series methods to it is still reasonable.
I will consider relaxing the constraint on the time index, maybe allowing duplicates.

It can also happen when the data is not added in increasing time order, e.g. you get the data from multiple sensors but in batch mode, so you end up with batches that can have overlapping timestamps.

But for this case, I do not think the methods provided by TimeSeries.jl can be applied to the data.
It makes no sense if the user wants to lag, lead, take a moving window, etc. on it.
So what functionality could we improve or provide to help with this kind of data?

@iblislin
Collaborator

iblislin commented Jun 4, 2020

Ah, and I just recalled that we have an unchecked option, so you can get an out-of-order or duplicated time index to work.

TimeArray(ts, vector; unchecked = true)
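For example (made-up data, just to show the keyword):

using Dates, TimeSeries

# A duplicated timestamp in the index; the values are made up.
ts   = [Date(2011, 1, 1), Date(2011, 1, 2), Date(2011, 1, 2), Date(2011, 1, 3)]
vals = [1.0, 2.0, 2.5, 3.0]

# unchecked = true skips the time-index validation, so this does not throw.
ta = TimeArray(ts, vals; unchecked = true)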

@iblislin
Collaborator

iblislin commented Jun 4, 2020

Anyway, I made a PR for accepting a duplicated but sorted time index.

#455

@imbrem

imbrem commented Jun 4, 2020

That works fine, but could it also be possible to add an option to actually remove out-of-order or duplicate timestamps, and/or actually go back and update their values in the result array? If desired, I can write the PR for this.

@iblislin
Collaborator

iblislin commented Jun 5, 2020

could it also be possible to add an option to actually remove out-of-order or duplicate timestamps

@imbrem yeah, PRs are welcome.

and/or actually go back and update their values in the result array?

The updating part still needs more discussion, and I need some time to think about it.
