Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] non-conformance of metrics.lcss with input interface expectations (3D numpy) #483

Open
fkiraly opened this issue Oct 6, 2023 · 2 comments

Comments

@fkiraly
Copy link

fkiraly commented Oct 6, 2023

It seems that metrics.lcss is non-conformant with generic interface expectations towards time series distances when passing 3D np.ndarrays.

It would seem that lcss fails on some, or all 3D np.ndarrays. This behaviour was observed when running an sktime conformant adapter through the sktime test suite, here:
sktime/sktime#5367 (comment)

Update: the issue seems to be that, unlike all other metrics, lcss does not accept 3D numpy as input.

A secondary issue is the lack of easily findable documentation on the input format of the "time series" in the various metrics. Unfortunately, metrics with the same docstring can have different input conventions (supports 3D or not).

@YannCabanes
Copy link
Contributor

YannCabanes commented Oct 16, 2023

Hello @fkiraly,
I have discussed this issue with @rtavenar: it is not a bug and it corresponds to the documentation (which we will make clearer).

The functions taking 3D arrays as inputs are the functions: cdist_*
For example in the docstring of the function cdist_dtw, there is the input parameter:

dataset1 : array-like
    A dataset of time series

Functions of the form cdist_* can also accept 1D or 2D arrays when they use:
dataset1 = to_time_series_dataset(dataset1)

The other metric functions usually take 2D arrays as inputs.
For example, in the docstring of the function lcss, there is the input parameter:

s1
    A time series.

By default time series should be 2D arrays, some metric functions also accept 1D arrays when they use:
s1 = to_time_series(s1)

To make it cleared, I will specify in the docstrings of the functions the possible shapes of the input arrays in the PR #486.

For example:

s1 : array-like, shape=(sz1, d) or (sz1,)
    A time series. If shape is (sz1,), the time series is assumed to be univariate.

Or:

dataset1 : array-like, shape=(n_ts, sz, d) or (n_ts, sz) or (sz,)
    A dataset of time series. If shape is (n_ts, sz), the dataset is composed of univariate time series. 
    If shape is (sz,), the dataset is a composed of a unique univariate time series.

@fkiraly
Copy link
Author

fkiraly commented Oct 23, 2023

I see, thanks for clarifying. The lack of specificity in the docstrings did cause some headache and trial-and-error on our side...

I would suggest in the docstrings to state clearly, in all 3 cases (1D, 2D, 3D), which index corresponds to instance index, time index, variable index.

Feel free to close this issue then.

PS: if I were to design the interface, I would add a default upcast to 3D to every distance function, and have only one single interface point. For deprecation period, you can add 3D to the 2D functions and a warning.
Why: simpler for the user to learn just one generic interface, simpler to maintain and test (just loop over the entire list of distances)

@YannCabanes YannCabanes removed the bug label Mar 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants