samples with different length #104
Comments
Hi, sorry for the late reply. Variable-length data sets are unfortunately not supported at the moment. Regarding WEASEL+MUSE, you can achieve this with the following process:
The main downside of this approach is the high memory (RAM) usage, because the feature selection is performed at the last step. A possible solution (that would lead to the same results) would be to use a for loop for the
Here is an example (without the aforementioned optimization; I can modify the example to show you if needed):
import numpy as np
import matplotlib.pyplot as plt
from pyts.datasets import load_basic_motions
from pyts.multivariate.transformation import WEASELMUSE
import pandas as pd
from sklearn.feature_selection import chi2
#######################
####### D A T A #######
#######################
# Toy dataset
X_train, X_test, y_train, y_test = load_basic_motions(return_X_y=True)
# X_train.shape = X_test.shape = (40, 6, 100)
# Sample 4 distinct random lengths in the interval [80, 100]
rng = np.random.RandomState(42)
lengths = 80 + rng.choice(21, size=4, replace=False)
# Assign 10 time series to each length
lengths_samples_train_idx = rng.permutation(40).reshape((4, 10))
lengths_samples_test_idx = rng.permutation(40).reshape((4, 10))
#######################
# P A R A M E T E R S #
#######################
# WEASEL+MUSE parameters
weasel_muse_params = {'word_size': 2, 'n_bins': 2, 'window_sizes': [12, 36],
                      'chi2_threshold': 1e-80}
transformer_list = [WEASELMUSE(**weasel_muse_params) for _ in range(4)]
#######################
### T R A I N I N G ###
#######################
X_weasel_train = []
for samples_idx, length, transformer in zip(lengths_samples_train_idx, lengths, transformer_list):
    X_weasel_train.append(transformer.fit_transform(X_train[samples_idx, :, :length], y_train[samples_idx]))
# Concatenate the array as a DataFrame and fill NA values with 0
df_weasel_train = pd.concat([
    pd.DataFrame.sparse.from_spmatrix(
        X, index=samples_idx, columns=np.vectorize(transformer.vocabulary_.get)(np.arange(X.shape[1]))
    )
    for X, samples_idx, transformer in zip(X_weasel_train, lengths_samples_train_idx, transformer_list)
]).fillna(0.)
# Perform feature selection using chi2 test
chi2_threshold = 2.
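# Rows of df_weasel_train follow the permuted sample order, so reorder y_train to match
chi2_statistics, _ = chi2(df_weasel_train, y_train[df_weasel_train.index.to_numpy()])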
features_idx_to_keep = np.where(chi2_statistics > chi2_threshold)[0]
features_to_keep = df_weasel_train.columns[features_idx_to_keep]
df_weasel_train = df_weasel_train[features_to_keep]
#######################
## I N F E R E N C E ##
#######################
X_weasel_test = []
for samples_idx, length, transformer in zip(lengths_samples_test_idx, lengths, transformer_list):
    X_weasel_test.append(transformer.transform(X_test[samples_idx, :, :length]))
# Concatenate the array as a DataFrame and fill NA values with 0
df_weasel_test = pd.concat([
    pd.DataFrame.sparse.from_spmatrix(
        X, index=samples_idx, columns=np.vectorize(transformer.vocabulary_.get)(np.arange(X.shape[1]))
    )
    for X, samples_idx, transformer in zip(X_weasel_test, lengths_samples_test_idx, transformer_list)
]).fillna(0.)[features_to_keep]
Let me know if this helps you.
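The concatenation step in the example above relies on pandas matching columns by label. The following is a minimal sketch of that behaviour on made-up data; the count matrices, the vocabularies and the labels are all hypothetical and dense arrays are used instead of sparse ones to keep it short. It also shows that the rows keep the per-group sample order, so any label array has to be reordered before it is compared with the features.

```python
import numpy as np
import pandas as pd

# Hypothetical bag-of-words counts from two transformers, one per length group.
X_a = np.array([[1, 0, 2], [0, 3, 0]])   # rows are samples 3 and 0
X_b = np.array([[4, 5]])                 # row is sample 1
vocab_a = {0: 'aa', 1: 'ab', 2: 'ba'}    # feature index -> word (made up)
vocab_b = {0: 'ab', 1: 'bb'}

frames = [
    pd.DataFrame(X, index=idx, columns=[vocab[j] for j in range(X.shape[1])])
    for X, idx, vocab in [(X_a, [3, 0], vocab_a), (X_b, [1], vocab_b)]
]

# Columns are matched by word; words a transformer never saw become NaN, then 0.
df = pd.concat(frames).fillna(0.)

# Rows keep the group order (3, 0, 1), so labels are reordered to match.
y = np.array(['walk', 'run', 'walk', 'run'])  # hypothetical labels for samples 0..3
y_aligned = y[df.index.to_numpy()]

print(df)
print(y_aligned)
```

The word 'ab' appears in both vocabularies, so its counts end up in a single shared column, while 'bb' is filled with zeros for the samples of the first group.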
Oh wow, thanks for the extensive example.
If I understand the WEASEL+MUSE algorithm correctly, it should be possible to use it with samples of different lengths.
This is currently not possible with the API of the WEASELMUSE class, which expects a 3D array of shape (n_samples, n_features, n_timestamps); a NumPy array has the same shape for all samples.
I tried padding the time series of all samples with NaN values up to the length of the longest sample, but the input checks reject NaN values.
Is there a way to use samples of different lengths?
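Since a single NumPy array cannot hold samples of different lengths, one way to prepare such data is to split it into rectangular groups, one per distinct length. The sketch below assumes the samples are stored as a Python list of 2D arrays of shape (n_features, n_timestamps_i); `group_by_length` is a hypothetical helper written for illustration and is not part of pyts.

```python
from collections import defaultdict

import numpy as np


def group_by_length(samples):
    """Group ragged samples into rectangular arrays, one per distinct length.

    `samples` is a list of 2D arrays of shape (n_features, n_timestamps_i).
    Returns a dict mapping each length to a tuple (indices, X), where X has
    shape (n_group_samples, n_features, length).
    """
    buckets = defaultdict(list)
    for idx, sample in enumerate(samples):
        buckets[sample.shape[1]].append(idx)
    return {
        length: (np.array(indices), np.stack([samples[i] for i in indices]))
        for length, indices in buckets.items()
    }


# Toy data: six samples with 6 features and lengths 80, 90 or 100.
rng = np.random.RandomState(0)
samples = [rng.randn(6, n) for n in (80, 100, 80, 90, 100, 80)]

groups = group_by_length(samples)
for length, (indices, X) in sorted(groups.items()):
    print(length, indices, X.shape)
```

Each group is a regular 3D array, so it could be passed to its own transformer, and the stored indices make it possible to put the resulting features back in the original sample order.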