[BUG] Feature selection transforms fail to make inverse_transform after make_future #1138

Open
1 task done
Mr-Geekman opened this issue Feb 28, 2023 · 0 comments
Labels
bug Something isn't working priority/medium Medium priority task

Comments

@Mr-Geekman
Contributor

Mr-Geekman commented Feb 28, 2023

🐛 Bug Report

Feature selection transforms with the option return_features=True work incorrectly during inverse_transform.

During transform, the removed features are stored inside the transform and are added back during inverse_transform. The problem occurs when inverse_transform is called after make_future: the dataset returned by make_future is smaller than the one seen during transform, so concatenating it with the data saved during transform corrupts the result.
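The mechanism can be illustrated with plain pandas. This is only a sketch of the failure mode described above, not etna's actual implementation: the variable names are made up, and the row counts (75 history rows, 10 future rows) are chosen to mirror the reproduction script in this report. Concatenating along columns aligns on the union of both indexes, so the stale saved data drags its old timestamps into the result.

```python
import numpy as np
import pandas as pd

# History: 75 daily rows, the same length as the trimmed dataset in the reproduction script.
history_index = pd.date_range("2020-01-06", periods=75, freq="D")
history = pd.DataFrame(
    {"target": np.arange(75, dtype=float), "year": history_index.year.to_numpy()},
    index=history_index,
)

# Stand-in for transform with return_features=True:
# drop the column from the data, but keep a copy of it as state.
removed = history[["year"]].copy()
transformed = history.drop(columns=["year"])

# Stand-in for make_future: 10 new rows right after the history.
future_index = pd.date_range(history_index[-1] + pd.Timedelta(days=1), periods=10, freq="D")
future = pd.DataFrame({"target": np.nan}, index=future_index)

# Stand-in for a naive inverse_transform: glue the saved column back on.
# pd.concat(axis=1) performs an outer join on the index by default,
# so the result covers the history timestamps as well as the future ones.
restored = pd.concat([future, removed], axis=1)

print(len(future), len(restored))
```

In this sketch the restored frame has one row per timestamp in the union of the two indexes, which is why it is longer than the future frame it was built from.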

I think we shouldn't store any state during fit. As I understand it, this problem can be solved if we only hide these columns in TSDataset instead of removing them, and unhide them during inverse_transform. This makes the task quite difficult.
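A rough sketch of the hide/unhide idea, using a hypothetical wrapper around a pandas DataFrame. `HidingFrame` and its methods are invented for illustration and are not part of TSDataset. The point is that hidden columns stay attached to the data, so any later slicing carries them along and nothing stale has to be concatenated back.

```python
import pandas as pd


class HidingFrame:
    """Hypothetical wrapper: columns are hidden from view instead of being dropped."""

    def __init__(self, df, hidden=None):
        self._df = df
        self._hidden = set(hidden or ())

    def hide(self, columns):
        # Mark columns as hidden; the underlying data is left untouched.
        return HidingFrame(self._df, self._hidden | set(columns))

    def unhide(self):
        # Make every column visible again.
        return HidingFrame(self._df)

    def tail(self, n):
        # Slicing keeps hidden columns aligned with the visible ones.
        return HidingFrame(self._df.tail(n), self._hidden)

    @property
    def visible(self):
        # The view that downstream code would see.
        return self._df.drop(columns=sorted(self._hidden))


index = pd.date_range("2020-01-01", periods=20, freq="D")
frame = HidingFrame(
    pd.DataFrame({"target": range(20), "year": index.year.to_numpy()}, index=index)
)

filtered = frame.hide(["year"])  # analogue of transform
window = filtered.tail(5)        # the dataset gets smaller afterwards
restored = window.unhide()       # analogue of inverse_transform

print(list(filtered.visible.columns), restored.visible.shape)
```

Because the transform keeps no copy of the data in this design, the restored frame has exactly the rows of the smaller window.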

Expected behavior

Feature selection transforms work correctly during inverse_transform after make_future.

How To Reproduce

Script with bug:

import pandas as pd
import numpy as np

from etna.transforms import FilterFeaturesTransform
from etna.datasets import TSDataset


def get_regular_ts() -> TSDataset:
    periods = 100
    df_1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df_1["segment"] = "segment_1"
    df_1["target"] = np.random.uniform(10, 20, size=periods)

    df_2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df_2["segment"] = "segment_2"
    df_2["target"] = np.random.uniform(-15, 5, size=periods)

    df_3 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df_3["segment"] = "segment_3"
    df_3["target"] = np.random.uniform(-5, 5, size=periods)

    df = pd.concat([df_1, df_2, df_3]).reset_index(drop=True)
    df = TSDataset.to_dataset(df)
    tsds = TSDataset(df, freq="D")

    return tsds


def get_ts_with_exog() -> TSDataset:
    regular_ts = get_regular_ts()
    df = regular_ts.to_pandas(flatten=True)
    df_exog = df.copy().drop(columns=["target"])
    df_exog["weekday"] = df_exog["timestamp"].dt.weekday
    df_exog["monthday"] = df_exog["timestamp"].dt.day
    df_exog["month"] = df_exog["timestamp"].dt.month
    df_exog["year"] = df_exog["timestamp"].dt.year
    ts = TSDataset(df=TSDataset.to_dataset(df).iloc[5:-20], df_exog=TSDataset.to_dataset(df_exog), freq="D")
    return ts


def main():
    ts = get_ts_with_exog()
    transform = FilterFeaturesTransform(exclude=["year"], return_features=True)

    ts.fit_transform([transform])
    future_ts = ts.make_future(future_steps=10, tail_steps=0)

    future_ts.inverse_transform()
    future_df = future_ts.to_pandas()

    assert len(future_df) == 10


if __name__ == "__main__":
    main()

This script fails because in reality len(future_df) == 85, which equals the 75 timestamps kept in the original dataset (100 - 5 - 20) plus the 10 future timestamps.

Environment

No response

Additional context

No response

Checklist

  • Bug appears at the latest library version
@Mr-Geekman Mr-Geekman added the bug Something isn't working label Feb 28, 2023
@github-project-automation github-project-automation bot moved this to Specification in etna board Feb 28, 2023
@Mr-Geekman Mr-Geekman moved this from Specification to Todo in etna board Feb 28, 2023
@Mr-Geekman Mr-Geekman added the priority/medium Medium priority task label Jun 7, 2023
Projects
Status: Todo
Development

No branches or pull requests

1 participant