[BUG] MRMRFeatureSelectionTransform fails on fit with redundancy_aggregation_mode="median" #1249

Open
1 task done
Mr-Geekman opened this issue Apr 27, 2023 · 0 comments
Labels
bug Something isn't working priority/high High priority task

Comments

@Mr-Geekman
Contributor

Mr-Geekman commented Apr 27, 2023

🐛 Bug Report

MRMRFeatureSelectionTransform fails on fit with redundancy_aggregation_mode="median". I'm not sure the problem is specific to the "median" value; other aggregation modes may be affected as well.

Expected behavior

The transform fits without raising an error.

How To Reproduce

import pandas as pd
from numpy.random import RandomState
from sklearn.ensemble import RandomForestRegressor

from etna.analysis import ModelRelevanceTable
from etna.datasets import TSDataset, generate_ar_df
from etna.transforms import MRMRFeatureSelectionTransform


def get_ts():
    num_segments = 3
    df = generate_ar_df(
        start_time="2020-01-01", periods=300, ar_coef=[1], sigma=1, n_segments=num_segments, random_seed=0, freq="D"
    )

    example_segment = df["segment"].unique()[0]
    timestamp = df[df["segment"] == example_segment]["timestamp"]
    df_exog = pd.DataFrame({"timestamp": timestamp})

    # useless regressors
    num_useless = 12
    df_regressors_useless = generate_ar_df(
        start_time="2020-01-01", periods=300, ar_coef=[1], sigma=1, n_segments=num_useless, random_seed=1, freq="D"
    )
    for i, segment in enumerate(df_regressors_useless["segment"].unique()):
        regressor = df_regressors_useless[df_regressors_useless["segment"] == segment]["target"].values
        df_exog[f"regressor_useless_{i}"] = regressor

    # useful regressors: the same as target but with little noise
    df_regressors_useful = df.copy()
    sampler = RandomState(seed=2).normal
    for i, segment in enumerate(df_regressors_useful["segment"].unique()):
        regressor = df_regressors_useful[df_regressors_useful["segment"] == segment]["target"].values
        noise = sampler(scale=0.05, size=regressor.shape)
        df_exog[f"regressor_useful_{i}"] = regressor + noise

    # construct exog
    classic_exog_list = []
    for segment in df["segment"].unique():
        tmp = df_exog.copy(deep=True)
        tmp["segment"] = segment
        classic_exog_list.append(tmp)
    df_exog_all_segments = pd.concat(classic_exog_list)

    # construct TSDataset
    df = df[df["timestamp"] <= timestamp[200]]
    return TSDataset(
        df=TSDataset.to_dataset(df),
        df_exog=TSDataset.to_dataset(df_exog_all_segments),
        freq="D",
        known_future="all",
    )


def main():
    ts = get_ts()
    transform = MRMRFeatureSelectionTransform(
        relevance_table=ModelRelevanceTable(), top_k=3, model=RandomForestRegressor(random_state=42),
        redundancy_aggregation_mode="median"
    )
    transform.fit(ts)


if __name__ == "__main__":
    main()
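
Since it is unclear whether the failure is specific to "median", one way to narrow it down is to run the same fit once per aggregation mode and record which modes raise. The helper below is only a minimal sketch: `probe_modes` and `fit_with_mode` are hypothetical names that are not part of etna, and the mode names in the commented usage ("mean", "max", "min", "median") are an assumption about the values the transform accepts.

```python
from typing import Callable, Dict, Iterable, Optional


def probe_modes(
    fit_with_mode: Callable[[str], object], modes: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Call fit_with_mode once per mode.

    Returns a dict mapping each mode to None if the call succeeded,
    or to a short "ExceptionType: message" string if it raised.
    """
    results: Dict[str, Optional[str]] = {}
    for mode in modes:
        try:
            fit_with_mode(mode)
        except Exception as exc:  # deliberately broad: we only want to record the failure
            results[mode] = f"{type(exc).__name__}: {exc}"
        else:
            results[mode] = None
    return results


# Hypothetical usage together with the reproduction script above (requires etna):
#
# def fit_with_mode(mode):
#     transform = MRMRFeatureSelectionTransform(
#         relevance_table=ModelRelevanceTable(), top_k=3,
#         model=RandomForestRegressor(random_state=42),
#         redundancy_aggregation_mode=mode,
#     )
#     transform.fit(get_ts())
#
# # The list of mode names is an assumption, not taken from the report.
# print(probe_modes(fit_with_mode, ["mean", "max", "min", "median"]))
```

If only "median" maps to an error message, the bug is tied to that mode; if every mode fails, the cause is more likely in the dataset construction or in the relevance table.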

Environment

No response

Additional context

After the fix, we should add relevance_aggregation_mode and redundancy_aggregation_mode to MRMRFeatureSelectionTransform.params_to_tune in the automl-2.0 branch.

Checklist

  • Bug appears in the latest library version
Projects
Status: Todo
Development

No branches or pull requests

1 participant