
Trying to finetune on a dataset #174

Open
marcopeix opened this issue Jan 19, 2025 · 3 comments
Labels: bug (Something isn't working)

@marcopeix

  • Working in Google Colab
  • Using the latest version and cloned the repo inside Colab

I managed to prepare the data. I manually split my dataset into train/validation sets and saved them with Hugging Face `datasets` in the `data` directory at the root of the repo, like this:

from typing import Any
from collections.abc import Generator

import datasets
import pandas as pd
from datasets import Features, Sequence, Value

url = "https://raw.githubusercontent.com/marcopeix/TimeSeriesForecastingUsingFoundationModels/refs/heads/main/data/walmart_sales_small.csv"
df = pd.read_csv(url, index_col='Date', parse_dates=True)
df = df[df['Store'] == 1]

# Hold out the last 32 steps; the last 16 of the remainder are for validation.
finetune_df = df[:-32]
train_df, val_df = finetune_df[:-16], finetune_df[-16:]

def multivar_gen_func(df) -> Generator[dict[str, Any], None, None]:
    features_df = df.drop(['Store', 'Weekly_Sales'], axis=1)
    yield {
        "target": df['Weekly_Sales'].to_numpy(),
        "features": features_df.to_numpy().T,  # shape: (num_features, time)
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }

features = Features(
    dict(
        target=Sequence(Value("float32")),
        features=Sequence(
            Sequence(Value("float32")),
            length=len(df.columns) - 2,  # all columns except Store and Weekly_Sales
        ),
        start=Value("timestamp[s]"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_train_ds = datasets.Dataset.from_generator(
    lambda: multivar_gen_func(train_df),
    features=features,
)
hf_val_ds = datasets.Dataset.from_generator(
    # Likely intended: the full fine-tuning series, since the validation
    # builder's offset=95 indexes into it (the original passed train_df,
    # which is only 95 steps long).
    lambda: multivar_gen_func(finetune_df),
    features=features,
)
hf_train_ds.save_to_disk("/content/uni2ts/data/store_ds")
hf_val_ds.save_to_disk("/content/uni2ts/data/store_ds_eval")
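As a quick sanity check on the split (a sketch on synthetic weekly data, assuming 143 weekly rows for the single store, as in the Walmart dataset):

```python
import pandas as pd

# Stand-in for the single-store Walmart series (assumed 143 weekly rows).
idx = pd.date_range("2010-02-05", periods=143, freq="W-FRI")
df = pd.DataFrame({"Weekly_Sales": range(143)}, index=idx)

finetune_df = df[:-32]                               # hold out the last 32 weeks
train_df, val_df = finetune_df[:-16], finetune_df[-16:]

print(len(finetune_df), len(train_df), len(val_df))  # 111 95 16
```

Note that `len(train_df) == 95` lines up with the `offset: 95` in the validation config below.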

Then, I created YAML configuration files for training and validation. They are at /content/uni2ts/cli/conf/finetune/data/store_finetune.yaml and /content/uni2ts/cli/conf/finetune/val_data/store_finetune.yaml, respectively.

The training YAML file is:

_target_: uni2ts.data.builder.simple.SimpleDatasetBuilder
dataset: store_ds
weight: 1000
storage_path: /content/uni2ts/data

and the validation YAML file is:

_target_: uni2ts.data.builder.ConcatDatasetBuilder
_args_:
  _target_: uni2ts.data.builder.simple.generate_eval_builders
  dataset: store_ds_eval
  offset: 95
  eval_length: 8
  prediction_lengths: [8]
  context_lengths: [16]
  patch_sizes: [16]

Then, I ran the training command as specified:

python -m cli.train \
  -cp conf/finetune \
  run_name=store_sales_finetune \
  model=moirai_1.0_R_small \
  data=store_finetune \
  val_data=store_finetune

And it throws this error:

Error executing job with overrides: ['run_name=store_sales_finetune', 'model=moirai_1.0_R_small', 'data=store_finetune', 'val_data=store_finetune']
Error in call to target 'uni2ts.data.builder.simple.SimpleDatasetBuilder':
TypeError('expected str, bytes or os.PathLike object, not NoneType')
full_key: data

My questions:

  1. What am I missing to fix this error?
  2. How can I specify the maximum number of training steps?
  3. What is the offset parameter in the validation YAML file?

Thanks in advance for your help!

@marcopeix marcopeix added the bug Something isn't working label Jan 19, 2025
@chenghaoliu89
Contributor

Hi @marcopeix

  1. Have you added your data path to the .env file? See the README's fine-tuning guidelines: echo "CUSTOM_DATA_PATH=PATH_TO_SAVE" >> .env
  2. You can add the argument to your command, or modify it directly in the conf/finetune/default.yaml file by setting trainer.max_epochs.
  3. The date_offset (datetime string) or offset (integer) option determines the last time step of the fine-tuning train set. The validation set will be saved as DATASET_NAME_eval.
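To illustrate point 3 (a simplified sketch of the integer offset semantics as described above, not uni2ts internals): offset marks the last time step of the fine-tuning train set, and the steps after it are what evaluation windows are drawn from.

```python
# Simplified illustration of the integer `offset` semantics (not uni2ts code).
series = list(range(111))  # e.g. a 111-step fine-tuning series

offset = 95                 # last time step of the fine-tuning train set
train = series[:offset]     # steps 0..94 -> training
holdout = series[offset:]   # steps 95..110 -> evaluation windows

print(len(train), len(holdout))  # 95 16
```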

@marcopeix
Author

marcopeix commented Jan 22, 2025

Yes, I specified the CUSTOM_DATA_PATH in an .env file at the root of the repository. When I do !cat .env, I get CUSTOM_DATA_PATH="/content/uni2ts/data", but I still get the error:

/usr/local/lib/python3.11/dist-packages/uni2ts/common/env.py:43: UserWarning: Failed to load .env file.
  warnings.warn("Failed to load .env file.")
Error executing job with overrides: ['run_name=store_sales_finetune', 'model=moirai_1.0_R_small', 'data=store_finetune', 'val_data=store_finetune']
Error in call to target 'uni2ts.data.builder.simple.generate_eval_builders':
TypeError('expected str, bytes or os.PathLike object, not NoneType')
full_key: val_data._args_
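
The `UserWarning: Failed to load .env file` above suggests the file isn't found from wherever the command runs. As a debugging aid (a stdlib sketch with a hypothetical path; it ignores the quoting and `export` syntax a real dotenv parser handles), you can parse the file yourself and confirm the variable is there:

```python
import os

def read_dotenv(path: str) -> dict[str, str]:
    """Naively parse KEY=VALUE lines from a .env file (sketch only)."""
    env: dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip().strip('"')
    return env

print(os.getcwd())  # is this the repo root where .env lives?
path = "/content/uni2ts/.env"  # hypothetical Colab location
if os.path.exists(path):
    print(read_dotenv(path))
else:
    print("no .env found at", path)
```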

@chenghaoliu89
Contributor

> Yes, I specified the CUSTOM_DATA_PATH in an .env file at the root of the repository. When I do !cat .env, I get CUSTOM_DATA_PATH="/content/uni2ts/data", but I still get the error:
>
> /usr/local/lib/python3.11/dist-packages/uni2ts/common/env.py:43: UserWarning: Failed to load .env file.
>   warnings.warn("Failed to load .env file.")
> Error executing job with overrides: ['run_name=store_sales_finetune', 'model=moirai_1.0_R_small', 'data=store_finetune', 'val_data=store_finetune']
> Error in call to target 'uni2ts.data.builder.simple.generate_eval_builders':
> TypeError('expected str, bytes or os.PathLike object, not NoneType')
> full_key: val_data._args_

Could you print self.storage_path and self.dataset in SimpleDatasetBuilder.load_dataset when you run the fine-tuning script?

@chenghaoliu89 chenghaoliu89 self-assigned this Jan 26, 2025