Dataset `update_interactions` method #240

blondered · 2025-01-14T08:01:29Z

Feature Description

Draft functionality (to be discussed):

The update_interactions method of Dataset accepts:

interactions_df: pd.DataFrame
method: tp.Literal["add", "replace"]

The main goal is to take the old dataset's id maps and extend them with the new users and items, then convert the new interactions to internal ids and append them to the old interactions.
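A minimal sketch of this flow in plain Python/NumPy. It assumes each id map is an ordered list of external ids whose positions are the internal ids, and that interactions are stored as an (n, 3) array of (user, item, weight) rows. The helper names extend_id_map and append_interactions are hypothetical (not RecTools API), and the sketch deliberately ignores the hot/warm ordering constraint discussed in this issue:

```python
import numpy as np


def extend_id_map(external_ids, new_values):
    """Append unseen external ids; ids already present keep their internal ids.

    Hypothetical helper: an id map is modeled as a list where position == internal id.
    """
    mapping = {ext: internal for internal, ext in enumerate(external_ids)}
    extended = list(external_ids)
    for value in new_values:
        if value not in mapping:
            mapping[value] = len(extended)
            extended.append(value)
    return extended, mapping


def append_interactions(old_interactions, user_ids, item_ids, new_users, new_items, new_weights):
    """Extend both id maps, convert new interactions to internal ids, append them."""
    user_ids, user_map = extend_id_map(user_ids, new_users)
    item_ids, item_map = extend_id_map(item_ids, new_items)
    new_interactions = np.column_stack(
        [
            [user_map[u] for u in new_users],
            [item_map[i] for i in new_items],
            new_weights,
        ]
    ).astype(float)
    return user_ids, item_ids, np.concatenate([old_interactions, new_interactions])


# Usage: two old users and one old item; "u3" and "i2" are new
old = np.array([[0, 0, 1.0], [1, 0, 1.0]])
users, items, interactions = append_interactions(
    old, ["u1", "u2"], ["i1"], ["u2", "u3"], ["i2", "i1"], [1.0, 5.0]
)
```

Existing users and items keep their internal ids, so the old interaction rows can be appended to without any remapping.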

One very important thing to remember: the dataset id_map always places hot users (users with interactions) before warm users (users without interactions but with features). New hot users should therefore get ids starting right after the last id of the old hot users.
But these ids previously belonged to warm users, so the warm users' ids will change. Since row numbers in user_features correspond to internal user ids, the user_features array must be reordered accordingly.
The same applies to items.
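The reordering can be sketched as follows. Assumptions: the old id map is an array of external ids ordered by internal id (hot users first), user_features has one row per old internal id, and a user with no previous feature row gets an all-zero row. Both function names are hypothetical:

```python
import numpy as np


def rebuild_user_id_map(old_ext_ids, n_hot, new_interaction_users):
    """Return (external ids ordered by new internal id, new number of hot users).

    Old hot users keep ids 0..n_hot-1, newly hot users follow them,
    and users that are still warm are moved to the end.
    """
    old_hot = list(old_ext_ids[:n_hot])
    old_warm = list(old_ext_ids[n_hot:])
    hot_set = set(old_hot)
    new_hot = []
    for uid in new_interaction_users:
        # Previously warm or brand-new users become hot, in order of first appearance
        if uid not in hot_set:
            hot_set.add(uid)
            new_hot.append(uid)
    still_warm = [uid for uid in old_warm if uid not in hot_set]
    new_ext_ids = np.array(old_hot + new_hot + still_warm, dtype=object)
    return new_ext_ids, len(old_hot) + len(new_hot)


def reorder_features(old_ext_ids, old_features, new_ext_ids):
    """Permute feature rows so that row i matches new internal id i.

    Users without an old feature row get a zero row (an assumption of this sketch).
    """
    old_row = {uid: i for i, uid in enumerate(old_ext_ids)}
    new_features = np.zeros((len(new_ext_ids), old_features.shape[1]), dtype=old_features.dtype)
    for new_id, uid in enumerate(new_ext_ids):
        if uid in old_row:
            new_features[new_id] = old_features[old_row[uid]]
    return new_features


# Usage: u1, u2 are hot; w1, w2 are warm
old_ids = np.array(["u1", "u2", "w1", "w2"], dtype=object)
old_features = np.array([[1.0], [2.0], [3.0], [4.0]])
new_ids, n_hot = rebuild_user_id_map(old_ids, 2, ["u2", "w2", "u9"])
new_features = reorder_features(old_ids, old_features, new_ids)
# new_ids holds u1, u2, w2, u9, w1: w2 becomes hot, u9 is new, w1 moves from id 2 to id 4
```

Because old hot users keep ids 0..n_hot-1, the old interactions need no remapping; only the warm users and the feature rows move.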

In the first PR we can implement just one of the two methods.
The "add" method should simply append the new interactions to the old ones (duplicate user-item pairs will have multiple entries, so their weights will be summed in the user-item matrix).
The "replace" method should remove the old interactions and keep only the new ones.
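A sketch of the two modes and of the duplicate-summing behavior, assuming interactions are already converted to internal ids as (user, item, weight) rows. The function names are hypothetical, and a dense NumPy matrix built with np.add.at stands in for the sparse user-item matrix (scipy.sparse behaves the same way: duplicate coordinates in a COO matrix are summed when it is converted to CSR):

```python
import typing as tp

import numpy as np


def merge_interactions(
    old: np.ndarray, new: np.ndarray, method: tp.Literal["add", "replace"]
) -> np.ndarray:
    """Combine (user, item, weight) rows according to the proposed update method."""
    if method == "add":
        # Keep everything; repeated (user, item) pairs stay as separate rows
        return np.concatenate([old, new], axis=0)
    if method == "replace":
        # Drop the old interactions entirely
        return new.copy()
    raise ValueError(f"Unknown method: {method!r}")


def to_user_item_matrix(interactions: np.ndarray, n_users: int, n_items: int) -> np.ndarray:
    """Build a dense user-item matrix; weights of duplicate pairs are summed."""
    matrix = np.zeros((n_users, n_items))
    users = interactions[:, 0].astype(int)
    items = interactions[:, 1].astype(int)
    # np.add.at accumulates repeated indices instead of overwriting them
    np.add.at(matrix, (users, items), interactions[:, 2])
    return matrix


old = np.array([[0, 0, 1.0], [1, 1, 2.0]])
new = np.array([[0, 0, 3.0], [2, 1, 1.0]])
added = to_user_item_matrix(merge_interactions(old, new, "add"), 3, 2)
replaced = to_user_item_matrix(merge_interactions(old, new, "replace"), 3, 2)
# added[0, 0] == 4.0 (1.0 + 3.0); replaced[0, 0] == 3.0 and replaced[1, 1] == 0.0
```

This sketch only covers the interaction rows; how "replace" should treat users and items that no longer have interactions is left open here.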

Why this feature?

This enables incremental training of models that support fit_partial.

Additional context

Discussed here: #176

Updating user and item features should be done in follow-up PRs.

Maybe we should call this method update_data or something similar, so that a single method with optional arguments handles both interactions and features.

To be discussed


blondered · 2025-01-14T08:01:51Z

@feldlime @chezou wdyt?

blondered added the enhancement label Jan 14, 2025

blondered added this to RecTools board Jan 14, 2025

This was referenced Jan 14, 2025:

fit_partial function for ImplicitALSWrapperModel and LightFMWrapperModel #176 (Closed)

ImplicitALSWrapperModel support for incremental training in fit_partial #242 (Open)

LightFMWrapperModel support for incremental training in fit_partial #243 (Open)
