From cd41f95515d297f1b958e36a6e569fedcc82a573 Mon Sep 17 00:00:00 2001 From: Miguel Fierro <3491412+miguelgfierro@users.noreply.github.com> Date: Wed, 1 May 2024 04:28:51 +0200 Subject: [PATCH 01/36] Staging to main: Fix issue with SciPy and prepare for release 1.2.0 (#2094) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Update SETUP.md Remove left-over reference to conda script that has been deleted. * remove cast to array in similarity functions Signed-off-by: miguelgfierro * remove scipy limitation Signed-off-by: miguelgfierro * flattening matrix so dataframe can be built correctly Signed-off-by: Scott Graham <5720537+gramhagen@users.noreply.github.com> * Adding toarray() suggested by @gramhagen Signed-off-by: miguelgfierro * Added r-precision Signed-off-by: David Davó * Added r-precision tests Signed-off-by: David Davó * trying to substitude geta1 with ravel Signed-off-by: miguelgfierro * Changed metric "result" to "result.toarray()" Signed-off-by: Simon Zhao * Revert recommenders/models/sar/sar_singlenode.py Signed-off-by: Simon Zhao * Use cooccurrence.toarray() Signed-off-by: Simon Zhao * return result if isinstance(result, np.ndarray) else result.toarray() Signed-off-by: Simon Zhao * Return numpy array instead of numpy matrix Signed-off-by: Simon Zhao * Prepare for Release Recommenders 1.2.0 (#2092) * New release Recommenders 1.2.0 :boom::boom: * updated news Signed-off-by: miguelgfierro --------- Signed-off-by: miguelgfierro Co-authored-by: miguelgfierro * Add reference to scenarios to README.md There is no link to the scenarios page from the front page, so it is hard to find. Added reference. 
--------- Signed-off-by: miguelgfierro Signed-off-by: Scott Graham <5720537+gramhagen@users.noreply.github.com> Signed-off-by: David Davó Signed-off-by: Simon Zhao Co-authored-by: Andreas Argyriou Co-authored-by: miguelgfierro Co-authored-by: Scott Graham <5720537+gramhagen@users.noreply.github.com> Co-authored-by: David Davó Co-authored-by: Simon Zhao --- NEWS.md | 9 +++ README.md | 10 ++-- SETUP.md | 2 - recommenders/__init__.py | 2 +- recommenders/evaluation/python_evaluation.py | 58 +++++++++++++++++++ recommenders/utils/python_utils.py | 12 ++-- setup.py | 2 +- .../evaluation/test_python_evaluation.py | 15 +++++ 8 files changed, 95 insertions(+), 15 deletions(-) diff --git a/NEWS.md b/NEWS.md index d417976c9..9d8b1aeb6 100644 --- a/NEWS.md +++ b/NEWS.md @@ -5,12 +5,21 @@ Licensed under the MIT License. # What's New +## Update May 2, 2024 + +We have a new release [Recommenders 1.2.0](https://github.com/microsoft/recommenders/releases/tag/1.2.0)! + +So many changes since our last release. We have full tests on Python 3.8 to 3.11 (around 1800 tests), upgraded performance in many algorithms, reviewed notebooks, and many more improvements. + + ## Update October 10, 2023 We are pleased to announce that this repository (formerly known as Microsoft Recommenders, https://github.com/microsoft/recommenders), has joined the [Linux Foundation of AI and Data](https://lfaidata.foundation/) (LF AI & Data)! The new organization, `recommenders-team`, reflects this change. We hope this move makes it easy for anyone to contribute! Our objective continues to be building an ecosystem and a community to sustain open source innovations and collaborations in recommendation systems. +Now to access the repo, instead of going to https://github.com/microsoft/recommenders, you need to go to https://github.com/recommenders-team/recommenders. The old URL will still resolve to the new one, but we recommend that you update your bookmarks. 
+ ## Update August 18, 2023 We moved to a new organization! Now to access the repo, instead of going to https://github.com/microsoft/recommenders, you need to go to https://github.com/recommenders-team/recommenders. The old URL will still resolve to the new one, but we recommend you to update your bookmarks. diff --git a/README.md b/README.md index 89ef90ecf..35f526d1a 100644 --- a/README.md +++ b/README.md @@ -9,13 +9,11 @@ Licensed under the MIT License. -## What's New (October, 2023) +## What's New (May, 2024) -We are pleased to announce that this repository (formerly known as Microsoft Recommenders, https://github.com/microsoft/recommenders), has joined the [Linux Foundation of AI and Data](https://lfaidata.foundation/) (LF AI & Data)! The new organization, `recommenders-team`, reflects this change. +We have a new release [Recommenders 1.2.0](https://github.com/microsoft/recommenders/releases/tag/1.2.0)! -We hope this move makes it easy for anyone to contribute! Our objective continues to be building an ecosystem and a community to sustain open source innovations and collaborations in recommendation systems. - -Now to access the repo, instead of going to https://github.com/microsoft/recommenders, you need to go to https://github.com/recommenders-team/recommenders. The old URL will still resolve to the new one, but we recommend that you update your bookmarks. +So many changes since our last release. We have full tests on Python 3.8 to 3.11 (around 1800 tests), upgraded performance in many algorithms, reviewed notebooks, and many more improvements. ## Introduction @@ -35,6 +33,8 @@ Several utilities are provided in [recommenders](recommenders) to support common For a more detailed overview of the repository, please see the documents on the [wiki page](https://github.com/microsoft/recommenders/wiki/Documents-and-Presentations). +For some of the practical scenarios where recommendation systems have been applied, see [scenarios](scenarios). 
+ ## Getting Started We recommend [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment) for environment management, and [VS Code](https://code.visualstudio.com/) for development. To install the recommenders package and run an example notebook on Linux/WSL: diff --git a/SETUP.md b/SETUP.md index f06995e6e..814118a49 100644 --- a/SETUP.md +++ b/SETUP.md @@ -150,8 +150,6 @@ Currently, tests are done on **Python CPU** (the base environment), **Python GPU Another way is to build a docker image and use the functions inside a [docker container](#setup-guide-for-docker). -Another alternative is to run all the recommender utilities directly from a local copy of the source code. This requires installing all the necessary dependencies from Anaconda and PyPI. For instructions on how to do this, see [this guide](conda.md). - ## Setup for Making a Release The process of making a new release and publishing it to [PyPI](https://pypi.org/project/recommenders/) is as follows: diff --git a/recommenders/__init__.py b/recommenders/__init__.py index e28bf197f..87998b029 100644 --- a/recommenders/__init__.py +++ b/recommenders/__init__.py @@ -2,7 +2,7 @@ # Licensed under the MIT License. __title__ = "Recommenders" -__version__ = "1.1.1" +__version__ = "1.2.0" __author__ = "Recommenders contributors" __license__ = "MIT" __copyright__ = "Copyright 2018-present Recommenders contributors." 
diff --git a/recommenders/evaluation/python_evaluation.py b/recommenders/evaluation/python_evaluation.py index e9adf621a..dff164ab4 100644 --- a/recommenders/evaluation/python_evaluation.py +++ b/recommenders/evaluation/python_evaluation.py @@ -541,6 +541,63 @@ def recall_at_k( return (df_hit_count["hit"] / df_hit_count["actual"]).sum() / n_users +def r_precision_at_k( + rating_true, + rating_pred, + col_user=DEFAULT_USER_COL, + col_item=DEFAULT_ITEM_COL, + col_prediction=DEFAULT_PREDICTION_COL, + relevancy_method="top_k", + k=DEFAULT_K, + threshold=DEFAULT_THRESHOLD, + **_, +): + """R-precision at K. + + R-precision can be defined as the precision@R for each user, where R is the + number of relevant items for the query. It's also equivalent to the recall at + the R-th position. + + Note: + As R can be high, in this case, the k indicates the maximum possible R. + If every user has more than k true items, then r-precision@k is equal to + precision@k. You might need to raise the k value to get meaningful results. + + Args: + rating_true (pandas.DataFrame): True DataFrame + rating_pred (pandas.DataFrame): Predicted DataFrame + col_user (str): column name for user + col_item (str): column name for item + col_prediction (str): column name for prediction + relevancy_method (str): method for determining relevancy ['top_k', 'by_threshold', None]. None means that the + top k items are directly provided, so there is no need to compute the relevancy operation. + k (int): number of top k items per user + threshold (float): threshold of top items per user (optional) + + Returns: + float: r-precision at k (min=0, max=1). The maximum value is 1 even when fewer than + k items exist for a user in rating_true. 
+ """ + df_hit, df_hit_count, n_users = merge_ranking_true_pred( + rating_true=rating_true, + rating_pred=rating_pred, + col_user=col_user, + col_item=col_item, + col_prediction=col_prediction, + relevancy_method=relevancy_method, + k=k, + threshold=threshold, + ) + + if df_hit.shape[0] == 0: + return 0.0 + + df_merged = df_hit.merge(df_hit_count[[col_user, 'actual']]) + df_merged = df_merged[df_merged['rank'] <= df_merged['actual']] + + return (df_merged.groupby(col_user).size() / df_hit_count.set_index(col_user)['actual']).mean() + + def ndcg_at_k( rating_true, rating_pred, @@ -824,6 +881,7 @@ def get_top_k_items( exp_var.__name__: exp_var, precision_at_k.__name__: precision_at_k, recall_at_k.__name__: recall_at_k, + r_precision_at_k.__name__: r_precision_at_k, ndcg_at_k.__name__: ndcg_at_k, map_at_k.__name__: map_at_k, map.__name__: map, diff --git a/recommenders/utils/python_utils.py b/recommenders/utils/python_utils.py index 6efdedfed..36fb3f815 100644 --- a/recommenders/utils/python_utils.py +++ b/recommenders/utils/python_utils.py @@ -62,7 +62,7 @@ def jaccard(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = cooccurrence / (diag_rows + diag_cols - cooccurrence) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def lift(cooccurrence): @@ -85,7 +85,7 @@ def lift(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = cooccurrence / (diag_rows * diag_cols) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def mutual_information(cooccurrence): @@ -106,7 +106,7 @@ def mutual_information(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = np.log2(cooccurrence.shape[0] * lift(cooccurrence)) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def lexicographers_mutual_information(cooccurrence): @@ -128,7 +128,7 @@ def 
lexicographers_mutual_information(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = cooccurrence * mutual_information(cooccurrence) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def cosine_similarity(cooccurrence): @@ -151,7 +151,7 @@ def cosine_similarity(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = cooccurrence / np.sqrt(diag_rows * diag_cols) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def inclusion_index(cooccurrence): @@ -173,7 +173,7 @@ def inclusion_index(cooccurrence): with np.errstate(invalid="ignore", divide="ignore"): result = cooccurrence / np.minimum(diag_rows, diag_cols) - return np.array(result) + return np.array(result) if isinstance(result, np.ndarray) else result.toarray() def get_top_k_scored_items(scores, top_k, sort_top_k=False): diff --git a/setup.py b/setup.py index c5fc49bb8..db4e1012b 100644 --- a/setup.py +++ b/setup.py @@ -43,7 +43,7 @@ "retrying>=1.3.4,<2", "scikit-learn>=1.2.0,<2", # requires scipy, and introduce breaking change affects feature_extraction.text.TfidfVectorizer.min_df "scikit-surprise>=1.1.3", - "scipy>=1.10.1,<1.11.0", # FIXME: We limit <1.11.0 until #1954 is fixed + "scipy>=1.10.1", "seaborn>=0.13.0,<1", # requires matplotlib, packaging "transformers>=4.27.0,<5", # requires packaging, pyyaml, requests, tqdm ] diff --git a/tests/unit/recommenders/evaluation/test_python_evaluation.py b/tests/unit/recommenders/evaluation/test_python_evaluation.py index 4f0d4730b..e2f6dc149 100644 --- a/tests/unit/recommenders/evaluation/test_python_evaluation.py +++ b/tests/unit/recommenders/evaluation/test_python_evaluation.py @@ -25,6 +25,7 @@ exp_var, get_top_k_items, precision_at_k, + r_precision_at_k, recall_at_k, ndcg_at_k, map_at_k, @@ -366,6 +367,20 @@ def test_python_recall_at_k(rating_true, rating_pred, rating_nohit): assert 
recall_at_k(rating_true, rating_pred, k=10) == pytest.approx(0.37777, TOL) +def test_python_r_precision(rating_true, rating_pred, rating_nohit): + assert r_precision_at_k( + rating_true=rating_true, + rating_pred=rating_true, + col_prediction=DEFAULT_RATING_COL, + k=10, + ) == pytest.approx(1, TOL) + assert r_precision_at_k(rating_true, rating_nohit, k=5) == 0.0 + assert r_precision_at_k(rating_true, rating_pred, k=3) == pytest.approx(0.21111, TOL) + assert r_precision_at_k(rating_true, rating_pred, k=5) == pytest.approx(0.24444, TOL) + # Equivalent to precision + assert r_precision_at_k(rating_true, rating_pred, k=10) == pytest.approx(0.37777, TOL) + + def test_python_auc(rating_true_binary, rating_pred_binary): assert auc( rating_true=rating_true_binary, From c219da9932baf1bb0bd0172f66dd50b8721b65f3 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 13 May 2024 16:15:33 +0200 Subject: [PATCH 02/36] Badges Signed-off-by: miguelgfierro --- README.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 35f526d1a..b1bf8e4e8 100644 --- a/README.md +++ b/README.md @@ -3,15 +3,20 @@ Copyright (c) Recommenders contributors. Licensed under the MIT License. 
--> + + # Recommenders [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) - - +[![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) +[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) +[![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) +[![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/lightgbm) +[![Slack](https://img.shields.io/badge/slack-join-green.svg?style=flat)](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) ## What's New (May, 2024) -We have a new release [Recommenders 1.2.0](https://github.com/microsoft/recommenders/releases/tag/1.2.0)! +We have a new release [Recommenders 1.2.0](https://github.com/recommenders-team/recommenders/releases/tag/1.2.0)! So many changes since our last release. We have full tests on Python 3.8 to 3.11 (around 1800 tests), upgraded performance in many algorithms, reviewed notebooks, and many more improvements. From de2b6410c3fd8a4a851f0379f9770be0470401f0 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 13 May 2024 16:18:31 +0200 Subject: [PATCH 03/36] Badges v2 Signed-off-by: miguelgfierro --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index b1bf8e4e8..0bebb55c2 100644 --- a/README.md +++ b/README.md @@ -3,10 +3,6 @@ Copyright (c) Recommenders contributors. Licensed under the MIT License. 
--> - - -# Recommenders - [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) [![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) @@ -14,6 +10,10 @@ Licensed under the MIT License. [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/lightgbm) [![Slack](https://img.shields.io/badge/slack-join-green.svg?style=flat)](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) + + +# Recommenders + ## What's New (May, 2024) We have a new release [Recommenders 1.2.0](https://github.com/recommenders-team/recommenders/releases/tag/1.2.0)! From ed52da32a0f43b59c2c407265bf8deb6d3fcbc08 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 13 May 2024 16:48:42 +0200 Subject: [PATCH 04/36] :bug: Signed-off-by: miguelgfierro --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0bebb55c2..8aca6107f 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Licensed under the MIT License. 
[![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) -[![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/lightgbm) +[![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) [![Slack](https://img.shields.io/badge/slack-join-green.svg?style=flat)](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) From bf7d393cf7cd42e2030afecb5c206778a0422a13 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 13 May 2024 17:02:13 +0200 Subject: [PATCH 05/36] join slack Signed-off-by: miguelgfierro --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 8aca6107f..164beac77 100644 --- a/README.md +++ b/README.md @@ -2,17 +2,17 @@ Copyright (c) Recommenders contributors. Licensed under the MIT License. 
--> + + +# Recommenders [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) [![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[![Slack](https://img.shields.io/badge/slack-join-green.svg?style=flat)](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) +[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) - - -# Recommenders ## What's New (May, 2024) From 602486a69f73956df286c1f481fb7784d3a34eb1 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:47:03 +0200 Subject: [PATCH 06/36] move slack badge down Signed-off-by: miguelgfierro --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 164beac77..ad83d8954 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,8 @@ Licensed under the MIT License. 
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) + +[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) ## What's New (May, 2024) From f2ce1f8b2fef6a24773f5ad1c75ae3804a324842 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:48:24 +0200 Subject: [PATCH 07/36] move slack badge up Signed-off-by: miguelgfierro --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index ad83d8954..6f706b800 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ Licensed under the MIT License. # Recommenders +[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) [![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) @@ -12,8 +13,6 @@ Licensed under the MIT License. 
[![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) - ## What's New (May, 2024) From 98f9c4d30527bc4ee304baec213184cfd49265ee Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:49:19 +0200 Subject: [PATCH 08/36] move slack badge up up Signed-off-by: miguelgfierro --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6f706b800..1f64055a1 100644 --- a/README.md +++ b/README.md @@ -4,9 +4,10 @@ Licensed under the MIT License. --> -# Recommenders [](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) +# Recommenders + [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) [![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) From 05bf12a4a91fe67bb1f2046dc05adfca2ec8c1fa Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:51:59 +0200 Subject: [PATCH 09/36] move slack badge down with line Signed-off-by: miguelgfierro --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 1f64055a1..40726ea96 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,6 @@ Licensed under the MIT License. --> -[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) # Recommenders @@ -14,6 +13,8 @@ Licensed under the MIT License. 
[![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) +[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) +
## What's New (May, 2024) From 93e97a14a3e65723ff7bae3712708aa87897315e Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:52:25 +0200 Subject: [PATCH 10/36] move slack badge down with line Signed-off-by: miguelgfierro --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 40726ea96..9ca5d369a 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ Licensed under the MIT License. [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) +[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F)
## What's New (May, 2024) From 15d633b08c8f26b056ef44aeb3dd4030093e30b1 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 14 May 2024 12:58:16 +0200 Subject: [PATCH 11/36] Trying without Recommenders Signed-off-by: miguelgfierro --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index 9ca5d369a..81e77733c 100644 --- a/README.md +++ b/README.md @@ -5,8 +5,6 @@ Licensed under the MIT License. -# Recommenders - [![Documentation status](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/recommenders-team/recommenders/actions/workflows/pages/pages-build-deployment) [![License](https://img.shields.io/github/license/recommenders-team/recommenders.svg)](https://github.com/recommenders-team/recommenders/blob/main/LICENSE) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) From dc00ae4986273fd2e2f9905f863a908f342ed07e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Fri, 17 May 2024 09:20:00 +0200 Subject: [PATCH 12/36] =?UTF-8?q?Add=20David=20Dav=C3=B3=20as=20contributo?= =?UTF-8?q?r=20#2099?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: David Davó --- AUTHORS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/AUTHORS.md b/AUTHORS.md index 54664fe0c..d3bc2c3d6 100644 --- a/AUTHORS.md +++ b/AUTHORS.md @@ -72,6 +72,8 @@ To contributors: please add your name to the list when you submit a patch to the * **[Dan Ciborowski](https://github.com/dciborow)** * ALS operationalization notebook * SAR PySpark improvement +* **[David Davó](https://github.com/daviddavo)** + * Added R-Precision metric * **[Daniel Schneider](https://github.com/danielsc)** * FastAI notebook * **[Evgenia Chroni](https://github.com/EvgeniaChroni)** From df667a50d2254d1520ea5946b70c320032fb9399 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 17 May 2024 
11:17:18 +0200 Subject: [PATCH 13/36] Slack CTA Signed-off-by: miguelgfierro --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 81e77733c..c1aee3416 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ Licensed under the MIT License. [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://lfaifoundation.slack.com/archives/C06D2GQ9K8F) +[](https://join.slack.com/t/lfaifoundation/shared_invite/zt-2iyl7zyya-g5rOO5K518CBoevyi28W6w)
## What's New (May, 2024) From 9659dcd54a4ab895ede26e9d75c3d02c6e7ff153 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 17 May 2024 11:18:21 +0200 Subject: [PATCH 14/36] width Signed-off-by: miguelgfierro --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c1aee3416..df5c67190 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ Licensed under the MIT License. [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://join.slack.com/t/lfaifoundation/shared_invite/zt-2iyl7zyya-g5rOO5K518CBoevyi28W6w) +[](https://join.slack.com/t/lfaifoundation/shared_invite/zt-2iyl7zyya-g5rOO5K518CBoevyi28W6w)
## What's New (May, 2024) From 0f7ad9eb33d82d0b65582d10b326cff846cdd6ce Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Fri, 17 May 2024 11:18:57 +0200 Subject: [PATCH 15/36] width 300 Signed-off-by: miguelgfierro --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index df5c67190..8b0a3015d 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,8 @@ Licensed under the MIT License. [![PyPI Version](https://img.shields.io/pypi/v/recommenders.svg?logo=pypi&logoColor=white)](https://pypi.org/project/recommenders) [![Python Versions](https://img.shields.io/pypi/pyversions/recommenders.svg?logo=python&logoColor=white)](https://pypi.org/project/recommenders) -[](https://join.slack.com/t/lfaifoundation/shared_invite/zt-2iyl7zyya-g5rOO5K518CBoevyi28W6w) +[](https://join.slack.com/t/lfaifoundation/shared_invite/zt-2iyl7zyya-g5rOO5K518CBoevyi28W6w) +
## What's New (May, 2024) From 7da391b8f054cb249a2cb6c95dbc83587b17e8ee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Fri, 17 May 2024 17:30:43 +0200 Subject: [PATCH 16/36] Fixed contributors order MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: David Davó --- AUTHORS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/AUTHORS.md b/AUTHORS.md index d3bc2c3d6..1816f73e2 100644 --- a/AUTHORS.md +++ b/AUTHORS.md @@ -72,10 +72,10 @@ To contributors: please add your name to the list when you submit a patch to the * **[Dan Ciborowski](https://github.com/dciborow)** * ALS operationalization notebook * SAR PySpark improvement -* **[David Davó](https://github.com/daviddavo)** - * Added R-Precision metric * **[Daniel Schneider](https://github.com/danielsc)** * FastAI notebook +* **[David Davó](https://github.com/daviddavo)** + * Added R-Precision metric * **[Evgenia Chroni](https://github.com/EvgeniaChroni)** * Multinomial VAE algorithm * Standard VAE algorithm From af9a035362dd7424bbcab2e29113875c1bf69d37 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:18:29 +0200 Subject: [PATCH 17/36] Ideas for contributions Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 217c1c900..298f8560c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -10,7 +10,6 @@ Contributions are welcomed! 
Here's a few things to know: - [Contribution Guidelines](#contribution-guidelines) - [Steps to Contributing](#steps-to-contributing) - [Coding Guidelines](#coding-guidelines) - - [Microsoft Contributor License Agreement](#microsoft-contributor-license-agreement) - [Code of Conduct](#code-of-conduct) - [Do not point fingers](#do-not-point-fingers) - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) @@ -33,6 +32,42 @@ Here are the basic steps to get started with your first contribution. Please rea See the wiki for more details about our [merging strategy](https://github.com/microsoft/recommenders/wiki/Strategy-to-merge-the-code-to-main-branch). +## Ideas for Contributions + +### A first contribution + +For people who are new to open source or to Recommenders, a good way to start is by contribution with documentation. You can help with any of the README files or in the notebooks. + +### Datasets + +To contribute new datasets, please consider this: + +* Minimize dependencies, it's better to use `requests` library than a custom library. +* Make sure that the dataset is publicly available and that the license allows for redistribution. + +### Models + +To contribute new models, please consider this: + +* Please don't add models that are already implemented in the repo. An exception to this rule is if you are adding a more optimal implementation or you want to migrate a model from TensorFlow to PyTorch. +* Prioritize the minimal code necessary instead of adding a full library. If you add code from another repository, please make sure to follow the license and give proper credit. +* All models should be accompanied by a notebook that shows how to use the model and how to train it. The notebook should be in the [examples](examples) folder. +* The model should be tested with unit tests, and the notebooks should be tested with functional tests. 
+ +### Metrics + +To contribute new metrics, please consider this: + +* A good way to contribute with metrics is by optimizing the code of the existing ones. +* If you are adding a new metric, please consider adding not only a CPU version, but also a PySpark version. + +### General tips + +* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. +* Prioritize PyTorch over TensorFlow. +* Avoid GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. + + ## Coding Guidelines We strive to maintain high quality code to make the utilities in the repository easy to understand, use, and extend. We also work hard to maintain a friendly and constructive environment. We've found that having clear expectations on the development process and consistent style helps to ensure everyone can contribute and collaborate effectively. From acbaf0fe8d079205fca6de5398c1e81d39f804a6 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:22:56 +0200 Subject: [PATCH 18/36] Ideas for contributions :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 298f8560c..91a9be162 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -9,11 +9,17 @@ Contributions are welcomed! 
Here's a few things to know: - [Contribution Guidelines](#contribution-guidelines) - [Steps to Contributing](#steps-to-contributing) + - [Ideas for Contributions](#ideas-for-contributions) + - [A first contribution](#a-first-contribution) + - [Datasets](#datasets) + - [Models](#models) + - [Metrics](#metrics) + - [General tips](#general-tips) - [Coding Guidelines](#coding-guidelines) - [Code of Conduct](#code-of-conduct) - - [Do not point fingers](#do-not-point-fingers) - - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) - - [Ask questions do not give answers](#ask-questions-do-not-give-answers) + - [Do not point fingers](#do-not-point-fingers) + - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) + - [Ask questions do not give answers](#ask-questions-do-not-give-answers) ## Steps to Contributing @@ -38,6 +44,8 @@ See the wiki for more details about our [merging strategy](https://github.com/mi For people who are new to open source or to Recommenders, a good way to start is by contribution with documentation. You can help with any of the README files or in the notebooks. +For more advanced users, consider fixing one of the bugs listed in the issues. + ### Datasets To contribute new datasets, please consider this: @@ -65,8 +73,7 @@ To contribute new metrics, please consider this: * Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Prioritize PyTorch over TensorFlow. -* Avoid GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. - +* Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. 
## Coding Guidelines @@ -74,9 +81,11 @@ We strive to maintain high quality code to make the utilities in the repository Please review the [coding guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. +## Code of Conduct + Apart from the official [Code of Conduct](CODE_OF_CONDUCT.md), in Recommenders team we adopt the following behaviors, to ensure a great working environment: -#### Do not point fingers +### Do not point fingers Let’s be constructive.
@@ -86,7 +95,7 @@ Let’s be constructive.
-#### Provide code feedback based on evidence +### Provide code feedback based on evidence When making code reviews, try to support your ideas based on evidence (papers, library documentation, stackoverflow, etc) rather than your personal preferences. @@ -97,7 +106,7 @@ When making code reviews, try to support your ideas based on evidence (papers, l -#### Ask questions do not give answers +### Ask questions do not give answers Try to be empathic.
From 385746c148a30cb40010c2e07d4e52ca6c1e7c5a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:24:43 +0200 Subject: [PATCH 19/36] :bug: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 91a9be162..110108bdd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -7,19 +7,18 @@ Licensed under the MIT License. Contributions are welcomed! Here's a few things to know: -- [Contribution Guidelines](#contribution-guidelines) - - [Steps to Contributing](#steps-to-contributing) - - [Ideas for Contributions](#ideas-for-contributions) - - [A first contribution](#a-first-contribution) - - [Datasets](#datasets) - - [Models](#models) - - [Metrics](#metrics) - - [General tips](#general-tips) - - [Coding Guidelines](#coding-guidelines) - - [Code of Conduct](#code-of-conduct) - - [Do not point fingers](#do-not-point-fingers) - - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) - - [Ask questions do not give answers](#ask-questions-do-not-give-answers) +- [Steps to Contributing](#steps-to-contributing) +- [Ideas for Contributions](#ideas-for-contributions) + - [A first contribution](#a-first-contribution) + - [Datasets](#datasets) + - [Models](#models) + - [Metrics](#metrics) + - [General tips](#general-tips) +- [Coding Guidelines](#coding-guidelines) +- [Code of Conduct](#code-of-conduct) + - [Do not point fingers](#do-not-point-fingers) + - [Provide code feedback based on evidence](#provide-code-feedback-based-on-evidence) + - [Ask questions do not give answers](#ask-questions-do-not-give-answers) ## Steps to Contributing From b43ea7bb131a7f27c2856bf618a71bf9d56aae6a Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 00:34:54 +0200 Subject: [PATCH 20/36] :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git 
a/CONTRIBUTING.md b/CONTRIBUTING.md index 110108bdd..e2bb4c576 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -67,6 +67,7 @@ To contribute new metrics, please consider this: * A good way to contribute with metrics is by optimizing the code of the existing ones. * If you are adding a new metric, please consider adding not only a CPU version, but also a PySpark version. +* When adding the tests, make sure you check for the limits. For example, if you add an error metric, check that the error between two identical datasets is zero. ### General tips @@ -78,7 +79,7 @@ To contribute new metrics, please consider this: We strive to maintain high quality code to make the utilities in the repository easy to understand, use, and extend. We also work hard to maintain a friendly and constructive environment. We've found that having clear expectations on the development process and consistent style helps to ensure everyone can contribute and collaborate effectively. -Please review the [coding guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. +Please review the [Coding Guidelines](https://github.com/recommenders-team/recommenders/wiki/Coding-Guidelines) wiki page to see more details about the expectations for development approach and style. ## Code of Conduct @@ -101,7 +102,7 @@ When making code reviews, try to support your ideas based on evidence (papers, l
Click here to see some examples -"When reviewing this code, I saw that the Python implementation the metrics are based on classes, however, [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics) and [tensorflow](https://www.tensorflow.org/api_docs/python/tf/metrics) use functions. We should follow the standard in the industry." +"When reviewing this code, I saw that the Python implementation of the metrics are based on classes, however, [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics) use functions. We should follow the standard in the industry."
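The metrics guideline added in the patch above (check the limits, for example that the error between two identical datasets is zero) can be sketched as a small test. This is a minimal illustration only: `rmse` here is a hypothetical stand-in written for this example, not the implementation in `recommenders/evaluation/python_evaluation.py`, and it takes plain lists rather than DataFrames.

```python
import math


def rmse(y_true, y_pred):
    """Hypothetical stand-in for an error metric (not the Recommenders implementation)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def test_rmse_limits():
    ratings = [5.0, 3.0, 4.0, 1.0]
    # Lower limit: identical inputs must give exactly zero error.
    assert rmse(ratings, ratings) == 0.0
    # Known value: a constant offset of 1 on every prediction gives an error of 1.
    shifted = [r + 1.0 for r in ratings]
    assert math.isclose(rmse(ratings, shifted), 1.0)
    # Symmetry: swapping the arguments must not change the error.
    assert math.isclose(rmse(ratings, shifted), rmse(shifted, ratings))
```

The same pattern would presumably carry over to ranking metrics, where the natural limit to check is a perfect recommendation list scoring the metric's maximum.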
From 87546e6bbb4cdf805ecd130edb27e805284b0f64 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Tue, 21 May 2024 11:51:55 +0200 Subject: [PATCH 21/36] Feedback @anargyri Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e2bb4c576..9d3d8f4a3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -71,9 +71,10 @@ To contribute new metrics, please consider this: ### General tips -* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Prioritize PyTorch over TensorFlow. +* Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. * Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. +* Add the copyright statement at the beginning of the file: `Copyright (c) Recommenders contributors. Licensed under the MIT License.` ## Coding Guidelines From 0976df459db26f817306910f85eb3e69109ce207 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Wed, 22 May 2024 10:37:34 +0200 Subject: [PATCH 22/36] :memo: Signed-off-by: miguelgfierro --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9d3d8f4a3..4d25db4e3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -73,7 +73,7 @@ To contribute new metrics, please consider this: * Prioritize PyTorch over TensorFlow. * Minimize dependencies. Around 80% of the issues in the repo are related to dependencies. -* Avoid adding code with GPL and other viral licenses. Prioritize MIT, Apache, and other permissive licenses. +* Avoid adding code with GPL and other copyleft licenses. Prioritize MIT, Apache, and other permissive licenses. * Add the copyright statement at the beginning of the file: `Copyright (c) Recommenders contributors. 
Licensed under the MIT License.` ## Coding Guidelines From 1da9670ed778ad756755e77006fd30123f4abd4f Mon Sep 17 00:00:00 2001 From: Martin Date: Sun, 26 May 2024 13:53:23 +0800 Subject: [PATCH 23/36] Update cornac_bivae_deep_dive.ipynb: fix typos --- .../cornac_bivae_deep_dive.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb b/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb index 731ab0c12..fb432ccca 100644 --- a/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb +++ b/examples/02_model_collaborative_filtering/cornac_bivae_deep_dive.ipynb @@ -610,7 +610,7 @@ "source": [ "## 4 Discussion\n", "\n", - "BiVAE is a new variational autoencoder tailored for dyadic data, where observations consist of measurements associated with two sets of objects, e.g., users, items and corresponding ratings. The model is symmetric, which makes it easier to extend axiliary data from both sides of users and items. In addition to preference data, the model can be applied to other types of dyadic data such as documentword matrices, and other tasks such as co-clustering. \n", + "BiVAE is a new variational autoencoder tailored for dyadic data, where observations consist of measurements associated with two sets of objects, e.g., users, items and corresponding ratings. The model is symmetric, which makes it easier to extend auxiliary data from both sides of users and items. In addition to preference data, the model can be applied to other types of dyadic data such as document-word matrices, and other tasks such as co-clustering. \n", "\n", "In the paper, there is also a discussion on Constrained Adaptive Priors (CAP), a proposed method to build informative priors to mitigate the well-known posterior collapse problem. We have left out that part purposely, not to distract the audiences. Nevertheless, it is very interesting and worth taking a look. 
\n", "\n", From a92b31a644c91550e696a0a800073e8689de3ea7 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Mon, 3 Jun 2024 16:58:06 +0200 Subject: [PATCH 24/36] breaking change in sklearn in log_loss :boom::boom: Signed-off-by: miguelgfierro --- examples/00_quick_start/lightgbm_tinycriteo.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/examples/00_quick_start/lightgbm_tinycriteo.ipynb b/examples/00_quick_start/lightgbm_tinycriteo.ipynb index f7a786415..ffd827eac 100644 --- a/examples/00_quick_start/lightgbm_tinycriteo.ipynb +++ b/examples/00_quick_start/lightgbm_tinycriteo.ipynb @@ -717,7 +717,7 @@ "source": [ "test_preds = lgb_model.predict(test_x)\n", "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "res_basic = {\"auc\": auc, \"logloss\": logloss}\n", "print(res_basic)\n" ] @@ -904,7 +904,7 @@ ], "source": [ "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "res_optim = {\"auc\": auc, \"logloss\": logloss}\n", "\n", "print(res_optim)" @@ -959,7 +959,7 @@ ], "source": [ "auc = roc_auc_score(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", - "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds), eps=1e-12)\n", + "logloss = log_loss(np.asarray(test_y.reshape(-1)), np.asarray(test_preds))\n", "\n", "print({\"auc\": auc, \"logloss\": logloss})" ] From 8dfbaf09a12abe5129bc41ce22f3d0739d19c1fe Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Sat, 8 Jun 2024 09:28:08 +0200 Subject: [PATCH 25/36] Update jupyter dep to accomodate Colab Signed-off-by: miguelgfierro --- setup.py | 2 +- 1 file changed, 1 
insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index db4e1012b..87923cf00 100644 --- a/setup.py +++ b/setup.py @@ -35,7 +35,7 @@ "locust>=2.12.2,<3", # requires jinja2 "memory-profiler>=0.61.0,<1", "nltk>=3.8.1,<4", # requires tqdm - "notebook>=7.0.0,<8", # requires ipykernel, jinja2, jupyter, nbconvert, nbformat, packaging, requests + "notebook>=6.5.5,<8", # requires ipykernel, jinja2, jupyter, nbconvert, nbformat, packaging, requests "numba>=0.57.0,<1", "pandas>2.0.0,<3.0.0", # requires numpy "pandera[strategies]>=0.6.5,<0.18;python_version<='3.8'", # For generating fake datasets From 882777281c991b3be001098b8a91d601b5aadd0e Mon Sep 17 00:00:00 2001 From: Kingston257 <63620204+Kingston257@users.noreply.github.com> Date: Mon, 10 Jun 2024 13:15:14 +0100 Subject: [PATCH 26/36] Fix typo in README.md Made the following grammatical corrections to line 97 of the file: - "has into account" should be "takes into account". - "Riemannian conjugate gradients optimization" should be "Riemannian conjugate gradient optimization" (singular form). - "and following a geometric approach" should be "and follows a geometric approach". --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8b0a3015d..74805200d 100644 --- a/README.md +++ b/README.md @@ -94,7 +94,7 @@ The table below lists the recommendation algorithms currently available in the r | LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | [Quick start](examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb) | | LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. 
| [Quick start in CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [Deep dive in PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) | | LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb) | -| GeoIMC* | Collaborative Filtering | Matrix completion algorithm that has into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment. | [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) | +| GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment. | [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) | | GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | | Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/multi_vae_deep_dive.ipynb) | | Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. 
| [Quick start](examples/00_quick_start/lstur_MIND.ipynb) | From e18bd6d7e05ba4cff36a570cac926b3f470f4526 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Mon, 24 Jun 2024 09:25:30 +0000 Subject: [PATCH 27/36] Removed deprecated numpy alias MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: David Davó --- recommenders/datasets/pandas_df_utils.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recommenders/datasets/pandas_df_utils.py b/recommenders/datasets/pandas_df_utils.py index 50bd83dd8..f9711ce24 100644 --- a/recommenders/datasets/pandas_df_utils.py +++ b/recommenders/datasets/pandas_df_utils.py @@ -163,7 +163,7 @@ def fit(self, df, col_rating=DEFAULT_RATING_COL): types = df.dtypes if not all( [ - x == object or np.issubdtype(x, np.integer) or x == np.float + x == object or np.issubdtype(x, np.integer) or x == float for x in types ] ): From b73ee016cf207794a210f2091831a8c8c8a257c1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Mon, 24 Jun 2024 09:56:15 +0000 Subject: [PATCH 28/36] Deprecated use of dict in SeriesGroupBy.agg MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: David Davó --- recommenders/evaluation/python_evaluation.py | 24 ++++++++++---------- recommenders/evaluation/spark_evaluation.py | 4 ++-- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/recommenders/evaluation/python_evaluation.py b/recommenders/evaluation/python_evaluation.py index dff164ab4..7329f049c 100644 --- a/recommenders/evaluation/python_evaluation.py +++ b/recommenders/evaluation/python_evaluation.py @@ -435,9 +435,9 @@ def merge_ranking_true_pred( # count the number of hits vs actual relevant items per user df_hit_count = pd.merge( - df_hit.groupby(col_user, as_index=False)[col_user].agg({"hit": "count"}), + df_hit.groupby(col_user, as_index=False)[col_user].agg(hit="count"), 
rating_true_common.groupby(col_user, as_index=False)[col_user].agg( - {"actual": "count"} + actual="count", ), on=col_user, ) @@ -680,14 +680,14 @@ def ndcg_at_k( df_idcg["idcg"] = df_idcg["rel"] / discfun(1 + df_idcg["irank"]) # Calculate the actual DCG for each user - df_user = df_dcg.groupby(col_user, as_index=False, sort=False).agg({"dcg": "sum"}) + df_user = df_dcg.groupby(col_user, as_index=False, sort=False).agg(dcg="sum") # Calculate the ideal DCG for each user df_user = df_user.merge( df_idcg.groupby(col_user, as_index=False, sort=False) .head(k) .groupby(col_user, as_index=False, sort=False) - .agg({"idcg": "sum"}), + .agg(idcg="sum"), on=col_user, ) @@ -726,7 +726,7 @@ def _get_reciprocal_rank( df_hit_sorted["rr"] = ( df_hit_sorted.groupby(col_user).cumcount() + 1 ) / df_hit_sorted["rank"] - df_hit_sorted = df_hit_sorted.groupby(col_user).agg({"rr": "sum"}).reset_index() + df_hit_sorted = df_hit_sorted.groupby(col_user).agg(rr="sum").reset_index() return pd.merge(df_hit_sorted, df_hit_count, on=col_user), n_users @@ -1235,7 +1235,7 @@ def _get_intralist_similarity( item_pair_sim["i1"] != item_pair_sim["i2"] ].reset_index(drop=True) df_intralist_similarity = ( - item_pair_sim.groupby([col_user]).agg({col_sim: "mean"}).reset_index() + item_pair_sim.groupby([col_user]).agg(**{col_sim: "mean"}).reset_index() ) df_intralist_similarity.columns = [col_user, "avg_il_sim"] @@ -1345,7 +1345,7 @@ def diversity( col_item, col_sim, ) - avg_diversity = df_user_diversity.agg({"user_diversity": "mean"})[0] + avg_diversity = df_user_diversity.agg(user_diversity="mean")[0] return avg_diversity @@ -1432,7 +1432,7 @@ def novelty(train_df, reco_df, col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_ reco_item_novelty["product"] = ( reco_item_novelty["count"] * reco_item_novelty["item_novelty"] ) - avg_novelty = reco_item_novelty.agg({"product": "sum"})[0] / n_recommendations + avg_novelty = reco_item_novelty.agg(product="sum")[0] / n_recommendations return avg_novelty @@ 
-1512,7 +1512,7 @@ def user_item_serendipity( reco_user_item_avg_sim = ( reco_train_user_item_sim.groupby([col_user, col_item]) - .agg({col_sim: "mean"}) + .agg(**{col_sim: "mean"}) .reset_index() ) reco_user_item_avg_sim.columns = [ @@ -1582,7 +1582,7 @@ def user_serendipity( ) df_user_serendipity = ( df_user_item_serendipity.groupby(col_user) - .agg({"user_item_serendipity": "mean"}) + .agg(user_item_serendipity="mean") .reset_index() ) df_user_serendipity.columns = [col_user, "user_serendipity"] @@ -1636,7 +1636,7 @@ def serendipity( col_sim, col_relevance, ) - avg_serendipity = df_user_serendipity.agg({"user_serendipity": "mean"})[0] + avg_serendipity = df_user_serendipity.agg(user_serendipity="mean")[0] return avg_serendipity @@ -1711,6 +1711,6 @@ def distributional_coverage( df_entropy["p(i)"] = df_entropy["count"] / count_row_reco df_entropy["entropy(i)"] = df_entropy["p(i)"] * np.log2(df_entropy["p(i)"]) - d_coverage = -df_entropy.agg({"entropy(i)": "sum"})[0] + d_coverage = -df_entropy.agg(**{"entropy(i)": "sum"})[0] return d_coverage diff --git a/recommenders/evaluation/spark_evaluation.py b/recommenders/evaluation/spark_evaluation.py index 2e376edc2..97e6e9e54 100644 --- a/recommenders/evaluation/spark_evaluation.py +++ b/recommenders/evaluation/spark_evaluation.py @@ -761,7 +761,7 @@ def diversity(self): if self.avg_diversity is None: self.df_user_diversity = self.user_diversity() self.avg_diversity = self.df_user_diversity.agg( - {"user_diversity": "mean"} + user_diversity="mean" ).first()[0] return self.avg_diversity @@ -904,7 +904,7 @@ def serendipity(self): if self.avg_serendipity is None: self.df_user_serendipity = self.user_serendipity() self.avg_serendipity = self.df_user_serendipity.agg( - {"user_serendipity": "mean"} + user_serendipity="mean" ).first()[0] return self.avg_serendipity From cb2b282ad429ab210edcae65d059f3af3072edfe Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Mon, 24 Jun 2024 14:56:38 +0000 Subject: [PATCH 
29/36] Support for cold-start in ImplicitCF --- recommenders/models/deeprec/DataModel/ImplicitCF.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/recommenders/models/deeprec/DataModel/ImplicitCF.py b/recommenders/models/deeprec/DataModel/ImplicitCF.py index f490c48f3..3cfbb2821 100644 --- a/recommenders/models/deeprec/DataModel/ImplicitCF.py +++ b/recommenders/models/deeprec/DataModel/ImplicitCF.py @@ -80,6 +80,7 @@ def _data_processing(self, train, test): user_idx = df[[self.col_user]].drop_duplicates().reindex() user_idx[self.col_user + "_idx"] = np.arange(len(user_idx)) self.n_users = len(user_idx) + self.n_users_in_train = train[self.col_user].nunique() self.user_idx = user_idx self.user2id = dict( @@ -210,7 +211,7 @@ def sample_neg(x): if neg_id not in x: return neg_id - indices = range(self.n_users) + indices = range(self.n_users_in_train) if self.n_users < batch_size: users = [random.choice(indices) for _ in range(batch_size)] else: From 0a1b98f796e39d085defbfc0a289340e5076c45f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Wed, 26 Jun 2024 12:05:55 +0200 Subject: [PATCH 30/36] Revert "Deprecated use of dict in SeriesGroupBy.agg" This reverts commit b73ee016cf207794a210f2091831a8c8c8a257c1. 
--- recommenders/evaluation/python_evaluation.py | 24 ++++++++++---------- recommenders/evaluation/spark_evaluation.py | 4 ++-- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/recommenders/evaluation/python_evaluation.py b/recommenders/evaluation/python_evaluation.py index 7329f049c..dff164ab4 100644 --- a/recommenders/evaluation/python_evaluation.py +++ b/recommenders/evaluation/python_evaluation.py @@ -435,9 +435,9 @@ def merge_ranking_true_pred( # count the number of hits vs actual relevant items per user df_hit_count = pd.merge( - df_hit.groupby(col_user, as_index=False)[col_user].agg(hit="count"), + df_hit.groupby(col_user, as_index=False)[col_user].agg({"hit": "count"}), rating_true_common.groupby(col_user, as_index=False)[col_user].agg( - actual="count", + {"actual": "count"} ), on=col_user, ) @@ -680,14 +680,14 @@ def ndcg_at_k( df_idcg["idcg"] = df_idcg["rel"] / discfun(1 + df_idcg["irank"]) # Calculate the actual DCG for each user - df_user = df_dcg.groupby(col_user, as_index=False, sort=False).agg(dcg="sum") + df_user = df_dcg.groupby(col_user, as_index=False, sort=False).agg({"dcg": "sum"}) # Calculate the ideal DCG for each user df_user = df_user.merge( df_idcg.groupby(col_user, as_index=False, sort=False) .head(k) .groupby(col_user, as_index=False, sort=False) - .agg(idcg="sum"), + .agg({"idcg": "sum"}), on=col_user, ) @@ -726,7 +726,7 @@ def _get_reciprocal_rank( df_hit_sorted["rr"] = ( df_hit_sorted.groupby(col_user).cumcount() + 1 ) / df_hit_sorted["rank"] - df_hit_sorted = df_hit_sorted.groupby(col_user).agg(rr="sum").reset_index() + df_hit_sorted = df_hit_sorted.groupby(col_user).agg({"rr": "sum"}).reset_index() return pd.merge(df_hit_sorted, df_hit_count, on=col_user), n_users @@ -1235,7 +1235,7 @@ def _get_intralist_similarity( item_pair_sim["i1"] != item_pair_sim["i2"] ].reset_index(drop=True) df_intralist_similarity = ( - item_pair_sim.groupby([col_user]).agg(**{col_sim: "mean"}).reset_index() + 
item_pair_sim.groupby([col_user]).agg({col_sim: "mean"}).reset_index() ) df_intralist_similarity.columns = [col_user, "avg_il_sim"] @@ -1345,7 +1345,7 @@ def diversity( col_item, col_sim, ) - avg_diversity = df_user_diversity.agg(user_diversity="mean")[0] + avg_diversity = df_user_diversity.agg({"user_diversity": "mean"})[0] return avg_diversity @@ -1432,7 +1432,7 @@ def novelty(train_df, reco_df, col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_ reco_item_novelty["product"] = ( reco_item_novelty["count"] * reco_item_novelty["item_novelty"] ) - avg_novelty = reco_item_novelty.agg(product="sum")[0] / n_recommendations + avg_novelty = reco_item_novelty.agg({"product": "sum"})[0] / n_recommendations return avg_novelty @@ -1512,7 +1512,7 @@ def user_item_serendipity( reco_user_item_avg_sim = ( reco_train_user_item_sim.groupby([col_user, col_item]) - .agg(**{col_sim: "mean"}) + .agg({col_sim: "mean"}) .reset_index() ) reco_user_item_avg_sim.columns = [ @@ -1582,7 +1582,7 @@ def user_serendipity( ) df_user_serendipity = ( df_user_item_serendipity.groupby(col_user) - .agg(user_item_serendipity="mean") + .agg({"user_item_serendipity": "mean"}) .reset_index() ) df_user_serendipity.columns = [col_user, "user_serendipity"] @@ -1636,7 +1636,7 @@ def serendipity( col_sim, col_relevance, ) - avg_serendipity = df_user_serendipity.agg(user_serendipity="mean")[0] + avg_serendipity = df_user_serendipity.agg({"user_serendipity": "mean"})[0] return avg_serendipity @@ -1711,6 +1711,6 @@ def distributional_coverage( df_entropy["p(i)"] = df_entropy["count"] / count_row_reco df_entropy["entropy(i)"] = df_entropy["p(i)"] * np.log2(df_entropy["p(i)"]) - d_coverage = -df_entropy.agg(**{"entropy(i)": "sum"})[0] + d_coverage = -df_entropy.agg({"entropy(i)": "sum"})[0] return d_coverage diff --git a/recommenders/evaluation/spark_evaluation.py b/recommenders/evaluation/spark_evaluation.py index 97e6e9e54..2e376edc2 100644 --- a/recommenders/evaluation/spark_evaluation.py +++ 
b/recommenders/evaluation/spark_evaluation.py @@ -761,7 +761,7 @@ def diversity(self): if self.avg_diversity is None: self.df_user_diversity = self.user_diversity() self.avg_diversity = self.df_user_diversity.agg( - user_diversity="mean" + {"user_diversity": "mean"} ).first()[0] return self.avg_diversity @@ -904,7 +904,7 @@ def serendipity(self): if self.avg_serendipity is None: self.df_user_serendipity = self.user_serendipity() self.avg_serendipity = self.df_user_serendipity.agg( - user_serendipity="mean" + {"user_serendipity": "mean"} ).first()[0] return self.avg_serendipity From 66ace3e6e273fd456eb65693e8ee96c6da1a0361 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Wed, 26 Jun 2024 12:09:03 +0200 Subject: [PATCH 31/36] Fix python evaluation It turns out I ran incorrect tests and the evaluation module was not really working --- recommenders/evaluation/python_evaluation.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/recommenders/evaluation/python_evaluation.py b/recommenders/evaluation/python_evaluation.py index dff164ab4..9c7b9115f 100644 --- a/recommenders/evaluation/python_evaluation.py +++ b/recommenders/evaluation/python_evaluation.py @@ -435,9 +435,9 @@ def merge_ranking_true_pred( # count the number of hits vs actual relevant items per user df_hit_count = pd.merge( - df_hit.groupby(col_user, as_index=False)[col_user].agg({"hit": "count"}), + df_hit.groupby(col_user, as_index=False)[col_user].agg(hit="count"), rating_true_common.groupby(col_user, as_index=False)[col_user].agg( - {"actual": "count"} + actual="count", ), on=col_user, ) From 8cd3c133d7d167737b33f09152f0eca20b7ab170 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Dav=C3=B3?= Date: Wed, 26 Jun 2024 12:13:11 +0200 Subject: [PATCH 32/36] Restrict numpy version MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: David Davó --- pyproject.toml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) 
diff --git a/pyproject.toml b/pyproject.toml index 0ff4c8d96..9d73fa1d6 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -2,12 +2,12 @@ requires = [ "setuptools>=52", "wheel>=0.36", - "numpy>=1.15", + "numpy>=1.15,<2", ] dependencies = [ "setuptools>=52", "wheel>=0.36", - "numpy>=1.15", + "numpy>=1.15,<2", ] build-backend = "setuptools.build_meta" From b8dd49bcaec6b3812b6c2ff9c7d03f559aac44b7 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Sat, 29 Jun 2024 20:24:47 +0200 Subject: [PATCH 33/36] Moving LightFM to extras Signed-off-by: miguelgfierro --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 87923cf00..b57ad28f2 100644 --- a/setup.py +++ b/setup.py @@ -30,7 +30,6 @@ "category-encoders>=2.6.0,<3", # requires packaging "cornac>=1.15.2,<2", # requires packaging, tqdm "hyperopt>=0.2.7,<1", - "lightfm>=1.17,<2", # requires requests "lightgbm>=4.0.0,<5", "locust>=2.12.2,<3", # requires jinja2 "memory-profiler>=0.61.0,<1", @@ -80,6 +79,7 @@ # nni needs to be upgraded "nni==1.5", "pymanopt>=0.2.5", + "lightfm>=1.17,<2", ] # The following dependency can be installed as below, however PyPI does not allow direct URLs. 
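Since the patch above moves `lightfm` out of the core requirements, code that depends on it may want to fail with a clear message when the package is missing. The following is a minimal sketch of one way to do that, not something this patch series adds: `has_optional_dependency` and `require_lightfm` are hypothetical helper names, and the install hint is the `pip install recommenders[experimental]` command quoted in the notebook note later in this series.

```python
import importlib.util


def has_optional_dependency(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def require_lightfm():
    """Hypothetical helper: import lightfm or raise an error with an install hint."""
    if not has_optional_dependency("lightfm"):
        raise ImportError(
            "LightFM is not part of the core install. "
            "Install it with `pip install recommenders[experimental]`."
        )
    import lightfm  # imported lazily so the core package works without it

    return lightfm
```

Checking with `find_spec` before importing keeps the lookup cheap and avoids importing the optional package at module load time; a plain `try`/`except ImportError` around the import would be an equally reasonable choice.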
From c2e9572f68ec69975fb076112dc3d59497c7baf3 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Sat, 29 Jun 2024 20:36:35 +0200 Subject: [PATCH 34/36] move lightfm tests to experimental Signed-off-by: miguelgfierro --- tests/ci/azureml_tests/test_groups.py | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/tests/ci/azureml_tests/test_groups.py b/tests/ci/azureml_tests/test_groups.py index f05e27a9f..2a262c12d 100644 --- a/tests/ci/azureml_tests/test_groups.py +++ b/tests/ci/azureml_tests/test_groups.py @@ -47,8 +47,6 @@ "tests/functional/examples/test_notebooks_python.py::test_geoimc_functional", # 1006.19s # "tests/functional/examples/test_notebooks_python.py::test_benchmark_movielens_cpu", # 58s - # - "tests/functional/examples/test_notebooks_python.py::test_lightfm_functional", ], "group_cpu_003": [ # Total group time: 2253s "tests/data_validation/recommenders/datasets/test_criteo.py::test_download_criteo_sample", # 1.05s @@ -237,10 +235,6 @@ "tests/unit/recommenders/models/test_geoimc.py::test_imcproblem", "tests/unit/recommenders/models/test_geoimc.py::test_inferer_init", "tests/unit/recommenders/models/test_geoimc.py::test_inferer_infer", - "tests/unit/recommenders/models/test_lightfm_utils.py::test_interactions", - "tests/unit/recommenders/models/test_lightfm_utils.py::test_fitting", - "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_users", - "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_items", "tests/unit/recommenders/models/test_sar_singlenode.py::test_init", "tests/unit/recommenders/models/test_sar_singlenode.py::test_fit", "tests/unit/recommenders/models/test_sar_singlenode.py::test_predict", @@ -453,3 +447,14 @@ "tests/unit/examples/test_notebooks_gpu.py::test_gpu_vm", ], } + +# Experimental are additional test groups that require to install extra dependencies: pip install .[experimental] +experimental_test_groups = { + "group_cpu_001": [ + 
"tests/unit/recommenders/models/test_lightfm_utils.py::test_interactions", + "tests/unit/recommenders/models/test_lightfm_utils.py::test_fitting", + "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_users", + "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_items", + "tests/functional/examples/test_notebooks_python.py::test_lightfm_functional", + ] +} From fe1379027046eb47863fe1f298abb09d32261be7 Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Sat, 29 Jun 2024 20:41:25 +0200 Subject: [PATCH 35/36] Note in notebook Signed-off-by: miguelgfierro --- .../02_model_collaborative_filtering/lightfm_deep_dive.ipynb | 2 ++ 1 file changed, 2 insertions(+) diff --git a/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb b/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb index 8e588760f..5a60091d7 100755 --- a/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb +++ b/examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb @@ -22,6 +22,8 @@ "source": [ "This notebook explains the concept of a Factorization Machine based model for recommendation, it also outlines the steps to construct a pure matrix factorization and a Factorization Machine using the [LightFM](https://github.com/lyst/lightfm) package. It also demonstrates how to extract both user and item affinity from a fitted model.\n", "\n", + "*NOTE: LightFM is not available in the core package of Recommenders, to run this notebook, install the experimental package with `pip install recommenders[experimental]`.*\n", + "\n", "## 1. 
Factorization Machine model\n", "\n", "### 1.1 Background\n", From cf64eed0a2e3eb93ab14e8f20a72d2d04f36c7bb Mon Sep 17 00:00:00 2001 From: miguelgfierro Date: Sat, 29 Jun 2024 22:21:02 +0200 Subject: [PATCH 36/36] Deprecation of SchemaModel in Pandera Signed-off-by: miguelgfierro --- recommenders/datasets/movielens.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recommenders/datasets/movielens.py b/recommenders/datasets/movielens.py index c0b3b5f72..a8a8b4441 100644 --- a/recommenders/datasets/movielens.py +++ b/recommenders/datasets/movielens.py @@ -582,7 +582,7 @@ def unique_columns(df, *, columns): return not df[columns].duplicated().any() -class MockMovielensSchema(pa.SchemaModel): +class MockMovielensSchema(pa.DataFrameModel): """ Mock dataset schema to generate fake data for testing purpose. This schema is configured to mimic the Movielens dataset