sparse data in tidymodels testing #229

Open
EmilHvitfeldt opened this issue Nov 15, 2024 · 0 comments · May be fixed by #234

EmilHvitfeldt commented Nov 15, 2024

Because we aren’t perfect, each step that produces sparsity has a sparse argument. This argument defaults to "auto" but can be manually set to "yes" or "no" to always or never produce sparse data, respectively.
None of this should depend on whether the tibble contains sparse vectors, since we go off the sparsity. This sparsity is estimated based on the recipe.

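| ID | recipe produce sparsity | sparsity | model support | sparse args |
|----|-------------------------|----------|---------------|-------------|
| 1  | yes | high | yes | auto |
| 2  | yes | high | yes | no   |
| 3  | yes | high | yes | yes  |
| 4  | yes | high | no  | auto |
| 5  | yes | high | no  | no   |
| 6  | yes | high | no  | yes  |
| 7  | yes | low  | yes | auto |
| 8  | yes | low  | yes | no   |
| 9  | yes | low  | yes | yes  |
| 10 | yes | low  | no  | auto |
| 11 | yes | low  | no  | no   |
| 12 | yes | low  | no  | yes  |
| 13 | no  | high | yes | auto |
| 14 | no  | high | yes | no   |
| 15 | no  | high | yes | yes  |
| 16 | no  | high | no  | auto |
| 17 | no  | high | no  | no   |
| 18 | no  | high | no  | yes  |
| 19 | no  | low  | yes | auto |
| 20 | no  | low  | yes | no   |
| 21 | no  | low  | yes | yes  |
| 22 | no  | low  | no  | auto |
| 23 | no  | low  | no  | no   |
| 24 | no  | low  | no  | yes  |
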
  • recipe produce sparsity: the recipe contains a step with a sparse argument.
  • sparsity: whether there is a lot of sparsity in the data (high) or not (low).
  • model support: the parsnip model supports sparse data, i.e. allow_sparse_x = TRUE.
  • sparse args: what is specified in the sparse arguments of the steps.
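
The 24 cases are the full cross of the four factors, so the grid can be generated rather than typed by hand. Below is a minimal sketch in Python, used only as neutral notation; the names `COLUMNS`, `LEVELS`, and `build_grid` are hypothetical and are not part of recipes, parsnip, or any tidymodels test suite. The level order is chosen so that the generated IDs line up with the ones in the table.

```python
from itertools import product

# Hypothetical column names mirroring the four factors of the test matrix.
COLUMNS = ("recipe_produce_sparsity", "sparsity", "model_support", "sparse_args")

# Levels listed in the order they appear in the table; the last factor
# varies fastest, which reproduces the table's ID numbering.
LEVELS = (
    ("yes", "no"),         # recipe contains a step with a sparse argument
    ("high", "low"),       # how sparse the data is
    ("yes", "no"),         # model supports sparse data
    ("auto", "no", "yes"), # value of the sparse argument in the steps
)


def build_grid():
    """Return the full cross of all factor levels as a list of dicts."""
    return [
        {"id": i, **dict(zip(COLUMNS, combo))}
        for i, combo in enumerate(product(*LEVELS), start=1)
    ]


grid = build_grid()
```

Each element of `grid` is one test case, so a test suite could loop over it and build the matching recipe and model for every row.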

What should happen when the control arg is "auto" is listed below.

  • if the model doesn’t support sparsity, then don’t give it sparse data, and stop recipes from creating sparsity, regardless of how sparse the data is
  • if sparsity is high and the model supports it, give it sparse data
  • if sparsity is low and the model supports sparse data, don’t give it sparse data, and make sure that the recipe doesn’t produce sparse data
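
The three rules above reduce to a single condition: under "auto", sparse data is produced only when the model supports it and the sparsity is high. A minimal sketch of that decision, again in Python as neutral notation; `step_produces_sparse` and its parameters are hypothetical names, not the recipes implementation. The "yes" and "no" branches follow the "always or never" description of the sparse argument. What is handed to a model that does not support sparse data when the argument is "yes" is not specified in this issue, so the sketch does not model it.

```python
def step_produces_sparse(sparse_arg, sparsity_is_high, model_supports_sparse):
    """Decide whether a step should produce sparse data.

    Hypothetical helper; argument names are assumptions for illustration.
    """
    if sparse_arg == "yes":
        # Manual override: always produce sparse data.
        return True
    if sparse_arg == "no":
        # Manual override: never produce sparse data.
        return False
    if sparse_arg == "auto":
        # Sparse only when the model can take it and the data is sparse enough;
        # otherwise the recipe is stopped from creating sparsity.
        return model_supports_sparse and sparsity_is_high
    raise ValueError(f'sparse_arg must be "auto", "yes", or "no", got {sparse_arg!r}')


# The four "auto" combinations of (sparsity_is_high, model_supports_sparse).
auto_outcomes = {
    (high, supported): step_produces_sparse("auto", high, supported)
    for high in (True, False)
    for supported in (True, False)
}
```

Only the (high sparsity, model supports sparse) entry of `auto_outcomes` is true, which matches the second rule; the other three entries correspond to the first and third rules.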
EmilHvitfeldt linked a pull request Jan 17, 2025 that will close this issue