Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seed same in different objects #120

Open
petrovicboban opened this issue May 18, 2023 · 3 comments
Open

seed same in different objects #120

petrovicboban opened this issue May 18, 2023 · 3 comments
Assignees
Labels
Python Python changes

Comments

@petrovicboban
Copy link
Contributor

Let's have

form random_forestry import RandomForest

rf_1 = RandomForest()
rf_2 = RandomForest()

Since default value for seed is random number, we expect:

assert rf_1.seed != rf_2.seed

but that's not what's happening.

although rf_1 and rf_2 are really different objects:

assert id(rf_1) != id(rf_2)

their attributes which have default value set are not:

assert id(rf_1.seed) == id(rf_2.seed)

Python does copy on write here, so if value is changed later, it will create new seed object then:

rf_1.seed = 5
assert id(rf_1.seed) != id(rf_2.seed)

The implication is that if you create different RandomForest objects with seed not explicitly set, seed will be random but all objects will have the same value.
Is this acceptable to you? @edwardwliu @theo-s

@petrovicboban petrovicboban added the Python Python changes label May 18, 2023
@petrovicboban petrovicboban self-assigned this May 18, 2023
@edwardwliu
Copy link
Collaborator

edwardwliu commented May 22, 2023

It sounds like two RandomForest objects will have different seeds when instantiated, but the objects are considered equal since the default value of seed=randrange() is the same. Is this still an issue if we the class explicitly assigns the seed in an initialization or post initialization step (even when the user doesn't provide one)?

@petrovicboban
Copy link
Contributor Author

Two RandomForest objects will have the same seeds when instantiated. Although, it you re-run script, seed value will be different, but again, the same for both objects.
I don't think we can do it in __init__ and still pass all tests in test_sklearn_compatible_estimator.
The only way to do it is to do it at the start of fit like we do for some other parameters: leave the parameter in __init__ but set default value to None, and then, in fit, check if it's None, and set random value if it is.

@petrovicboban
Copy link
Contributor Author

Actually, we can't do it. We'll have to move seed to fit parameters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Python Python changes
Projects
None yet
Development

No branches or pull requests

2 participants