-
-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add probability masking to space.sample
#1296
base: main
Are you sure you want to change the base?
Conversation
…rove efficiency and readability
space.sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on the PR, overall looks good.
I will look at it in more detail this evening but the biggest thing that I notice is checking if the probabilities don't sum of 1. I'm not sure if numpy will do that for us, or if we'll need to add that check.
FYI, if testing with randomisation is a pain, then fix the seeds to get reliable, consistent results
Yeah I don't think we need to require their probability to sum to 1 since we normalize the probability mask anyways so that it adds to 1. I could remove this requirement if you'd like. This would also allow the user to input a zeros array, and have it behave the same way that a Good to know about fixing seeds to help with testing. Those two failed tests didn't come up for me locally when I ran pytest. I think it might be because I used a different version of numpy, since the string representation of numpy attributes are different on my machine. |
Yeah, we do testing with NumPy 2.0 and <2.0 I suspect that you might need to ignore the error if |
Ok. Do you want me to go ahead and make those changes so that the probability sum doesn't have to equal 1? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive PR @mariojerez
I think this is a question of programming styles, imho, I prefer simplity over minimal lines of code.
Therefore, personally I think that we should implement the feature like
if mask is not None and probability is not None:
raise ValueError
elif mask is not None:
# logical mask sampling logic
elif probability is not None:
# probabilistic mask sampling logic
else:
# uniform sampling logic
This should make it easier to debug and understand for new users what is happening. I understand that this will increase the number of lines of code with duplicate error checking. Thoughts?
There seem to be two other technical questions to ask
- Should we enforce the sum of probabilities == 1 or do we normalise before applying the probabilities? Personally, I'm in favour of the first as it makes the requirements for users easy to understand. (If we do this, then use
numpy.isclose
with a small error, as floating point summation). - For composite spaces, i.e., Tuple / Dict, do we allow users to mix the logical and probabilistic masks? From implementation simplity, I would say no. Plus logical is a subset of probabilistic, therefore, users can convert all logical cases to probabilistic to solve this.
Do you agree with my thoughts?
@@ -91,6 +98,10 @@ def sample(self, mask: MaskNDArray | None = None) -> NDArray[np.int8]: | |||
self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype), | |||
mask.astype(self.dtype), | |||
) | |||
elif probability is not None: | |||
raise gym.error.Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this unsupported currently? I'm happy to add this if you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't obvious to me how to do it at the time, and I decided to move on. It honestly would be a huge help if you did! I'm pretty overwhelmed with classes and other commitments I have this semester.
Thanks @pseudo-rnd-thoughts ! I agree that that template you provided is simpler and easier to understand. In discrete.py for example, here's what I'm thinking it could be changed to:
Referring to your comment about doing the isclose check, In
I think you're telling me to use
|
Description
Adds a probability mask feature (
probability
) to thesample()
method of all spaces. This allows you to specify the probability of choosing each action. Similarly to themask
parameter, theprobability
parameter is a numpy array with the same shape asn
, the number of elements in the space. Each value in the array describes the probability of the corresponding value being chosen, with 0 meaning it will not be chosen and 1 meaning that it will be chosen. All of the values in the array must sum to 1.probability
is unsupported for the Box and MultiBinary space.Motivation
This is helpful in instances where values need to be chosen at random, such that some values are given higher priority (a higher likelihood) over others. For example, when implementing an Ant Colony Optimization algorithm, each action needs to be assigned a different probability of being chosen.
Fixes #1255
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)