Add probability masking to space.sample #1296

Open
wants to merge 13 commits into
base: main
from

Conversation

mariojerez

Description

Adds a probability mask feature (the probability parameter) to the sample() method of all spaces, which lets you specify the probability of choosing each action. Like the mask parameter, probability is a numpy array with one entry per element of the space (n entries in total). Each value gives the probability of the corresponding element being chosen: 0 means it is never chosen and 1 means it is always chosen. The values in the array must sum to 1. probability is not supported for the Box and MultiBinary spaces.
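
As a rough illustration of these semantics (not the PR's actual code), per-value probabilities can be passed to NumPy's `Generator.choice` through its `p` argument. The helper name `sample_with_probability` and its `start` offset are assumptions made for this sketch.

```python
import numpy as np


def sample_with_probability(rng, n, probability, start=0):
    """Draw one value from {start, ..., start + n - 1} using per-value probabilities.

    Hypothetical helper for illustration; not part of Gymnasium.
    """
    probability = np.asarray(probability, dtype=np.float64)
    if probability.shape != (n,):
        raise ValueError(f"expected shape {(n,)}, got {probability.shape}")
    # Generator.choice draws index i with probability p[i].
    return start + int(rng.choice(n, p=probability))


rng = np.random.default_rng(0)
# The last value has probability 1, so it is always chosen; the others never are.
samples = [sample_with_probability(rng, 3, [0.0, 0.0, 1.0]) for _ in range(5)]
```

With these weights every draw is 2, which matches the "0 means never, 1 means always" reading of the description.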

Motivation

This is helpful in instances where values need to be chosen at random, such that some values are given higher priority (a higher likelihood) over others. For example, when implementing an Ant Colony Optimization algorithm, each action needs to be assigned a different probability of being chosen.

Fixes #1255

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pseudo-rnd-thoughts changed the title from "Probability mask" to "Add probability masking to space.sample" on Jan 23, 2025
Member

@pseudo-rnd-thoughts left a comment

Great job on the PR, overall looks good.
I will look at it in more detail this evening, but the biggest thing I notice is checking whether the probabilities sum to 1. I'm not sure if numpy will do that for us, or if we'll need to add that check.

FYI, if testing with randomisation is a pain, then fix the seeds to get reliable, consistent results
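
A minimal sketch of what fixing the seed buys in a test, using plain NumPy generators rather than a Gymnasium space (the seed value and probabilities are arbitrary choices for illustration):

```python
import numpy as np

probability = np.array([0.2, 0.5, 0.3])

# Two generators built from the same seed produce identical draws,
# so a test can compare two seeded runs instead of relying on chance.
first = np.random.default_rng(42).choice(3, size=10, p=probability)
second = np.random.default_rng(42).choice(3, size=10, p=probability)

assert np.array_equal(first, second)
```

For a space, the analogous step would be seeding it (for example with `space.seed(...)`) before calling `sample()`.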

@mariojerez
Author

Yeah I don't think we need to require their probability to sum to 1 since we normalize the probability mask anyways so that it adds to 1. I could remove this requirement if you'd like. This would also allow the user to input a zeros array, and have it behave the same way that a mask of zeros does: return space.start.
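
To make the normalization option concrete, here is a small sketch under stated assumptions: `normalize_weights` is a hypothetical helper, and returning `None` for an all-zero array is just one way to signal the "fall back to the start value" case mentioned above.

```python
import numpy as np


def normalize_weights(weights):
    """Scale non-negative weights to sum to 1; return None if all are zero.

    Hypothetical helper: None signals "nothing is selectable" so the caller
    can fall back (for example to the space's start value).
    """
    weights = np.asarray(weights, dtype=np.float64)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    total = weights.sum()
    if total == 0:
        return None
    return weights / total


normalized = normalize_weights([1, 1, 2])
all_zero = normalize_weights([0, 0, 0])
```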

Good to know about fixing seeds to help with testing.

Those two failed tests didn't come up for me locally when I ran pytest. I think it might be because I used a different version of numpy, since the string representation of numpy attributes is different on my machine.

@pseudo-rnd-thoughts
Member

Yeah, we do testing with NumPy 2.0 and <2.0

I suspect that you might need to ignore the error if numpy.__version__ < "2.0"
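
One caveat when writing such a check: comparing version strings directly is lexicographic, so it can misorder multi-digit components. A minimal sketch of a safer comparison, using a hypothetical helper `numpy_major` that would be called with `numpy.__version__` in practice:

```python
def numpy_major(version: str) -> int:
    """Return the major component of a version string such as "1.26.4"."""
    return int(version.split(".")[0])


# Plain string comparison is lexicographic, so a hypothetical "10.0" sorts before "2.0".
assert "10.0" < "2.0"

# Comparing the parsed major version avoids that pitfall.
assert numpy_major("10.0") >= 2
assert numpy_major("1.26.4") < 2
assert numpy_major("2.0.0rc1") >= 2
```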

@mariojerez
Author

Ok. Do you want me to go ahead and make those changes so that the probability sum doesn't have to equal 1?

Member

@pseudo-rnd-thoughts left a comment

Impressive PR @mariojerez

I think this is a question of programming style; imho, I prefer simplicity over minimal lines of code.
Therefore, personally, I think that we should implement the feature like

if mask is not None and probability is not None: 
    raise ValueError
elif mask is not None:
    # logical mask sampling logic
elif probability is not None:
    # probabilistic mask sampling logic
else:
    # uniform sampling logic

This should make it easier for new users to debug and understand what is happening. I understand that this will increase the number of lines of code with duplicate error checking. Thoughts?
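
A runnable sketch of that four-branch structure for a Discrete-like space follows. It is a standalone stand-in, not Gymnasium's implementation: the function name `sample_discrete`, the error messages, and the choice to return `start` for an all-zero mask are assumptions for illustration.

```python
import numpy as np


def sample_discrete(rng, n, start=0, mask=None, probability=None):
    """Sample from {start, ..., start + n - 1} with an optional mask or probability.

    Illustrative stand-in for a Discrete-like space; not Gymnasium's code.
    """
    if mask is not None and probability is not None:
        raise ValueError("Only one of `mask` and `probability` can be provided.")
    elif mask is not None:
        # Logical mask: 1 marks a selectable value, 0 an excluded one.
        mask = np.asarray(mask)
        if mask.shape != (n,):
            raise ValueError(f"mask must have shape {(n,)}, got {mask.shape}")
        valid = np.flatnonzero(mask)
        if valid.size == 0:
            return start  # nothing selectable: fall back to the first value
        return start + int(rng.choice(valid))
    elif probability is not None:
        # Probability mask: entry i is the chance of drawing value start + i.
        probability = np.asarray(probability, dtype=np.float64)
        if probability.shape != (n,):
            raise ValueError(
                f"probability must have shape {(n,)}, got {probability.shape}"
            )
        if np.any(probability < 0) or not np.isclose(probability.sum(), 1.0):
            raise ValueError("probability must be non-negative and sum to 1")
        return start + int(rng.choice(n, p=probability))
    else:
        # No mask of either kind: uniform sampling.
        return start + int(rng.integers(n))


rng = np.random.default_rng(0)
masked = sample_discrete(rng, 4, mask=[0, 1, 0, 0])
weighted = sample_discrete(rng, 4, probability=[0.0, 0.0, 0.0, 1.0])
uniform = sample_discrete(rng, 4)
```

Each branch validates its own input, which duplicates a few checks but keeps every path readable in isolation.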

There seem to be two other technical questions to ask

  1. Should we enforce that the probabilities sum to 1, or do we normalise before applying them? Personally, I'm in favour of the first as it makes the requirements for users easy to understand. (If we do this, then use numpy.isclose with a small tolerance, as floating-point summation is inexact.)
  2. For composite spaces, i.e., Tuple / Dict, do we allow users to mix the logical and probabilistic masks? For implementation simplicity, I would say no. Plus, logical is a subset of probabilistic, so users can convert all logical cases to probabilistic to solve this.

Do you agree with my thoughts?
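
The "logical is a subset of probabilistic" point can be sketched as a conversion that spreads probability uniformly over the allowed entries. The helper `mask_to_probability` is hypothetical, and raising on an all-zero mask is an assumption of this sketch.

```python
import numpy as np


def mask_to_probability(mask):
    """Convert a 0/1 mask into a uniform probability vector over the allowed entries.

    Hypothetical helper for illustration only.
    """
    mask = np.asarray(mask, dtype=np.float64)
    allowed = mask.sum()
    if allowed == 0:
        raise ValueError("mask excludes every value, so no probability vector exists")
    return mask / allowed


probability = mask_to_probability([1, 0, 1, 1, 0])
```

Here the three allowed entries each receive one third of the probability and the excluded entries receive zero.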

@@ -91,6 +98,10 @@ def sample(self, mask: MaskNDArray | None = None) -> NDArray[np.int8]:
self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype),
mask.astype(self.dtype),
)
elif probability is not None:
raise gym.error.Error(

Why is this unsupported currently? I'm happy to add this if you wish.

Author

It wasn't obvious to me how to do it at the time, and I decided to move on. It honestly would be a huge help if you did! I'm pretty overwhelmed with classes and other commitments I have this semester.
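
One possible way to support MultiBinary, sketched under an assumed semantics in which each entry of `probability` is the independent chance that the corresponding bit is 1 (so the entries need not sum to 1). The function name `sample_multibinary` is hypothetical, and this is not necessarily how the PR ends up implementing it.

```python
import numpy as np


def sample_multibinary(rng, probability):
    """Draw a 0/1 array where element i is 1 with probability `probability[i]`.

    Assumed semantics for illustration; entries are treated as independent.
    """
    probability = np.asarray(probability, dtype=np.float64)
    if np.any((probability < 0) | (probability > 1)):
        raise ValueError("each probability must lie in [0, 1]")
    # rng.random draws uniformly from [0, 1), so `u < p` is True with probability p.
    return (rng.random(probability.shape) < probability).astype(np.int8)


rng = np.random.default_rng(0)
sample = sample_multibinary(rng, [0.0, 1.0, 0.5])
```

With this reading, a probability of 0 forces the bit to 0 and a probability of 1 forces it to 1, which mirrors how a logical mask pins individual bits.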

@mariojerez
Author

Thanks @pseudo-rnd-thoughts !

I agree that that template you provided is simpler and easier to understand. In discrete.py for example, here's what I'm thinking it could be changed to:

if mask is not None and probability is not None:
    raise ValueError
elif mask is not None:
    self._validate_mask(mask, (self.n,), np.int8, "mask")
    valid_action_mask = self._get_valid_action_mask(mask, "mask")
    # continue sampling logic for mask
elif probability is not None:
    self._validate_mask(probability, (self.n,), np.float64, "probability")
    valid_action_mask = self._get_valid_action_mask(probability, "probability")
    # continue sampling logic for probability
else:
    # uniform sampling logic

  1. I agree that enforcing that the probabilities sum to 1 makes it straightforward to understand. Anyway, the user isn't missing out on much if we enforce it, because they can easily normalize the array themselves.

Referring to your comment about doing the isclose check: in _get_valid_action_mask() (in discrete.py), I have this check:

assert np.isclose(np.sum(mask), 1), (
    f"The sum of all values of `probability mask` should be 1, actual sum: {np.sum(mask)}"
)

I think you're suggesting a more accurate summation than np.sum (such as math.fsum; NumPy itself has no fsum), which makes sense. By default, the relative and absolute tolerances of np.isclose are rtol=1e-05 and atol=1e-08, respectively. I don't have a problem with these default values, but do you think that I should hard-code them?
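
A small sketch of the sum check with the tolerances spelled out. It uses the standard library's `math.fsum` for the summation; the ten-element array is an arbitrary example, and whether to hard-code the tolerances remains a style choice.

```python
import math

import numpy as np

probability = np.full(10, 0.1)

# Adding floats one at a time can accumulate rounding error.
naive_total = 0.0
for value in probability.tolist():
    naive_total += value

# math.fsum tracks partial sums and returns a correctly rounded total.
exact_total = math.fsum(probability)

# Spelling out np.isclose's default tolerances documents them without changing behaviour.
is_valid = bool(np.isclose(exact_total, 1.0, rtol=1e-05, atol=1e-08))
```

Either total passes the check here, since both lie well inside the default tolerance; the explicit arguments mainly serve as documentation for readers of the validation code.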

  2. I agree with you; I think we can restrict composite spaces to one type of mask without serious disadvantages. One thing to consider, though: according to NumPy's documentation, passing probabilities (through the p parameter) to numpy.random.Generator.choice makes it sample less efficiently. So I could see a user benefiting from using mask instead of probability whenever they can.

Development

Successfully merging this pull request may close these issues.

[Proposal] Probability Mask for sampling a space
2 participants