[Bug Report] Collision detection failure in Ant-UMaze #259

Open · 1 task done
abagaria opened this issue Dec 1, 2024 · 9 comments

abagaria commented Dec 1, 2024

Description of the bug
While using AntMaze_UMaze-v5 alongside a pseudocount exploration algorithm, I noticed that the ant can go through the walls in the maze. Initially this is a rare occurrence, but since I am training a novelty-based exploration algorithm, the agent learns to reproduce the issue with increasing reliability over time.

Code example
Here is how I am creating the env:

env = gym.make(
    'AntMaze_UMaze-v5',
    render_mode='rgb_array',
    max_episode_steps=1000,
    continuing_task=False
)

Not that this should be important, but for context, I am using the TD3 algorithm and CFN for novelty-based intrinsic rewards.

Versioning
gymnasium_robotics: 1.3.1
gymnasium: 1.0.0
python: 3.9.20

Supporting Evidence

In the attached image, I have plotted the (x, y) coordinates of the ant (according to the states saved in the replay buffer). The color of each point denotes the novelty prediction, but that can be ignored for our purposes. The purple lines show where the walls should be (approximately), and the red circle highlights a trajectory that goes through the wall near the start state and exits near the goal state (which is at (-4, 4)).
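For reference, a plot like this can be reproduced with a short matplotlib sketch. Everything below is a hypothetical stand-in: positions and novelty would come from the replay buffer, and the wall segments are illustrative coordinates, not values read from the env.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the (N, 2) ant positions and per-point
# novelty scores logged in the replay buffer.
positions = np.random.uniform(-6, 6, size=(5000, 2))
novelty = np.random.rand(5000)

plt.scatter(positions[:, 0], positions[:, 1], c=novelty, s=2, cmap='viridis')
plt.colorbar(label='novelty prediction')

# Illustrative wall segments; the real U-maze coordinates differ.
walls = [((-2.0, -6.0), (-2.0, 2.0)), ((-6.0, 2.0), (-2.0, 2.0))]
for (x0, y0), (x1, y1) in walls:
    plt.plot([x0, x1], [y0, y1], color='purple')

plt.xlabel('x')
plt.ylabel('y')
plt.show()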

Checklist

  • I have checked that there is no similar issue in the repo (required)

[Attachment: cfn_novelty_predictions, a scatter plot of ant (x, y) positions colored by novelty prediction]
Kallinteris-Andreas (Collaborator) commented Dec 2, 2024

  1. Can you provide a video? (It might be climbing on top of the wall, I am not sure; check rootz.)
  2. Which MuJoCo version are you using?

abagaria (Author) commented Dec 3, 2024

Yup, it looks like we have a flying ant, you gotta love RL :)

The first attachment is a video showing an example of the ant jumping on top of the maze. The second is a scatter plot showing that when the ant is at the same (x, y) location as a wall, it has a much higher z-coordinate than usual, which supports your hypothesis that the ant is jumping on top of the maze.

mujoco version: 3.1.6

Edit: It looks like github isn't able to play my video, so here is a youtube link: https://youtube.com/shorts/pqJv8c8wTuU?feature=share

[Attachment: flying_ants.mp4, a video of the ant jumping on top of the maze]

[Attachment: ant_xyz, a scatter plot of the ant's z-coordinate against its (x, y) position]
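For a numeric version of the same check, one can flag logged states whose torso height leaves the nominal healthy range. A minimal sketch, assuming states were logged as flat [x, y, z, ...] vectors and using 1.0 m (the upper bound of the default Ant healthy_z_range) as the threshold:

import numpy as np

def find_wall_climbers(states: np.ndarray, z_threshold: float = 1.0):
    """Return indices of logged states where the torso is suspiciously high.

    Assumes each row is a flat [x, y, z, ...] state vector.
    """
    return np.where(states[:, 2] > z_threshold)[0]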

abagaria closed this as completed Dec 3, 2024
abagaria reopened this Dec 3, 2024
abagaria (Author) commented Dec 3, 2024

Accidentally closed the issue; we still need to figure out how to prevent the ant from jumping on top of the obstacles.

Kallinteris-Andreas (Collaborator) commented

rootz should not be above 1 meter; the environment should terminate. Are you sure you are checking for terminated=True?
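As a sanity check on that claim (a rough sketch, not from the thread): force the plain Ant-v5 torso above the healthy z-range and confirm that terminated fires. Writing into qpos directly is purely for illustration, and (0.2, 1.0) is the Gymnasium Ant default healthy_z_range.

import gymnasium as gym

env = gym.make('Ant-v5')
env.reset(seed=0)

# Push the torso above the healthy z-range (default (0.2, 1.0)) by writing
# directly into the MuJoCo state. Illustration only.
env.unwrapped.data.qpos[2] = 1.5

_, _, terminated, _, _ = env.step(env.action_space.sample())
print(terminated)  # expected: True, since the ant is now unhealthy

Running the analogous experiment through AntMaze_UMaze-v5 would show whether the maze env actually surfaces that termination.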

abagaria (Author) commented Dec 3, 2024

Yes, I am pretty sure I am checking for terminated=True. Here is a simple wrapper I am using around the gymnasium_robotics AntMaze (it fixes the start and goal states, turns off position noise, and returns observation vectors instead of dictionaries):

import numpy as np
import gymnasium_robotics
import gymnasium as gym

from typing import Tuple, Dict


class AntMazeInfoWrapper(gym.Wrapper):
  def __init__(self, env, start_pos=None, goal_pos=None, noise_level=0.0):
    super().__init__(env)
    self.start_pos = start_pos
    self.goal_pos = goal_pos
    self.noise_level = noise_level

    self.unwrapped.position_noise_range = 0.

    if self.goal_pos is not None:
      self.unwrapped.maze._unique_goal_locations = [self.goal_pos]

    if self.start_pos is not None:
      self.unwrapped.maze._unique_reset_locations = [self.start_pos]

  def reset(self, **kwargs) -> Tuple[np.ndarray, Dict]:
    # Forward kwargs so env.reset(seed=...) still works through the wrapper.
    obs, info = self.env.reset(**kwargs)
    
    return self._get_obs(obs), self._modify_info(info, obs)

  def step(self, action) -> Tuple[np.ndarray, float, bool, Dict]:
    obs, reward, terminated, truncated, info = self.env.step(action)

    # Collapse to the old 4-tuple gym API expected by the training loop.
    done = terminated or truncated

    if terminated:
      print(f'Terminated at {obs["achieved_goal"]} with reward {reward}')

    return self._get_obs(obs), reward, done, self._modify_info(info, obs)
  
  def _get_obs(self, obs: Dict):
    pos = obs['achieved_goal']
    return np.concatenate([pos, obs['observation']])
  
  def _modify_info(self, info: Dict, obs: Dict):
    info['start_pos'] = self.start_pos
    info['goal_pos'] = self.goal_pos
    info['pos'] = obs['achieved_goal']
    info['goal'] = obs['desired_goal']
    return info
  

def environment_builder(
  level_name: str = 'AntMaze_UMaze-v5',
  max_episode_steps: int = 800,
  randomize_start_pos: bool = False,
  randomize_goal_pos: bool = False,
  noise_level: float = 0.0
):
  env = gym.make(
    level_name,
    render_mode='rgb_array',
    max_episode_steps=max_episode_steps,
    continuing_task=False)
  
  env = AntMazeInfoWrapper(
    env,
    start_pos=np.array([-4., -4.]) if not randomize_start_pos else None,
    goal_pos=np.array([-4., 4.]) if not randomize_goal_pos else None,
    noise_level=noise_level,
  )

  return env

And here is the control loop I used to generate the video in my previous post; as you can see, I am checking the done flag, which the wrapper sets as done = terminated or truncated:

def rollout(agent, env):
    """Rollout agent in environment."""
    state, _ = env.reset()
    done = False
    total_reward = 0
    state_traj = [state]
    image_traj = [env.render()]

    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        state = next_state
        state_traj.append(state)
        image_traj.append(env.render())

    return total_reward, state_traj, image_traj

abagaria (Author) commented Dec 5, 2024

Could it be because the AntMaze class is ignoring the terminated flag from the inner MuJoCo ant env?

ant_obs, _, _, _, info = self.ant_env.step(action)

Kallinteris-Andreas (Collaborator) commented Dec 8, 2024

It is indeed doing that:

def step(self, action):
    ant_obs, _, _, _, info = self.ant_env.step(action)

As to why, I don't know; it was never an issue before, as far as I can tell.

The simplest solution would be to create new files maze_v6 and ant_maze_v6, with the self.compute_terminated function changed to take info["reward_survive"] into consideration.
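For concreteness, a hypothetical sketch of what that v6 step could look like. This is not the actual patch; the compute_* names follow the GoalEnv-style API, and the variant shown simply propagates the inner terminated flag instead of inspecting info["reward_survive"]:

# Hypothetical sketch of the v6 fix, not the merged patch.
def step(self, action):
    # Keep the inner env's terminated flag instead of discarding it.
    ant_obs, _, ant_terminated, _, info = self.ant_env.step(action)
    obs = self._get_obs(ant_obs)

    reward = self.compute_reward(obs["achieved_goal"], self.goal, info)
    # Terminate on goal reach OR when the ant itself is unhealthy
    # (e.g. torso z outside the healthy range while climbing a wall).
    terminated = (
        self.compute_terminated(obs["achieved_goal"], self.goal, info)
        or ant_terminated
    )
    truncated = self.compute_truncated(obs["achieved_goal"], self.goal, info)

    return obs, reward, terminated, truncated, info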

Kallinteris-Andreas (Collaborator) commented

@abagaria want to try implementing it?

abagaria (Author) commented

@Kallinteris-Andreas happy to!
