[Bug Report] Collision detection failure in Ant-UMaze #259

abagaria · 2024-12-01T21:40:45Z

Description of the bug
While using AntMaze_UMaze-v5 alongside a pseudocount exploration algorithm, I noticed that the ant can go through the walls in the maze. Initially, this is a rare occurrence, but since I am training an novelty-based exploration algorithm, the agent is able to recreate the issue with greater reliability over time.

Code example
Here is how I am creating the env:

env = gym.make(
    'AntMaze_UMaze-v5',
    render_mode='rgb_array',
    max_episode_steps=1000,
    continuing_task=False
)

Not that this should be important, but for context, I am using the TD3 algorithm and CFN for novelty-based intrinsic rewards.

Versioning
gymnasium_robotics: 1.3.1
gymnasium: 1.0.0
python: 3.9.20

Supporting Evidence

In the attached image, I have plotted the (x, y) coordinates of the ant (according to the states saved in the replay buffer). The color of each point denotes the novelty prediction, but that can be ignored for our purposes. The purple lines show where the walls should be (approximately), and the red circle highlights a trajectory that goes through the wall near the start state and exits near the goal state (which is at (-4, 4)).

Checklist

I have checked that there is no similar issue in the repo (required)

The text was updated successfully, but these errors were encountered:

Kallinteris-Andreas · 2024-12-02T13:12:29Z

can you provide a video (it might be climbing on top of the wall I am not sure) (rootz)
which mujoco version are you using

abagaria · 2024-12-03T05:05:20Z

Yup, it looks like we have a flying ant, you gotta love RL :)

The first attachment is a video showing an example of the ant jumping on top of the maze. The second attachment is a scatter plot which shows that when the ant is in the same (x, y) location as the walls, it indeed has a much higher z-coordinate than usual, which supports your hypothesis that the ant is jumping on top of the maze.

mujoco version: 3.1.6

Edit: It looks like github isn't able to play my video, so here is a youtube link: https://youtube.com/shorts/pqJv8c8wTuU?feature=share

flying_ants.mp4

abagaria · 2024-12-03T05:06:27Z

accidentally closed the issue, we still need to figure out how to prevent the ant from jumping on top of the obstacles.

Kallinteris-Andreas · 2024-12-03T11:24:53Z

rootz should not be above 1 meter the environment should terminate, are you sure you are checking for terminated=True

abagaria · 2024-12-03T14:59:37Z

Yes, I am pretty sure i am checking for terminated=True. Here is a simple wrapper I am using around the gymnasium_robotics antmaze (it fixes the start, goal states, turns off position noise, and returns observation vectors instead of dictionaries):

import numpy as np
import gymnasium_robotics
import gymnasium as gym

from typing import Tuple, Dict


class AntMazeInfoWrapper(gym.Wrapper):
  def __init__(self, env, start_pos=None, goal_pos=None, noise_level=0.0):
    super(AntMazeInfoWrapper, self).__init__(env)
    self.start_pos = start_pos
    self.goal_pos = goal_pos
    self.noise_level = noise_level

    self.unwrapped.position_noise_range = 0.

    if self.goal_pos is not None:
      self.unwrapped.maze._unique_goal_locations = [self.goal_pos]

    if self.start_pos is not None:
      self.unwrapped.maze._unique_reset_locations = [self.start_pos]

  def reset(self) -> Tuple[np.ndarray, Dict]:
    obs, info = self.env.reset()
    
    return self._get_obs(obs), self._modify_info(info, obs)

  def step(self, action) -> Tuple[np.ndarray, float, bool, Dict]:
    obs, reward, terminated, truncated, info = self.env.step(action)

    done = terminated or truncated

    if terminated:
      print(f'Terminated at {obs["achieved_goal"]} with reward {reward}')

    return self._get_obs(obs), reward, done, self._modify_info(info, obs)
  
  def _get_obs(self, obs: Dict):
    pos = obs['achieved_goal']
    return np.concatenate([pos, obs['observation']])
  
  def _modify_info(self, info: Dict, obs: Dict):
    info['start_pos'] = self.start_pos
    info['goal_pos'] = self.goal_pos
    info['pos'] = obs['achieved_goal']
    info['goal'] = obs['desired_goal']
    return info
  

def environment_builder(
  level_name: str = 'AntMaze_UMaze-v5',
  max_episode_steps: int = 800,
  randomize_start_pos: bool = False,
  randomize_goal_pos: bool = False,
  noise_level: float = 0.0
):
  env = gym.make(
    level_name,
    render_mode='rgb_array',
    max_episode_steps=max_episode_steps,
    continuing_task=False)
  
  env = AntMazeInfoWrapper(
    env,
    start_pos=np.array([-4., -4.]) if not randomize_start_pos else None,
    goal_pos=np.array([-4., 4.]) if not randomize_goal_pos else None,
    noise_level=noise_level,
  )

  return env

And here is the control loop I used to generate the video I posted in my previous post; as you can see, I am checking for the done flag which is set in the wrapper as done=terminated or truncated:

def rollout(agent, env):
    """Rollout agent in environment."""
    state, _ = env.reset()
    done = False
    total_reward = 0
    state_traj = [state]
    image_traj = [env.render()]

    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        state = next_state
        state_traj.append(state)
        image_traj.append(env.render())

    return total_reward, state_traj, image_traj

abagaria · 2024-12-05T01:31:47Z

Could it be because the AntMaze class is ignoring the terminated flag from the inner mujoco ant env?

Gymnasium-Robotics/gymnasium_robotics/envs/maze/ant_maze_v5.py

Line 294 in 3719d9d

ant_obs, _, _, _, info = self.ant_env.step(action)

Kallinteris-Andreas · 2024-12-08T23:06:15Z

It is indeed doing that

Gymnasium-Robotics/gymnasium_robotics/envs/maze/ant_maze_v5.py

Lines 293 to 294 in 3719d9d

    
           def step(self, action): 
        
               ant_obs, _, _, _, info = self.ant_env.step(action)

As to why, I don't know it was never an issue before as far as I can tell

The simplest solution would be to create new files maze_v6 and ant_maze_v6 with the self.computed_terminated function changed to take into consideration info["reward_survive"]

Kallinteris-Andreas · 2024-12-13T07:42:52Z

@abagaria want to try implementing it

abagaria · 2024-12-13T16:08:32Z

@Kallinteris-Andreas happy to!

abagaria closed this as completed Dec 3, 2024

abagaria reopened this Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug Report] Collision detection failure in Ant-UMaze #259

[Bug Report] Collision detection failure in Ant-UMaze #259

abagaria commented Dec 1, 2024

Kallinteris-Andreas commented Dec 2, 2024 •

edited

Loading

abagaria commented Dec 3, 2024 •

edited

Loading

abagaria commented Dec 3, 2024

Kallinteris-Andreas commented Dec 3, 2024

abagaria commented Dec 3, 2024 •

edited by Kallinteris-Andreas

Loading

abagaria commented Dec 5, 2024 •

edited

Loading

Kallinteris-Andreas commented Dec 8, 2024 •

edited

Loading

Kallinteris-Andreas commented Dec 13, 2024

abagaria commented Dec 13, 2024

[Bug Report] Collision detection failure in Ant-UMaze #259

[Bug Report] Collision detection failure in Ant-UMaze #259

Comments

abagaria commented Dec 1, 2024

Checklist

Kallinteris-Andreas commented Dec 2, 2024 • edited Loading

abagaria commented Dec 3, 2024 • edited Loading

abagaria commented Dec 3, 2024

Kallinteris-Andreas commented Dec 3, 2024

abagaria commented Dec 3, 2024 • edited by Kallinteris-Andreas Loading

abagaria commented Dec 5, 2024 • edited Loading

Kallinteris-Andreas commented Dec 8, 2024 • edited Loading

Kallinteris-Andreas commented Dec 13, 2024

abagaria commented Dec 13, 2024

Kallinteris-Andreas commented Dec 2, 2024 •

edited

Loading

abagaria commented Dec 3, 2024 •

edited

Loading

abagaria commented Dec 3, 2024 •

edited by Kallinteris-Andreas

Loading

abagaria commented Dec 5, 2024 •

edited

Loading

Kallinteris-Andreas commented Dec 8, 2024 •

edited

Loading