Stuck at Resetting ThorEnv #18

Open
dada-h-h opened this issue Aug 8, 2022 · 6 comments

Comments

@dada-h-h

dada-h-h commented Aug 8, 2022

Hi! I'm running into a similar issue to @biubiuisacat and @JinyeonKim.
I'm stuck at "Resetting ThorEnv". I double-checked the dependencies (pytorch==1.6.0, torchvision==0.7.0, cudatoolkit=10.2), so I don't think they are the reason the code isn't working...

Also, I ran the code on a desktop with a 2080 Ti, so the hardware probably isn't the problem either.

So I dug into the ai2thor code and found that it hangs when ~/FILM/alfred_utils/env/thor_env_code.py calls super().step() (line 278). The function looks like this:

(ai2thor/controller.py, line 615)

def step(self, action, raise_for_failure=False):
    if self.headless:
        action["renderImage"] = False
    # prevent changes to the action from leaking
    action = copy.deepcopy(action)
    # XXX should be able to get rid of this with some sort of deprecation warning
    if 'AI2THOR_VISIBILITY_DISTANCE' in os.environ:
        action['visibilityDistance'] = float(os.environ['AI2THOR_VISIBILITY_DISTANCE'])

    should_fail = False
    self.last_action = action

    if ('objectId' in action and (action['action'] == 'OpenObject' or action['action'] == 'CloseObject')):

        force_visible = action.get('forceVisible', False)
        if not force_visible and self.last_event.instance_detections2D and action['objectId'] not in self.last_event.instance_detections2D:
            should_fail = True

        obj_metadata = self.last_event.get_object(action['objectId'])
        if obj_metadata is None or obj_metadata['isOpen'] == (action['action'] == 'OpenObject'):
            should_fail = True

    rotation = action.get('rotation')
    if rotation is not None and type(rotation) != dict:
        action['rotation'] = {}
        action['rotation']['y'] = rotation

    if should_fail:
        new_event = copy.deepcopy(self.last_event)
        new_event.metadata['lastActionSuccess'] = False
        self.last_event = new_event
        return new_event

    assert self.request_queue.empty(), 'request_queue is not empty'  # continues if request_queue is empty.

    self.response_queue.put_nowait(action)  # put action (non-blocking queue)

    # code stops at this point.
    self.last_event = queue_get(self.request_queue)

    if not self.last_event.metadata['lastActionSuccess'] and self.last_event.metadata['errorCode'] == 'InvalidAction':
        raise ValueError(self.last_event.metadata['errorMessage'])

    if raise_for_failure:
        assert self.last_event.metadata['lastActionSuccess']

    return self.last_event
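Reading the library source works, but the standard-library `faulthandler` module can locate a hang faster: it prints the stack of every thread once a call has been running longer than a timeout, which would point straight at `queue_get`. A minimal sketch; `run_with_hang_dump` is a hypothetical helper, not part of FILM, ALFRED or ai2thor:

```python
import faulthandler
import sys
from typing import IO, Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_hang_dump(fn: Callable[[], T], timeout_s: float = 60.0,
                       dump_file: Optional[IO] = None) -> T:
    """Run fn(); while it is still running, dump every thread's stack every timeout_s seconds.

    faulthandler writes from a watchdog thread straight to the file descriptor,
    so the dump still appears while the main thread is blocked inside Queue.get().
    """
    target = dump_file if dump_file is not None else sys.__stderr__
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

For example, `run_with_hang_dump(lambda: env.reset(scene), timeout_s=120)`, where `env` and `scene` stand for whatever hanging call is being investigated, prints the blocked stack every 120 seconds until the call returns. On Linux/macOS, calling `faulthandler.register(signal.SIGUSR1)` at startup is an alternative: `kill -USR1 <pid>` then dumps the stacks of a process that is already stuck.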

Then I found that the code blocks when queue_get(self.request_queue) is called (marked with a comment above). queue_get runs a while loop that only breaks once it gets an item from request_queue, but the queue stays empty, so every get times out and the code is stuck in the loop forever.

from queue import Empty, Queue

def queue_get(que: Queue):
    res = None

    while True:
        try:
            # wait up to 0.5 s, then retry; there is no overall timeout
            res = que.get(block=True, timeout=0.5)
            print("que.get result: ", res)
            break

        except Empty:
            pass

    return res
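The hang is easier to see with a stripped-down model of the handshake in step(): the controller puts the action on response_queue and then waits for the resulting event on request_queue, which only the other side fills. In ai2thor 2.x that other side is, as far as I can tell, a small Flask/Werkzeug HTTP server thread that the Unity process talks to, so if Unity never starts (for example, no usable X server) or the server thread fails, request_queue stays empty forever. The sketch below is not ai2thor code: `fake_unity_side`, `step_sketch` and `queue_get_with_deadline` are hypothetical stand-ins, and the deadline turns the silent hang into a TimeoutError:

```python
import threading
import time
from queue import Empty, Queue


def queue_get_with_deadline(que: Queue, deadline_s: float, poll_s: float = 0.5):
    """Like queue_get above, but raise TimeoutError after deadline_s instead of looping forever."""
    end = time.monotonic() + deadline_s
    while True:
        remaining = end - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no event after {deadline_s:.1f}s; is the simulator side running?")
        try:
            # Same short polling as the original, but bounded by the overall deadline.
            return que.get(block=True, timeout=min(poll_s, remaining))
        except Empty:
            pass


def fake_unity_side(request_queue: Queue, response_queue: Queue, n_steps: int) -> None:
    """Hypothetical stand-in for the simulator: answer each action with an event dict."""
    for _ in range(n_steps):
        action = response_queue.get()
        request_queue.put({"lastActionSuccess": True, "action": action["action"]})


def step_sketch(request_queue: Queue, response_queue: Queue, action: dict, deadline_s: float) -> dict:
    """Mirror the handshake in step(): send the action, then wait for the resulting event."""
    assert request_queue.empty(), "request_queue is not empty"
    response_queue.put_nowait(action)
    return queue_get_with_deadline(request_queue, deadline_s)


if __name__ == "__main__":
    req, resp = Queue(), Queue()
    threading.Thread(target=fake_unity_side, args=(req, resp, 1), daemon=True).start()
    print(step_sketch(req, resp, {"action": "MoveAhead"}, deadline_s=5.0))

    # Nobody answers this time, which is the situation in this issue:
    # the original queue_get would spin forever, this version gives up.
    try:
        step_sketch(Queue(), Queue(), {"action": "MoveAhead"}, deadline_s=1.0)
    except TimeoutError as err:
        print("timed out:", err)
```

With a responder thread the call returns the event immediately; without one, the only symptom in the original code is the endless loop, which matches what is described here.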

Could I get some advice on why this happens and how to solve it? I've been stuck on this for weeks...😭😭

Thanks!

@Roadsong

Roadsong commented Aug 8, 2022

@dada-h-h Exactly the same here. Could you try the minimal examples at https://allenai.github.io/ai2thor-v2.1.0-documentation/examples ?

You can also try setting

controller = ai2thor.controller.Controller(headless=True)

to see if there is any difference.

@soyeonm
Owner

soyeonm commented Aug 8, 2022

Hello, I think if you can't run the reset here, it's likely that you can't run the one in ALFRED either:

https://github.com/askforalfred/alfred/blob/master/env/thor_env.py#L47

If it's a headless machine, it's likely an X server problem (the simulator does not find the X server). You should check whether ALFRED's scripts/check_thor.py works (https://github.com/askforalfred/alfred/blob/master/scripts/check_thor.py).
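Before running check_thor.py on a Linux machine, a quick sanity check is whether `DISPLAY` is set and points at a local X server whose socket exists. This relies on the usual X11 convention that display `:N` listens on `/tmp/.X11-unix/XN`. It is only a heuristic (an existing socket does not prove the Unity build can render on it, and macOS/XQuartz `DISPLAY` values are reported as unrecognised), and both helper names are hypothetical:

```python
import os
import re
from typing import Mapping, Optional

# DISPLAY is "[host]:display[.screen]"; an empty host (or "unix") means a local socket.
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<num>\d+)(?:\.\d+)?$")


def local_x_socket(display: Optional[str]) -> Optional[str]:
    """Return the local X11 socket path for DISPLAY, or None if unset, remote or unrecognised."""
    if not display:
        return None
    m = _DISPLAY_RE.match(display)
    if m is None or m.group("host") not in ("", "unix"):
        return None
    return f"/tmp/.X11-unix/X{m.group('num')}"


def x_server_hint(env: Mapping[str, str] = os.environ) -> str:
    """Heuristic only: a present socket does not prove the simulator can render on it."""
    display = env.get("DISPLAY")
    if not display:
        return "DISPLAY is not set: start an X server and export DISPLAY (e.g. DISPLAY=:0)."
    sock = local_x_socket(display)
    if sock is None:
        return f"DISPLAY={display!r} is remote or unrecognised; no local socket to check."
    if not os.path.exists(sock):
        return f"DISPLAY={display} but {sock} is missing: the X server is probably not running."
    return f"DISPLAY={display} and {sock} exists; next, run ALFRED's scripts/check_thor.py."


if __name__ == "__main__":
    print(x_server_hint())
```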

@Roadsong

Roadsong commented Aug 8, 2022

Hi @soyeonm, is the code expected to work on macOS? I noticed that you also included some macOS instructions in the README, but I ran into a similar hang. I can't even run a minimal ai2thor example (version 2.1.0).

By the way, I should probably raise this issue in the alfred repo.

@soyeonm
Owner

soyeonm commented Aug 8, 2022

Hello, thanks for your question. Yes, it ran on my Mac; I will check again later today.

@dada-h-h
Author

dada-h-h commented Aug 9, 2022

@soyeonm @Roadsong Thank you very much for your answers!

It seems it was a dependency issue. I created a new conda environment (python 3.8.5), installed all the packages with the same versions as in the docker container, and then it worked!

The specific versions are:

numpy==1.20.2
pandas==1.2.4 
opencv-python==4.5.1.48 
networkx==2.5.1
h5py==3.2.1
tqdm==4.64.0
vocab==0.0.5
revtok==0.0.3
Pillow==9.0.2
torch==1.6.0
torchvision==0.7.0
tensorboardX==1.8
ai2thor==2.1.0
matplotlib==3.5.1
tensorboard==2.9.1
seaborn==0.9.0
imageio==2.6.0
scikit-fmm==2019.1.30
scikit-image==0.15.0
scikit-learn==0.22.2.post1
ifcfg==0.21

I'm still not sure exactly which package was causing the issue, though...
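One way to narrow it down is to diff a broken environment against these pins; every package that differs is a candidate. A minimal sketch using `importlib.metadata` (available from Python 3.8); `PINS` holds only a few entries from the list above, and the helper names are hypothetical:

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# A few pins copied from the list above; add the rest as needed.
PINS: Dict[str, str] = {
    "numpy": "1.20.2",
    "opencv-python": "4.5.1.48",
    "Pillow": "9.0.2",
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "ai2thor": "2.1.0",
}


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    out = {}
    for name, pinned in pins.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu102" that some torch wheels carry.
        public = found.split("+")[0] if found else None
        if public != pinned:
            out[name] = (pinned, found)
    return out


if __name__ == "__main__":
    for name, (want, have) in sorted(version_mismatches(PINS).items()):
        print(f"{name}: pinned {want}, installed {have or 'missing'}")
```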

Also, to install the packages I used this file, which I generated with pip freeze inside the docker container:
film_docker_requirements.txt
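Note that `pip freeze` inside a conda-based image can emit direct references such as `name @ file:///...` or option lines such as `-e ...` alongside plain `name==version` pins, and `conda install` cannot resolve the former. A small sketch that separates the two kinds; `split_requirements` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def split_requirements(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split pip-freeze lines into exact 'name==version' pins and everything else.

    Blank lines and comments are dropped. Direct references ('name @ file:///...')
    and option lines such as '-e ...' go to the second list for manual handling.
    """
    pins: List[str] = []
    other: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line and " @ " not in line and not line.startswith("-"):
            pins.append(line)
        else:
            other.append(line)
    return pins, other
```

Writing `pins` to a new file and pointing the conda loop at it avoids feeding conda lines it cannot parse; `other` is what is left for pip or for manual installation.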

This is what I did. First, I installed PyTorch:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Then I ran conda install for each requirement (this takes some time):

while read requirement; do conda install --yes $requirement; done < film_docker_requirements.txt

Then I used pip or conda-forge to install any packages that were still missing.
I also checked that check_thor.py still worked every time I installed a new package.
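The install-then-check routine can be scripted so it stops at the first package that breaks the check, which also identifies the package responsible. A sketch under these assumptions: packages are installed one at a time with pip into the current interpreter, `scripts/check_thor.py` is ALFRED's script run from an ALFRED checkout, a hang past the timeout counts as a failure, and all function names are hypothetical:

```python
import subprocess
import sys
from typing import Callable, Iterable, Optional


def first_breaking_package(
    packages: Iterable[str],
    install: Callable[[str], None],
    check: Callable[[], bool],
) -> Optional[str]:
    """Install packages in order; return the first one after which check() fails, else None."""
    for pkg in packages:
        install(pkg)
        if not check():
            return pkg
    return None


def pip_install(spec: str) -> None:
    """Install one requirement into the current interpreter's environment."""
    subprocess.run([sys.executable, "-m", "pip", "install", spec], check=True)


def thor_check_passes(script: str = "scripts/check_thor.py", timeout_s: float = 120.0) -> bool:
    """Run the check in a fresh interpreter; a non-zero exit or a hang counts as failure."""
    try:
        result = subprocess.run([sys.executable, script], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Because packages are tried in the given order, a failure that only appears in combination is attributed to the last package installed.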

@VoHoangAnh

Hello,
In my case, with PyTorch 2.1, I solved this issue by reinstalling Werkzeug and Flask:
pip install Werkzeug==2.0.3 Flask==2.1.1
