Evaluating on the validation dataset #14

Open
yinoue0426 opened this issue Jun 17, 2022 · 1 comment

Comments

@yinoue0426

I am currently trying to run the evaluation code on the valid-unseen dataset with the available pretrained models, and I have some questions.

First, this is the code I am using to run the evaluation:

python main.py -n1 --max_episode_length 1000 --num_local_steps 25 --num_processes 1 --eval_split valid_unseen --from_idx 0 --to_idx 820 --max_fails 10 --debug_local --learned_depth --use_sem_seg --set_dn tmp --use_sem_policy -v 0 --which_gpu 0 --x_display 0

The code fails to run, however, complaining that rewards.json is missing. So I pulled it from the [ALFRED repo](https://github.com/askforalfred/alfred/blob/master/models/config/rewards.json) and added a --reward_config flag to arguments.py.
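For reference, the addition to arguments.py was roughly the following. This is a minimal, self-contained sketch assuming the file's existing argparse parser; the default path is simply where I placed the copied rewards.json, mirroring ALFRED's models/config/rewards.json location.

```python
import argparse

# Sketch of the flag I added; in arguments.py the add_argument call goes on
# the existing parser rather than a fresh one created here for illustration.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--reward_config',
    type=str,
    default='models/config/rewards.json',
    help='path to the rewards.json copied from the ALFRED repo',
)
args, _ = parser.parse_known_args()
print(args.reward_config)
```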

The code runs with the modification, and I get SR=18.03%, which matches the val-unseen score for "without template assumption" (Table 2 of the FILM paper). I am using best_model_multi.pt as the semantic search policy.
What I wasn't quite sure about is which language processing module is being used. I think the predicted templates are read from the models/instructions_processed_LP/instruction2_params*.p files, and I was wondering whether they were generated with or without the template assumption.
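For context, this is roughly how I have been inspecting those files. It is a small sketch; the specific file name below and the assumption that each file is a dict mapping an instruction to its predicted parameters are my own guesses from poking at the data, not something documented in the repo.

```python
import pickle

# Hypothetical example path; the directory holds several
# instruction2_params*.p files, so adjust the name to the one you want.
path = 'models/instructions_processed_LP/instruction2_params_valid_unseen.p'

with open(path, 'rb') as f:
    instruction2_params = pickle.load(f)

# Print a few entries to see what the predicted templates look like.
# Assumes a dict keyed by the language instruction, with the predicted
# task type / arguments as the value.
for instruction, params in list(instruction2_params.items())[:3]:
    print(instruction)
    print(params)
    print('-' * 40)
```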

Thanks,

@yinoue0426
Author

I looked at my results in more detail and have one update to the original post.

> The code runs with the modification, and I get SR=18.03%

The score I reported here actually came from splitting the evaluation across 3 PCs (the --from_idx/--to_idx range was divided among them), so I doubt the random number generation matches the run used in the paper. Please disregard my remark about 18.03% matching the paper result.
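For illustration, an even three-way split of the 0-820 range would look roughly like the commands below, one per machine. The exact boundaries I used may have differed, and I did not check whether --to_idx is inclusive, so treat these as a sketch rather than the exact commands:

python main.py -n1 --max_episode_length 1000 --num_local_steps 25 --num_processes 1 --eval_split valid_unseen --from_idx 0 --to_idx 274 --max_fails 10 --debug_local --learned_depth --use_sem_seg --set_dn tmp --use_sem_policy -v 0 --which_gpu 0 --x_display 0

python main.py -n1 --max_episode_length 1000 --num_local_steps 25 --num_processes 1 --eval_split valid_unseen --from_idx 274 --to_idx 547 --max_fails 10 --debug_local --learned_depth --use_sem_seg --set_dn tmp --use_sem_policy -v 0 --which_gpu 0 --x_display 0

python main.py -n1 --max_episode_length 1000 --num_local_steps 25 --num_processes 1 --eval_split valid_unseen --from_idx 547 --to_idx 820 --max_fails 10 --debug_local --learned_depth --use_sem_seg --set_dn tmp --use_sem_policy -v 0 --which_gpu 0 --x_display 0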

I am still interested in knowing which language processing module is being used, i.e., whether the models/instructions_processed_LP/instruction2_params*.p files were generated with or without the template assumption.
