Under Review
Website |
arXiv (Coming Soon) |
RSS24 Workshop Paper |
Model Checkpoints |
Dataset |
Model Card
As robots that follow natural language become more capable and prevalent, we need a benchmark to holistically develop and evaluate their ability to solve long-horizon mobile manipulation tasks in large, diverse environments. Robots must use visual and language understanding, navigation, and manipulation capabilities to tackle this challenge. Existing datasets do not integrate all these aspects, restricting their efficacy as benchmarks. To address this gap, we present the Language, Navigation, Manipulation, Perception (LaNMP) dataset and demonstrate the benefits of integrating these four capabilities and various modalities. LaNMP comprises 574 trajectories across eight simulated and real-world environments for long-horizon room-to-room pick-and-place tasks specified by natural language. Every trajectory consists of over 20 attributes, including RGB-D images, segmentations, and the poses of the robot body, end-effector, and grasped objects. We fine-tuned and tested two models in simulation and on a physical robot to demonstrate the benchmark's efficacy for development and evaluation. The models perform suboptimally compared to humans across various metrics, indicating significant room for developing better multimodal mobile manipulation models using our benchmark.
More detailed dataset information can be found in the dataset card DataCard.md.
Download the dataset from this DropBox.
Code that opens, reads, and displays the dataset contents can be found in this Google Colab notebook.
The simulation dataset comes as a single HDF5 file with the following hierarchy:
sim_dataset.hdf5/
├── data_11:11:28/
│ ├── folder_0
│ ├── folder_1
│ └── folder_2
├── data_11:14:08/
│ ├── folder_0
│ └── ...
└── ...
Under each folder, there are three main NumPy files: `depth_<num>`, `inst_seg_<num>`, and `rgb_<num>`, which correspond to the depth image, instance segmentation image, and RGB image, respectively.
Under each folder's metadata, there is a dumped JSON describing the other metadata of each time step. The detailed metadata fields can be found in the dataset card.
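For a quick look at the file, the snippet below is a minimal sketch of browsing `sim_dataset.hdf5` with `h5py`; the exact key names and where the metadata JSON is stored are assumptions here and should be verified against `DataCard.md` and the Colab notebook above.

```python
# Minimal sketch (not the official loader): browse sim_dataset.hdf5 with h5py.
# Key names and the metadata location are assumptions; verify them against
# DataCard.md and the Colab notebook.
import json
import h5py

with h5py.File("sim_dataset.hdf5", "r") as f:
    for traj_name, traj in f.items():          # e.g. "data_11:11:28"
        for step_name, step in traj.items():   # e.g. "folder_0" (one time step)
            rgb = step[next(k for k in step if k.startswith("rgb"))][()]
            depth = step[next(k for k in step if k.startswith("depth"))][()]
            seg = step[next(k for k in step if k.startswith("inst_seg"))][()]
            # the per-step metadata is a dumped JSON string; it may live as a
            # dataset or as an attribute depending on how the file was written
            meta = None
            if "metadata" in step:
                meta = json.loads(step["metadata"][()])
            elif "metadata" in step.attrs:
                meta = json.loads(step.attrs["metadata"])
            print(traj_name, step_name, rgb.shape, depth.shape, seg.shape)
            break
        break
```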
Similarly, the real-world dataset comes as a single HDF5 file with the following hierarchy:
real_dataset.hdf5/
└── FloorTrajectories/
├── data_00/
│ ├── folder_10/
│ │ ├── gripper_depth_10
│ │ ├── gripper_image_10
│ │ ├── left_fisheye_depth_10
│ │ ├── left_fisheye_image_10
│ │ ├── right_fisheye_depth_10
│ │ ├── right_fisheye_image_10
│ │ └── metadata
│ └── folder_11/
│ ├── gripper_depth_10
│ ├── gripper_image_10
│ └── ...
├── data_01/
│ └── folder_10/
│ └── ...
└── ...
Note that the right fisheye camera is mounted on the right side of the robot but points toward the left, so the right fisheye produces the left half of the front view and the left fisheye produces the right half.
The images have the following sizes:
key | shape |
---|---|
gripper_depth_10 | (480, 640) |
gripper_image_10 | (480, 640, 3) |
left_fisheye_depth_10 | (240, 424) |
left_fisheye_image_10 | (640, 480, 3) |
right_fisheye_depth_10 | (240, 424) |
right_fisheye_image_10 | (640, 480, 3) |
The detailed metadata can be found in the dataset card.
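As an illustration of the layout above, the following sketch reads one time step from the real-world file and naively stitches the two fisheye images according to the note about camera orientation; the group and dataset names mirror the hierarchy shown here, but please verify them (and any required rotation or undistortion) against the Colab notebook.

```python
# Minimal sketch (not the official loader): read one time step of
# real_dataset.hdf5. Names follow the hierarchy above; verify against the
# Colab notebook and DataCard.md.
import numpy as np
import h5py

with h5py.File("real_dataset.hdf5", "r") as f:
    step = f["FloorTrajectories"]["data_00"]["folder_10"]
    gripper_rgb = step["gripper_image_10"][()]       # (480, 640, 3)
    left_rgb = step["left_fisheye_image_10"][()]     # (640, 480, 3)
    right_rgb = step["right_fisheye_image_10"][()]   # (640, 480, 3)
    # The right fisheye points left and vice versa, so the right image forms
    # the left half of a naive stitched front view (rotation/undistortion omitted).
    front_view = np.concatenate([right_rgb, left_rgb], axis=1)
    print(gripper_rgb.shape, front_view.shape)
```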
cd collect_sim
pip install -r sim_reqs.txt
cd custom_ai2thor_lib_code
- Move the files to the ai2thor library folder in the virtual environment
- Collect data using `python mani.py --scene "<scene number>" --command "<natural language command>"`. Use the following keys to move in the simulator:
- WASD: moving the robot base
- J/L: rotate the robot left/right
- I/K: moving the robot head up/down
- G: grasp
- R: release
- Up arrow/down arrow: move robot shoulder up/down
- 7/4: move end-effector left/right
- 8/5: move end-effector up/down
- 9/6: move end-effector forward/backward
- Q: end collection and save data
- CTRL+C: restart collection without saving
cd collect_real
conda create --name <env> --file spot_env.txt
- Create a map using `python record_env_graph.py`. See this for more details on how to record the map.
- Collect data using the map: `python collect_spot_data.py -u <map folder> -t "<natural language command>"`
The RT-1 model from the paper "RT-1: Robotics Transformer for Real-World Control at Scale" by Brohan et al. was modified and fine-tuned on LaNMP. This model was trained and run on an NVIDIA 3090 GPU.
This is a forked implementation of RT-1 (Robotics Transformer), originally inspired by the Google Research paper. This implementation of RT-1 was pretrained on the Bridge dataset and further fine-tuned on our LaNMP dataset for evaluation. Please find details of the repository below.
git clone git@github.com:h2r/LaNPM-Dataset.git
cd models/main_models/rt1
pip install -e .
This repository has seven critical files/folders whose use cases are described below:

- `main.py`: used to pretrain RT-1 on the Bridge dataset. Modifying this file to accommodate different datasets requires changing the `observation_space` and `action_space` according to the dataset being loaded, as well as changing the dataset keys in `rt1_pytorch/tokenizers/action_tokenizer.py`. Running this file saves a series of checkpoints and logs losses using Weights & Biases.
- `main_ft.py`: used to fine-tune RT-1 on the LaNMP dataset. This file has the `observation_space`, `action_space`, and PyTorch `DataLoader` already modified to accommodate LaNMP fine-tuning (AI2Thor). Running this file saves a series of checkpoints and logs losses using Weights & Biases.
- `main_ft_eval.py`: used to run RT-1 in inference mode on the LaNMP dataset. This file has the `observation_space`, `action_space`, and PyTorch `DataLoader` already modified for the LaNMP dataset (AI2Thor). The file iterates over all checkpoints saved during fine-tuning and runs RT-1 in inference mode on the validation dataset for each checkpoint. The script logs the test losses using Weights & Biases.
- `ai2thor_env.py`: contains a Gym-style environment class to load and take steps in the AI2Thor environment. This file is used to generate real-time trajectories based on the action tokens generated by a fine-tuned RT-1 model (specific to AI2Thor). The main `step()` function executes the action generated by RT-1 and returns a success message along with information about the environment state (e.g. object or agent metadata), which can be saved to capture the trajectory taken by the agent for a given task.
- `rollout_ai2thor.py`: interfaces between the fine-tuned RT-1 model (loaded from a checkpoint after fine-tuning on LaNMP) and the `ai2thor_env.py` Gym environment, sending observations from the AI2Thor environment to RT-1 and executing the action tokens proposed by RT-1 in AI2Thor. Note that this file should not be run on a headless machine since it requires the AI2Thor simulator GUI.
- `rt1_pytorch/rt1_policy.py`: contains the RT-1 model implementation in PyTorch. The `loss()` function performs the forward pass of RT-1 for training, and the `act()` function performs the forward pass during inference.
- `lanmp_dataloader/rt1_dataloader.py`: contains the `DatasetManager` class that extracts trajectories from the LaNMP `sim_data.hdf5` dataset file. The script automatically separates train and validation subsets according to different splits, e.g. k-fold by scene, task-wise, or diversity ablation. The `DatasetManager` also handles tokenizing/detokenizing the raw trajectory data into 256 discrete buckets (see the sketch below for the idea), while chunking trajectories into non-overlapping windows of 6 steps.
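To make the 256-bucket action discretization concrete, here is a small, self-contained sketch of tokenizing and detokenizing a continuous action value. It illustrates the idea used by `lanmp_dataloader/rt1_dataloader.py` but is not the repository's implementation, and the value bounds below are invented for the example.

```python
# Conceptual sketch of 256-bucket action discretization (not the repo's code).
import numpy as np

def tokenize(values, low, high, bins=256):
    """Map continuous values in [low, high] to integer bucket ids in [0, bins-1]."""
    values = np.clip(values, low, high)
    ids = np.floor((values - low) / (high - low) * bins).astype(np.int32)
    return np.minimum(ids, bins - 1)

def detokenize(ids, low, high, bins=256):
    """Map bucket ids back to approximate continuous values (bucket centers)."""
    return low + (ids.astype(np.float32) + 0.5) / bins * (high - low)

actions = np.array([-0.02, 0.0, 0.15])  # e.g. end-effector deltas; bounds are made up
tokens = tokenize(actions, low=-0.25, high=0.25)
print(tokens, detokenize(tokens, low=-0.25, high=0.25))
```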
Most of the relevant files in this repository accept the same set of arguments, detailed below:

- `dataset`: only for the `main.py` file; specifies the dataset on which the RT-1 model should be pretrained
- `train-split`: specifies what fraction of the loaded dataset should be used for training vs. evaluation
- `eval-split`: specifies what fraction of the loaded dataset should be used for evaluation vs. training
- `epochs`: total number of passes over all the batches of the training set
- `lr`: learning rate for the cross-entropy loss of RT-1
- `train-batch-size`: the number of trajectories from which to sample data for the current training batch
- `eval-batch-size`: the number of trajectories from which to sample data for the current evaluation batch
- `trajectory-length`: the window size (context history of `trajectory-length` previous images) used for each trajectory when feeding data to the RT-1 model; set to 6 based on the RT-1 implementation (see the sketch after this list)
- `sentence-transformer`: the language embedding to apply to the language-specified task
- `device`: the device to load the model/data onto during training/inference
- `eval-freq`: the interval of batches at which to run evaluation/inference on the validation dataset (currently set to 0 in `main_ft.py`)
- `checkpoint-freq`: the interval of batches at which to save a checkpoint during training
- `checkpoint-dir`: the directory path at which to save checkpoints during training
- `load-checkpoint`: (optional) path of the pretrained checkpoint to load for further fine-tuning
- `wandb`: boolean determining whether to log to Weights & Biases
- `eval-scene`: the AI2Thor scene number in the dataset that is held out of the training set for evaluation during k-fold cross validation across scenes
- `split-type`: determines the split type (i.e. k-fold by scene, task-wise, or diversity ablation) between train and evaluation used by the `DatasetManager` in `rt1_dataloader.py`
- `num-diversity-scenes`: only if `split-type` is `diversity-ablation`; determines the total number of scenes to perform the diversity ablation over (i.e. a maximum of 4 for the LaNMP simulation data)
- `max-diversity-trajectories`: only if `split-type` is `diversity-ablation`; determines the total number of trajectories that are divided evenly across the `num-diversity-scenes` scenes
- `train-subbatch`: the batch size to use during training/fine-tuning
- `eval-subbatch`: the batch size to use during evaluation
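The `trajectory-length` windowing can be pictured with the toy sketch below, which splits a trajectory into non-overlapping windows of 6 steps; it is conceptual only and not the dataloader's actual code.

```python
# Toy illustration of non-overlapping window chunking (window = trajectory-length = 6).
def chunk_trajectory(steps, window=6):
    """Split a sequence of time steps into non-overlapping windows of `window` steps."""
    return [steps[i:i + window] for i in range(0, len(steps), window)]

print(chunk_trajectory(list(range(14))))
# -> [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13]]
```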
Please find below the sample checkpoints that can be loaded into the RT-1 model. These can be found on the supplementary Google Drive associated with this project:

- `sample_checkpoints/pretrained_bridge`: the final checkpoint saved when pretraining the RT-1 model on the Bridge dataset
- `sample_checkpoints/task_gen`: the final checkpoint saved after fine-tuning the RT-1 model on the task-wise split for the task generalization experiment
- `sample_checkpoints/kfold_cross_val`: the final checkpoints saved after fine-tuning the RT-1 model using k-fold cross validation, where each fold represents a held-out scene from AI2Thor
When running any of the fine-tuning or pretraining scripts, please ensure the following modules are loaded:
module load cuda/11.8.0-lpttyok
module load cudnn/8.7.0.84-11.8-lg2dpd5
- Create a Python virtual environment using Python 3.9.16: `python3.9 -m venv rt1_env`
- Activate the virtual environment: `source rt1_env/bin/activate`
- Install and load the CUDA Toolkit 11.8.0 and cuDNN 8.7.0
- `cd LaNMP-Dataset/models/main_models/rt1`
- Load the necessary libraries using `pip install -e .`, or directly activate the saved `rt1_env` folder using `source rt1_env/bin/activate` (if Python 3.9 is loaded onto your system)
cd LaNMP-Dataset/models/main_models/rt1
- Open `main.py` and modify the `load-checkpoint` argument to `None` (since we are pretraining from initialization)
- Ensure the `checkpoint-dir` argument is a known and valid local path (where checkpoints during pretraining will be saved at the `checkpoint-freq`)
- Set all other arguments in `main.py`
- Navigate to `LaNMP-Dataset/models/main_models/rt1/rt1_pytorch/tokenizers/action_tokenizer.py`
- Ensure the `action_order` and `action_space` in lines 61 and 62 of `action_tokenizer.py` fetch from `bridge_keys` defined in line 56
- Run `python3 main.py` with all arguments input as required
- Checkpoints for pretraining should be saved chronologically (by step number) in the `checkpoint-dir` directory
cd LaNMP-Dataset/models/main_models/rt1
- Open `main_ft.py` and modify the `load-checkpoint` argument to the checkpoint path generated from pretraining, or the path where the pretrained checkpoint (from Google Drive) is saved
- Ensure the `checkpoint-dir` argument is a known and valid local path (where checkpoints during fine-tuning will be saved at the `checkpoint-freq`)
- Set all other arguments in `main_ft.py` (in particular, `split-type` defines the type of experiment to be run, i.e. k-fold across scenes, task generalization, or diversity ablation)
- Navigate to `LaNMP-Dataset/models/main_models/rt1/rt1_pytorch/tokenizers/action_tokenizer.py`
- Ensure the `action_order` and `action_space` in lines 61 and 62 of `action_tokenizer.py` fetch from `lanmp_keys` defined in line 56
- Run `python3 main_ft.py` with all arguments input as required
- Checkpoints for fine-tuning should be saved chronologically (by step number) in the `checkpoint-dir` directory
cd LaNMP-Dataset/models/main_models/rt1
- Open `main_ft_eval.py` and modify the `checkpoint-path` argument to the checkpoint path from pretraining, fine-tuning, or one of the pre-saved checkpoints (from Google Drive)
- Set all other arguments in `main_ft_eval.py` (in particular, `split-type` defines the type of experiment to be run, i.e. k-fold across scenes, task generalization, or diversity ablation)
- Navigate to `LaNMP-Dataset/models/main_models/rt1/rt1_pytorch/tokenizers/action_tokenizer.py`
- Ensure the `action_order` and `action_space` in lines 61 and 62 of `action_tokenizer.py` fetch from `lanmp_keys` defined in line 56
- Run `python3 main_ft_eval.py` with all arguments input as required
- Evaluation loss logs should be reported to Weights & Biases as well as printed (mean ± std dev) in the terminal
The ALFRED Seq2Seq model from the paper "ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks" by Shridhar et al. was modified and fine-tuned on LaNMP. This model was trained and run on an NVIDIA 3090 GPU, so some of the following instructions assume the use of that GPU.
Preliminary:
- Create a Python virtual environment using Python 3.9:
python3.9 -m venv alfred-env
- Activate the virtual environment
source alfred-env/bin/activate
- Install and load CUDA Toolkit 11.8 and cuDNN 8.7
cd LaNMP-Dataset/models/main_models
export ALFRED_ROOT=$(pwd)/alfred
cd alfred
- Install all dependencies:
pip install -r requirements.txt
- Download the dataset from the DropBox
- Place the zipped dataset files in `LaNMP-Dataset/dataset`
- Unzip the datasets: `gunzip *.gz`
Running training:
The original pretrained model used for fine-tuning can be downloaded from this Google Drive Folder.
- Place the model in `LaNMP-Dataset/models/main_models/alfred/pretrained`
cd LaNMP-Dataset/models/main_models/alfred
- Extract the image features using the ResNet and save them to disk:
python models/utils/extract_resnet.py --gpu
- Fine-tune:
python models/train/train_seq2seq.py --model seq2seq_im_mask --dout exp/model:{model}_discrete_relative_fold1 --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1 --pp_data 'data/feats_discrete_relative_fold1' --split_keys 'data/splits/split_keys_discrete_relative_fold1.json' --class_mode --relative --preprocess
- `--class_mode` puts the model into classification mode to use cross-entropy loss and output discrete actions
- `--relative` makes the model produce relative actions (the delta between the current step and the next step) rather than global actions
- `--preprocess` preprocesses the data and saves it to disk to be used for training later in the pipeline. This only needs to be run once; the flag can be removed afterwards to run only the training.
- More details on all the command-line arguments can be found in `LaNMP-Dataset/models/main_models/train/train_seq2seq.py`
Running inference:
The simulated fine-tuned models can be downloaded from this Google Drive folder.
The simulated extracted ResNet visual features can be downloaded from this Google Drive folder.
- Place the model pth files in `LaNMP-Dataset/models/main_models/alfred/exp`
- Place the zipped vision features file in `LaNMP-Dataset/models/main_models/alfred/data/vis_feats`
- Unzip and extract the file: `tar -xzvf vis_feats.tar.gz`
cd LaNMP-Dataset/models/main_models/alfred
- Run inference using fold1's fine-tuned model:
python models/eval/eval_seq2seq.py --model_path exp/best_test_fold1.pth --gpu --model models.model.seq2seq_im_mask --pp_data data/feats_discrete_relative_fold1 --split_keys 'data/splits/split_keys_discrete_relative_fold1.json'
- The command assumes it is run on a machine with a GUI in order to run the AI2THOR simulator, i.e. not on a headless machine.
- To run other models instead of the "fold1" model, change any part that has "fold1" in the command to the desired model, e.g. "task" for the "best_test_task.pth" model.
- More details on all the command-line arguments can be found in `LaNMP-Dataset/models/main_models/eval/eval_seq2seq.py`.