Update: Watch 50:57 - 51:55 for a demo with a comparison on Spot.
Click the image below to watch the video on YouTube.
Kaiyu Zheng,
Deniz Bayazit,
Rebecca Mathew,
Ellie Pavlick,
Stefanie Tellex
Department of Computer Science, Brown University
30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2021
Website: https://h2r.github.io/sloop/
Github repository: https://github.com/h2r/sloop
ArXiv: https://arxiv.org/abs/2012.02705
Humans use spatial language to naturally describe object locations and their relations. Interpreting spatial language not only adds a perceptual modality for robots, but also reduces the barrier of interfacing with humans. Previous work primarily considers spatial language as goal specification for instruction following tasks in fully observable domains, often paired with reference paths for reward-based learning. However, spatial language is inherently subjective and potentially ambiguous or misleading. Hence, in this paper, we consider spatial language as a form of stochastic observation. We propose SLOOP (Spatial Language Object-Oriented POMDP), a new framework for partially observable decision making with a probabilistic observation model for spatial language. We apply SLOOP to object search in city-scale environments. To interpret ambiguous, context-dependent prepositions (e.g. front), we design a simple convolutional neural network that predicts the language provider's latent frame of reference (FoR) given the environment context. Search strategies are computed via an online POMDP planner based on Monte Carlo Tree Search. Evaluation based on crowdsourced language data, collected over areas of five cities in OpenStreetMap, shows that our approach achieves faster search and higher success rate compared to baselines, with a wider margin as the spatial language becomes more complex. Finally, we demonstrate the proposed method in AirSim, a realistic simulator where a drone is tasked to find cars in a neighborhood environment.
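The paper's core move, treating a spatial language utterance as a stochastic observation rather than a goal specification, can be illustrated with a toy Bayes update over a grid of candidate target cells. Everything below (grid size, the "behind" region, the likelihood values) is invented for illustration and is not from the SLOOP codebase.

```python
def language_belief_update(prior, likelihood):
    """Bayes update: posterior(cell) ∝ prior(cell) * P(utterance | target at cell)."""
    unnorm = {cell: prior[cell] * likelihood[cell] for cell in prior}
    z = sum(unnorm.values())
    return {cell: p / z for cell, p in unnorm.items()}

# Uniform prior over a 4x4 grid of candidate target cells.
cells = [(x, y) for x in range(4) for y in range(4)]
prior = {c: 1 / len(cells) for c in cells}

# Toy likelihood for "the car is behind the landmark": cells with y >= 2
# (the "behind" side in this made-up frame of reference) fit the utterance better.
likelihood = {(x, y): 5.0 if y >= 2 else 1.0 for (x, y) in cells}

belief = language_belief_update(prior, likelihood)
assert abs(sum(belief.values()) - 1.0) < 1e-9  # still a probability distribution
```

In SLOOP the likelihood comes from the learned spatial language observation model (including the frame-of-reference prediction network); here it is a hand-set boost.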
- Installation
- Download Dataset & Models
- Download Results & Reproduce
- OpenStreetMap Demo
- AirSim Demo
- Citation
## Installation

The required Python version is Python 3.6+.

- Clone the repository and create a virtual environment with the following lines.

  ```
  git clone [email protected]:h2r/sloop.git
  cd sloop; mkdir -p venv/sloop
  virtualenv -p python3 venv/sloop
  source venv/sloop/bin/activate
  ```

  For the last step, you can also run `source setup.bash` for convenience.

- Install pomdp-py:

  ```
  pip install pomdp-py==1.2.4.5
  ```

- Install the `sloop` package, assuming you are at the root of the sloop repository:

  ```
  pip install -e .
  ```

  Note that this installs a number of dependencies; see `setup.py` for the list of packages. The `>=` version specifiers assume backwards compatibility of those packages.

- Download the spaCy model. For dependency parsing, we use the `en_core_web_md` model from spaCy. You can download it by:

  ```
  python -m spacy download en_core_web_md
  ```
## Download Dataset & Models

There is one dataset and two models.

- The dataset contains OpenStreetMap data and AMT spatial language descriptions and annotations. Download the dataset from here (SL_OSM_Dataset, 3.6MB), place it under `sloop/datasets`, and extract it there. After extraction your directory structure should look like:

  ```
  / # repository root
    sloop/
      ...
      datasets/
        SL_OSM_Dataset/
          amt/
          frame_of_ref/
          ...
  ```

  Check out this wiki page for documentation about the dataset.
- The models are the frame of reference prediction models: a front model (for front and behind) and a left model (for left and right). Download the models from here (models.zip, 42.4MB) and place them under `sloop/oopomdp/experiments/resources` (you will need to create a directory called `resources`). After extraction your directory structure should look like:

  ```
  / # repository root
    sloop/
      ...
      oopomdp/
        experiments/
          resources/
            models/
              iter2_ego-ctx-foref-angle:front:austin
              ...
  ```
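The downloaded models predict the language provider's latent frame of reference (FoR), which can be thought of as an angle defining where "front" points for a landmark. Below is a minimal sketch of how a predicted FoR angle could be used to test whether a map cell counts as being in front of a landmark; the cone width and coordinate conventions are illustrative assumptions, not SLOOP's actual observation model.

```python
import math

def in_front(cell, landmark, foref_angle_deg, half_fov_deg=60):
    """Return True if `cell` lies in the 'front' cone of `landmark`,
    where the front direction is given by the predicted FoR angle."""
    dx, dy = cell[0] - landmark[0], cell[1] - landmark[1]
    cell_angle = math.degrees(math.atan2(dy, dx))
    # signed angular difference, wrapped to (-180, 180]
    diff = (cell_angle - foref_angle_deg + 180) % 360 - 180
    return abs(diff) <= half_fov_deg

landmark = (5, 5)
# Suppose the model predicts that "front" of this landmark points east (0 degrees).
print(in_front((8, 5), landmark, 0))  # cell east of landmark: in front
print(in_front((2, 5), landmark, 0))  # cell west of landmark: "behind"
```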
## Download Results & Reproduce

You can download the full results (including individual trial results) from here (all-joint-sloop.zip, 2.0GB) and place it under `sloop/`. After extraction your directory structure should look like:

```
/ # repository root
  sloop/
    ...
  results/
    all-joint-sloop/
      langprior-austin_00_informed#5-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-na/
      ...
```
To reproduce the results:

- Install sciex, an experiment framework, with `pip install sciex`.

- `cd sloop/results/all-joint-sloop`

- Run `python gather_results.py`. Expected output:

  ```
  ...
  Warning: <class 'sloop.oopomdp.experiments.reward_result.RewardsResult'> result file rewards.yaml not found in /media/kz-wd-ssd/repo/sloop/results/all-joint-sloop/langprior-austin_380_rule#based#ego>ctx>foref>angle-austin-laser:fov=90:min_range=1:max_range=4:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle
  Collected results in langprior-austin_380_rule#based#ego>ctx>foref>angle-austin-laser:fov=90:min_range=1:max_range=4:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle
  Collected results in langprior-cleveland_41_mixture#full#auto-cleveland-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle
  Collected results in langprior-honolulu_19_informed#5-honolulu-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-na
  Collected results in langprior-honolulu_35_keyword-honolulu-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-na
  Collected results in langprior-washington,dc_33_keyword-washington,dc-laser:fov=90:min_range=1:max_range=4:angle_increment=0.5:occlusion_enabled=False-na
  Warning: <class 'sloop.oopomdp.experiments.states_result.StatesResult'> result file states.pkl not found in /media/kz-wd-ssd/repo/sloop/results/all-joint-sloop/langprior-austin_380_rule#based#ego>ctx>foref>angle-austin-laser:fov=90:min_range=1:max_range=4:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle
  Collected results in langprior-cleveland_39_keyword#auto-cleveland-laser:fov=90:min_range=1:max_range=5:angle_increment=0.5:occlusion_enabled=False-na
  Collected results in langprior-denver_29_informed#5-denver-laser:fov=90:min_range=1:max_range=4:angle_increment=0.5:occlusion_enabled=False-na
  Collected results in langprio ...
  ```

  This will generate many plots and results in png/csv/yaml/json formats.

- Run `python ../refine.py ./`. This will generate some more tables that are better organized and readable, using the files generated in the last step as input.
To run the experiments yourself:

- `cd sloop/results/all-joint-sloop`

- Notice there are several scripts `run_{1,2,3,4}.sh`; each contains commands to run individual trials. You can break them up further if you want more parallelization (regenerate them with `generate_run_scripts.py` after increasing the value of the `SPLIT` variable).

- Now run the experiment by executing `./run_#.sh`, where `#` is the index of the run file. Note that because the experiment is random, the final results may differ slightly from those reported in the paper.
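The splitting into run scripts can be pictured as round-robin assignment of trial commands to buckets, one bucket per `run_#.sh`. This sketch is purely illustrative and does not reproduce the actual logic of `generate_run_scripts.py`.

```python
def split_commands(commands, n_splits):
    """Round-robin the trial commands into n_splits buckets,
    one bucket per run script (a toy stand-in for a larger SPLIT value)."""
    buckets = [[] for _ in range(n_splits)]
    for i, cmd in enumerate(commands):
        buckets[i % n_splits].append(cmd)
    return buckets

# Hypothetical trial commands, just to show the shape of the split.
trials = [f"cmd{i}" for i in range(10)]
for k, bucket in enumerate(split_commands(trials, 4), start=1):
    print(f"run_{k}.sh: {len(bucket)} trials")
```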
## OpenStreetMap Demo

You can now start a demo of spatial language object search on an OpenStreetMap by running:

```
cd sloop/oopomdp/experiments
python interface.py
```

This starts a terminal interface. We will walk through an example below.

At start, the program loads several things:

```
$ python interface.py
pygame 2.0.1 (SDL 2.0.14, Python 3.8.8)
Hello from the pygame community. https://www.pygame.org/contribute.html
Loading spacy model...
Loading spatial keywords...
Loading symbol to synonyms...
```
Then, it asks for a city name. You can enter any one of `austin`, `cleveland`, `denver`, `honolulu`, `washington_dc`. We will enter `austin`:

```
map name: austin
Loading mapinfo
```

Then, it asks for the number of objects (maximum 3). We will enter 1:

```
num objects, max 3 [2]: 1
```
Then, it asks for the sensor range. This is the depth of the fan-shaped sensor, which has a fixed field of view of 90 degrees. A sensor range of 3 corresponds to 15m, 4 to 20m, and 5 to 25m.

```
Sensor range [4]: 3
```
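The range-to-meters mapping above implies 5 m per sensor-range unit. A small sketch of the conversion (the fan-area formula is our own illustrative addition, not something the interface computes):

```python
import math

CELL_METERS = 5   # each sensor-range unit corresponds to 5 m (3 -> 15 m, etc.)
FOV_DEG = 90      # fixed field of view of the fan-shaped sensor

def sensor_range_meters(r):
    """Convert a sensor-range setting to meters."""
    return r * CELL_METERS

def fan_coverage_m2(r):
    """Approximate ground area swept by a 90-degree fan of the given range."""
    radius = sensor_range_meters(r)
    return (FOV_DEG / 360) * math.pi * radius ** 2

print(sensor_range_meters(3))     # 15
print(round(fan_coverage_m2(4)))  # 314 (a 20 m fan covers ~314 square meters)
```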
You can enter the true x, y location of the target object (a red car), or leave it blank for a random location:

```
x, y for object R [random]:
```

Then, a window pops up. The blue circle is the robot; its starting location is random.

Next, you are asked to enter a spatial language description of the target's location:

```
Hint: the red car is by the lavaca street behind HiLo.
```
Now you can enter the method that interprets the language as a prior belief over the target location, through one belief update step. The choices are `mixture`, `sloop`, `keyword`, `informed`, and `uniform`. They correspond to SLOOP(m=4), SLOOP, MOS(keyword), informed, and uniform in our experiments.

```
Prior type [mixture]:
Loading ego_ctx_foref_angle model for right
...
```
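For intuition about the keyword baseline, a toy version of a keyword-style prior might boost cells near the landmarks mentioned in the language and then renormalize. The radius and boost values are invented for illustration; this is not MOS(keyword)'s actual model.

```python
def keyword_prior(cells, landmark_cells, near_radius=2, boost=10.0):
    """Toy keyword prior: boost cells within near_radius (Manhattan distance)
    of any landmark mentioned in the language, then normalize."""
    def near(c):
        return any(abs(c[0] - l[0]) + abs(c[1] - l[1]) <= near_radius
                   for l in landmark_cells)
    weights = {c: (boost if near(c) else 1.0) for c in cells}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# 6x6 toy map with one mentioned landmark at (1, 1).
cells = [(x, y) for x in range(6) for y in range(6)]
prior = keyword_prior(cells, landmark_cells=[(1, 1)])
assert abs(sum(prior.values()) - 1.0) < 1e-9  # proper distribution
```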
Then the program processes the language, and the robot starts to search. You will see output similar to this:

```
...
Language: "the red car is by the lavaca street behind HiLo."
MAP: austin
{'entities': ['RedHonda', 'HiLo', 'LavacaSt'], 'relations': [('RedHonda', 'LavacaSt', None), ('RedHonda', 'HiLo', 'behind')], 'lang': 'the RedHonda is by LavacaSt behind HiLo.'}
2021-07-07 13:49:44.438213 Event (Normal): Trial interface-mixture_mixture-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle | Step 1: action: move-vw-TurnRight reward: -11.000 cum_reward: -11.000 NumSims: 300 PlanTime: 0.61667
2021-07-07 13:49:45.094287 Event (Normal): Trial interface-mixture_mixture-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle | Step 2: action: move-vw-TurnRight reward: -11.000 cum_reward: -22.000 NumSims: 300 PlanTime: 0.61652
2021-07-07 13:49:45.781581 Event (Normal): Trial interface-mixture_mixture-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle | Step 3: action: move-vw-TurnRight reward: -11.000 cum_reward: -33.000 NumSims: 300 PlanTime: 0.64700
2021-07-07 13:49:46.453592 Event (Normal): Trial interface-mixture_mixture-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle | Step 4: action: move-vw-Forward reward: -11.000 cum_reward: -44.000 NumSims: 300 PlanTime: 0.63122
2021-07-07 13:49:47.136918 Event (Normal): Trial interface-mixture_mixture-austin-laser:fov=90:min_range=1:max_range=3:angle_increment=0.5:occlusion_enabled=False-ego>ctx>foref>angle | Step 5: action: find reward: 1000.000 cum_reward: 956.000 NumSims: 300 PlanTime: 0.65474
2021-07-07 13:49:47.164189 Event (Normal): Trial %s | Task Finished!
...
```
and the window now may look something like this:
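For intuition about the parsed output shown above (the entities/relations dictionary), here is a toy keyword-based extractor that produces similar (target, landmark, preposition) tuples. The real system uses spaCy dependency parsing, so this simplification is only illustrative; for example, it treats "by" as a spatial keyword, whereas the real parse above yields `None` for that relation.

```python
SPATIAL_KEYWORDS = {"behind", "front", "near", "by", "at", "on"}

def extract_relations(target, landmarks, sentence):
    """Toy extraction of (target, landmark, preposition) tuples: for each
    mentioned landmark, take the closest spatial keyword preceding it."""
    tokens = sentence.lower().replace(".", "").split()
    relations = []
    for landmark in landmarks:
        if landmark.lower() not in tokens:
            continue
        idx = tokens.index(landmark.lower())
        prep = next((t for t in reversed(tokens[:idx])
                     if t in SPATIAL_KEYWORDS), None)
        relations.append((target, landmark, prep))
    return relations

rels = extract_relations("RedHonda", ["LavacaSt", "HiLo"],
                         "the RedHonda is by LavacaSt behind HiLo.")
print(rels)  # [('RedHonda', 'LavacaSt', 'by'), ('RedHonda', 'HiLo', 'behind')]
```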
## AirSim Demo

This requires installing Unreal Engine 4 and AirSim, following AirSim's documentation (Linux, Windows), as well as downloading assets for the UE4 environment, e.g. Neighborhoods and Urban City. Detailed instructions can be found in this wiki page.

The search trial is started by running:

```
python controller.py
```

Here is an example screenshot:
## Citation

```
@inproceedings{sloop-roman-2020,
  title={Spatial Language Understanding for Object Search in Partially Observed Cityscale Environments},
  author={Zheng, Kaiyu and Bayazit, Deniz and Mathew, Rebecca and Pavlick, Ellie and Tellex, Stefanie},
  booktitle={2021 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)},
  year={2021},
  organization={IEEE}
}
```