LUNA is a generative AI model that reconstructs tissues conditioned solely on the gene expression of cells. During training, LUNA learns spatial priors over existing spatial transcriptomics data. At inference, LUNA generates complex tissue structures solely from the gene expression of dissociated cells. LUNA is written in PyTorch.
Input:
- A gene expression matrix, along with the corresponding cell coordinates from spatial transcriptomics data and section information. These files should be in .csv format and are used for model training.
- A gene expression matrix for single cells lacking spatial information, ideally accompanied by cell class annotations (for visualization purposes). This matrix should contain the same number of genes as the training dataset. These files should be in .csv format and are used for model inference.
Note: The two matrices should be derived from the same anatomical region or tissue type and share a common set of genes.
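Because both matrices must share a common set of genes, it can help to subset them before writing the .csv files. The pandas sketch below assumes each matrix is a DataFrame with cells as rows and gene names as columns (metadata columns already removed); the function name restrict_to_shared_genes is hypothetical and not part of LUNA.

```python
import pandas as pd


def restrict_to_shared_genes(train_expr: pd.DataFrame, test_expr: pd.DataFrame):
    """Subset both cell-by-gene matrices to the genes they have in common.

    Hypothetical helper. Shared genes keep the column order of the training
    matrix, so both outputs have identical gene columns in identical order.
    """
    test_genes = set(test_expr.columns)
    shared = [g for g in train_expr.columns if g in test_genes]
    if not shared:
        raise ValueError("training and inference matrices share no genes")
    return train_expr[shared], test_expr[shared]


# Usage with toy data: Pvalb and Sst are each present in only one matrix.
train = pd.DataFrame({"Gad1": [1.0, 0.0], "Slc17a7": [0.5, 2.0], "Pvalb": [0.0, 1.0]})
test = pd.DataFrame({"Slc17a7": [1.0], "Gad1": [0.3], "Sst": [0.2]})
train_shared, test_shared = restrict_to_shared_genes(train, test)
```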
Output:
- The generated 2D spatial coordinates of cells, based on their gene expression, provided in .csv format.
To effectively train LUNA, organize your input .csv files in the following format:
- Rows represent cells.
- Columns represent features, detailed as follows:
  - 2D Coordinates: Use 'coord_X' and 'coord_Y' for the spatial coordinates of cells. For cells without spatial information (i.e., the test set), use zeros.
  - Section Information (cell_section): This column should specify the section each cell is sourced from. Cells from the same section (slice) will be grouped as one input sample.
  - Gene Expression Matrix: Include a preprocessed cell-by-gene matrix, preferably normalized using a log2 transformation.
  - Cell Annotation (cell_class): Use this to categorize cells, aiding in the evaluation and visualization of generated results.
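The column layout described above can be assembled with pandas roughly as follows. This is a sketch under assumptions: build_luna_table is a hypothetical helper, the log2(x + 1) pseudocount is our choice to avoid log2(0), and placing the metadata columns before one contiguous block of gene columns is an assumed ordering (it keeps the gene columns easy to delimit in the configuration).

```python
import numpy as np
import pandas as pd


def build_luna_table(counts, gene_names, coords, sections, cell_classes):
    """Assemble a cells-by-features table with the columns described above.

    Hypothetical helper.
    counts: (n_cells, n_genes) raw counts; log2(x + 1) is applied here
        (the pseudocount of 1 is an assumption).
    coords: (n_cells, 2) x/y positions, or None for cells without spatial
        information, in which case zeros are written.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    if coords is None:
        coords = np.zeros((n_cells, 2))
    coords = np.asarray(coords, dtype=float)

    # Metadata columns first (assumed order), then a contiguous gene block.
    meta = pd.DataFrame({
        "coord_X": coords[:, 0],
        "coord_Y": coords[:, 1],
        "cell_section": list(sections),
        "cell_class": list(cell_classes),
    })
    expr = pd.DataFrame(np.log2(counts + 1.0), columns=list(gene_names))
    return pd.concat([meta, expr], axis=1)


# Usage: two cells from one section; write with index=False so that no
# extra index column is added to the .csv.
table = build_luna_table(
    [[0, 3], [1, 0]], ["Gad1", "Slc17a7"],
    [[10.0, 5.0], [12.5, 7.0]], ["slice_1", "slice_1"], ["Inhibitory", "Excitatory"],
)
# table.to_csv("train.csv", index=False)
```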
To begin, clone the LUNA repository from GitHub:
git clone https://github.com/mlbio-epfl/luna.git
Create the conda environment:
conda create -n LUNA python=3.9 numpy pandas
conda activate LUNA
Install CUDA, PyTorch, PyTorch Geometric, Lightning, and the other pip libraries:
conda install nvidia/label/cuda-11.8.0::cuda-toolkit
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install torch_geometric
pip install lightning
pip install scanpy wandb colorcet squidpy hydra-core linear_attention_transformer
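After installation, a small importlib check can confirm that the packages above can be found in the active environment. This is a hypothetical helper, not part of the repository. The list maps pip package names to import names as an assumption (for example, hydra-core is imported as hydra), and find_spec only confirms that a module can be located, not that it imports cleanly against your CUDA build.

```python
import importlib.util

# Import names (not pip names) of the packages installed above.
REQUIRED_MODULES = [
    "numpy", "pandas", "torch", "torchvision", "torchaudio",
    "torch_geometric", "pyg_lib", "torch_scatter", "torch_sparse",
    "torch_cluster", "torch_spline_conv", "lightning", "scanpy", "wandb",
    "colorcet", "squidpy", "hydra", "linear_attention_transformer",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", missing if missing else "none")
    if importlib.util.find_spec("torch") is not None:
        import torch
        # False here usually means a CPU-only build or a driver problem.
        print("CUDA available:", torch.cuda.is_available())
```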
The notebook /example/MERFISH_mouse_cortex.ipynb contains an example of running LUNA and evaluating the results.
To use LUNA, begin by adjusting the settings in the configuration file located at /configs/experiment. This file, which uses Hydra to manage configurations, contains essential parameters such as the experiment name, dataset paths, and training/testing splits. Run main.py to start the experiment. Here is a breakdown of the critical elements in the configuration file:
- dataset_name: Specifies the name of the dataset you are using.
- train_data_path: The path to your training dataset's .csv file.
- test_data_path: The path to your inference dataset's .csv file.
- gene_columns_start and gene_columns_end: Define the columns where gene expression data begins and ends within your dataset (the training and inference datasets should have the same number of genes, and the gene columns should be in the same order).
- save_dir: Directory in which to save test results. Use './' to save in the current codebase directory.
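The gene column bounds can be read off the .csv headers. The README does not state whether gene_columns_start and gene_columns_end are 0- or 1-based or whether the end is inclusive; the sketch below assumes Python's 0-based, end-exclusive slice convention, so check it against the example configuration before use. All function names here are hypothetical.

```python
import pandas as pd


def read_header(path):
    """Return the column names of a .csv file without loading its rows."""
    return pd.read_csv(path, nrows=0).columns.tolist()


def gene_column_bounds(columns, first_gene, last_gene):
    """Return (start, end) so that columns[start:end] is the gene block.

    Assumed convention: start is 0-based, end is one past last_gene.
    """
    cols = list(columns)
    start = cols.index(first_gene)
    end = cols.index(last_gene) + 1
    if end <= start:
        raise ValueError("last_gene must come after first_gene")
    return start, end


def check_same_gene_block(train_cols, test_cols, start, end):
    """Raise if the train and test gene columns differ in name or order."""
    train_genes = list(train_cols)[start:end]
    test_genes = list(test_cols)[start:end]
    if train_genes != test_genes:
        raise ValueError("gene columns differ between train and test files")
    return train_genes


# Usage with an in-memory header (in practice: read_header("train.csv")).
header = ["coord_X", "coord_Y", "cell_section", "cell_class", "Gad1", "Slc17a7", "Pvalb"]
bounds = gene_column_bounds(header, "Gad1", "Pvalb")
```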
Once your configuration is ready, change the experiment value in /configs/config.yaml to point to your updated configuration file. LUNA can then be run with:
python main.py
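Because LUNA is configured through Hydra, individual values can also be overridden on the command line as key=value arguments (the general.wandb=disabled flag in the troubleshooting notes is one such override). The hypothetical helpers below only build and launch that command from Python, for example to script several runs; any key other than general.wandb is an assumption that must exist in your LUNA configuration.

```python
import subprocess
import sys


def luna_command(overrides=None):
    """Build argv for `python main.py` followed by Hydra key=value overrides."""
    cmd = [sys.executable, "main.py"]
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


def run_luna(overrides=None, repo_dir="."):
    """Run main.py from the repository root; raises CalledProcessError on failure."""
    return subprocess.run(luna_command(overrides), cwd=repo_dir, check=True)
```

For example, run_luna({"general.wandb": "disabled"}) is equivalent to running python main.py general.wandb=disabled from the repository root.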
We provide a sample dataset from the MERFISH Mouse Primary Motor Cortex Atlas. To use it with LUNA, download it to your local machine, then either follow the instructions in /example/MERFISH_mouse_cortex.ipynb, or simply update the data_path in the configuration file to point to the dataset's location and run main.py.
We also offer a detailed tutorial on using the test_only mode to load checkpoints from your trained LUNA model and run tests. Step-by-step instructions are in the notebook located at /example/MERFISH_mouse_cortex_test_only.ipynb.
The preprocessed datasets used for the experiments presented in our manuscript are available for access here.
If you find LUNA useful, please consider citing:
@article{yu2025luna,
title={Tissue reassembly with generative AI},
author={Yu, Tingyang and Ekbote, Chanakya and Morozov, Nikita and Fan,
Jiashuo and Frossard, Pascal and D'Ascoli, Stephane and Brbic, Maria},
journal={biorxiv},
year={2025},
}
- If you are not familiar with Weights & Biases (wandb) and wish to disable it during model training, especially if you encounter error messages while using wandb, simply add the flag general.wandb=disabled to your command to disable wandb integration.
- If you encounter the following error while computing the RSSD metric:
File "/luna/metrics/evaluation_statistics.py", line 43, in compute_kabsch_rotation
rot, rssd, sens = R.align_vectors(
File "_rotation.pyx", line 3420, in scipy.spatial.transform._rotation.Rotation.align_vectors
scipy.spatial.transform._rotation.Rotation.align_vectors
ValueError: Cannot return sensitivity matrix with an infinite weight or one vector pair
This error is typically caused by an incompatible version of SciPy. We recommend using SciPy version 1.9.1, which has been verified to work correctly with our tools; other versions may lead to compatibility issues.
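To check whether your environment matches the recommended SciPy version before computing RSSD, a small standard-library sketch can compare the installed version against 1.9.1. The helper name is hypothetical; the pip command in its message is the standard way to pin a version.

```python
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED_SCIPY = "1.9.1"  # version recommended in this README


def scipy_version_status():
    """Return a short message comparing installed SciPy with the recommended one."""
    try:
        installed = version("scipy")
    except PackageNotFoundError:
        return "SciPy is not installed"
    if installed == RECOMMENDED_SCIPY:
        return f"SciPy {installed} matches the recommended version"
    return (f"SciPy {installed} is installed; {RECOMMENDED_SCIPY} is recommended "
            f"(pip install scipy=={RECOMMENDED_SCIPY})")


if __name__ == "__main__":
    print(scipy_version_status())
```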