GitHub - mlbio-epfl/LUNA: Tissue reassembly with generative AI

LUNA is a generative AI model that reconstructs tissues conditioned solely on the gene expression of cells. During training, LUNA learns spatial priors over existing spatial transcriptomics data. At inference, LUNA generates complex tissue structures solely from the gene expression of dissociated cells. LUNA is written in PyTorch.

Project website

Input:

A gene expression matrix along with the corresponding cell coordinates from spatial transcriptomics data, accompanied by section information. The file should be in .csv format and is used for model training.
A gene expression matrix for single cells lacking spatial information, ideally accompanied by cell class annotations (for visualization purposes). This matrix should contain the same number of genes as the training dataset. The file should be in .csv format and is used for model inference.

Note: The two matrices should be derived from the same anatomical region or tissue type and share a common set of genes.
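One way to find the shared gene set before training is to compare the two .csv headers. The sketch below is only an illustration and is not part of LUNA: the helper name common_genes and the toy gene names are hypothetical, and it assumes every column other than the four metadata columns described in this README holds a gene.

```python
# Metadata column names come from the dataset format described in this README.
# ASSUMPTION: every other column is treated as a gene column.
METADATA_COLUMNS = {"coord_X", "coord_Y", "cell_section", "cell_class"}


def common_genes(train_columns, inference_columns):
    """Return the genes present in both headers, kept in training-set order."""
    inference_genes = set(inference_columns) - METADATA_COLUMNS
    return [
        column
        for column in train_columns
        if column not in METADATA_COLUMNS and column in inference_genes
    ]


# Toy headers with hypothetical gene names.
train_columns = ["coord_X", "coord_Y", "cell_section", "Gad1", "Slc17a7", "Pvalb", "cell_class"]
inference_columns = ["coord_X", "coord_Y", "cell_section", "Slc17a7", "Gad1", "Sst", "cell_class"]

genes = common_genes(train_columns, inference_columns)
print(genes)  # ['Gad1', 'Slc17a7']
```

Both files can then be restricted to this list, in this order, so that the gene columns line up.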

Output:

The generated 2D spatial coordinates of cells, based on their gene expression data, provided in .csv format.

Setting up LUNA

Prepare the Dataset

To effectively train LUNA, organize your input .csv files in the following format:

Rows represent cells.
Columns represent features, detailed as follows:
- 2D Coordinates: Use 'coord_X' and 'coord_Y' for spatial coordinates of cells. For cells without spatial information (i.e., test set), use zeros.
- Section Information (cell_section): This column should specify the section cells are sourced from. Cells from the same section (slice) will be grouped as one input sample.
- Gene Expression Matrix: Include a preprocessed cell-by-gene matrix, preferably normalized using log2 transformation.
- Cell Annotation (cell_class): Use this to categorize cells, aiding in the evaluation and visualization of generated results.
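The layout above can be sketched with a small toy file. This is a minimal illustration using only the Python standard library, not LUNA's own preprocessing: the gene names, cell classes, section labels, and file names are hypothetical, the column order is an assumption, and the log2(count + 1) pseudocount is one common choice rather than something this README prescribes.

```python
import csv
import math
import os
import random
import tempfile

random.seed(0)  # reproducible toy data

GENES = ["Gad1", "Slc17a7", "Pvalb"]  # hypothetical gene panel
# ASSUMPTION: this column order is illustrative; the README only names the columns.
HEADER = ["coord_X", "coord_Y", "cell_section", *GENES, "cell_class"]


def make_row(section, cell_class, has_coords=True):
    """Build one cell (row); cells without spatial information get zero coordinates."""
    if has_coords:
        x, y = random.uniform(0, 1000), random.uniform(0, 1000)
    else:
        x, y = 0.0, 0.0
    raw_counts = [random.randint(0, 50) for _ in GENES]
    # ASSUMPTION: log2 transform with a pseudocount of 1 to avoid log2(0).
    expression = [round(math.log2(count + 1), 4) for count in raw_counts]
    return [x, y, section, *expression, cell_class]


def write_csv(path, rows):
    """Write the header followed by one line per cell."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)


train_rows = [make_row("slice_1", "Glutamatergic") for _ in range(3)]
test_rows = [make_row("slice_test", "GABAergic", has_coords=False) for _ in range(2)]

with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train_toy.csv")
    write_csv(train_path, train_rows)
    write_csv(os.path.join(tmp_dir, "test_toy.csv"), test_rows)
    with open(train_path, newline="") as handle:
        written_header = next(csv.reader(handle))

print(written_header)
```

The same table could equally be built with pandas, which is installed in the conda environment described below.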

Installation Requirements

To begin, clone the LUNA repository from GitHub:

git clone https://github.com/mlbio-epfl/luna.git

Create the conda environment:

conda create -n LUNA python=3.9 numpy pandas
conda activate LUNA

Install CUDA, PyTorch, torch-geometric, Lightning, and the other pip libraries:

conda install nvidia/label/cuda-11.8.0::cuda-toolkit
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install torch_geometric
pip install lightning
pip install scanpy wandb colorcet squidpy hydra-core linear_attention_transformer
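After installation, a quick check that the packages are visible to the active environment can save a failed run later. The sketch below is an optional helper, not something shipped with LUNA. The function name missing_modules is hypothetical, and the module list is the set of import names for the packages installed above (hydra-core is imported as hydra).

```python
import importlib.util

# Import names for the packages installed above.
# Note that hydra-core is imported as "hydra".
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "lightning",
    "scanpy",
    "wandb",
    "colorcet",
    "squidpy",
    "hydra",
    "linear_attention_transformer",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    # find_spec only locates a module; it does not import or initialise it.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing packages:", missing or "none")
```

This only confirms that the modules can be found. To confirm that PyTorch can see the GPU, run torch.cuda.is_available() in the same environment.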

Generating Tissue Structure Using LUNA

/example/MERFISH_mouse_cortex.ipynb contains an example of running LUNA and evaluating the results.

Configuration

To use LUNA, begin by adjusting the settings in the configuration file located at /configs/experiment. This file, which leverages Hydra for managing configurations, contains essential parameters like the experiment name, dataset paths, and training/testing splits. Run the main.py file to start the experiment. Here is a breakdown of the critical elements in the configuration file:

Dataset

dataset_name: Specifies the name of the dataset you are utilizing.
train_data_path: Provides the path to your train dataset’s .csv file.
test_data_path: Provides the path to your inference dataset’s .csv file.
gene_columns_start and gene_columns_end: Define the columns where gene expression data begins and ends within your dataset. The training and inference datasets should have the same number of genes, and their gene columns should be in the same order.
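Before launching a run, it can help to confirm that the chosen column range selects the same genes in both files. The sketch below is an illustration only: the helper names are hypothetical, and it assumes zero-based, half-open indexing (Python slice semantics). This README does not state whether LUNA treats gene_columns_end as inclusive, so check the values against the example configuration in the repository.

```python
def gene_columns(header, gene_columns_start, gene_columns_end):
    """Select the gene columns from a .csv header.

    ASSUMPTION: zero-based, half-open range [start, end), as in Python slicing.
    """
    return header[gene_columns_start:gene_columns_end]


def check_gene_alignment(train_header, test_header, start, end):
    """Raise if the two files do not hold the same genes in the same order."""
    train_genes = gene_columns(train_header, start, end)
    test_genes = gene_columns(test_header, start, end)
    if train_genes != test_genes:
        raise ValueError(
            f"Gene columns differ: train={train_genes}, inference={test_genes}"
        )
    return train_genes


# Toy headers with hypothetical gene names.
train_header = ["coord_X", "coord_Y", "cell_section", "Gad1", "Slc17a7", "Pvalb", "cell_class"]
test_header = ["coord_X", "coord_Y", "cell_section", "Gad1", "Slc17a7", "Pvalb", "cell_class"]

print(check_gene_alignment(train_header, test_header, start=3, end=6))
# ['Gad1', 'Slc17a7', 'Pvalb']
```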

Test

save_dir: Directory to save test results. Use './' to save in the current codebase directory.

Once your configuration is ready, change the experiment value in /configs/config.yaml to point to your updated configuration file, then start LUNA with:

python main.py

Example Usage

We provide a sample dataset from the MERFISH Mouse Primary Motor Cortex Atlas. To use this dataset with LUNA, download it to your local machine. You can then either follow the instructions in /example/MERFISH_mouse_cortex.ipynb, or update the data_path in the configuration file to reflect the dataset's location and execute main.py to run LUNA on it.

Citing

If you find LUNA useful, please consider citing:

@article{yu2025luna,
  title={Tissue reassembly with generative AI},
  author={Yu, Tingyang and Ekbote, Chanakya and Morozov, Nikita and Fan, 
          Jiashuo and Frossard, Pascal and D'Ascoli, Stephane and Brbic, Maria},
  journal={biorxiv},
  year={2025},
}

