`community` is an R package designed to explore the differences in communication between various case and control samples using single-cell RNA sequencing (scRNAseq) data. With its user-friendly output, `community` lets you easily follow the overall shifts in communication within the cohorts. You can visualize and delve into the most significant differences in interactions, and even investigate what's driving these changes. It's a handy tool for anyone interested in a deeper understanding of cell-to-cell communication.
In this repository, you'll find:
- Guidelines on how to use the `community` tool effectively
- Methods to visualize output results on preprocessed data
- Insights into the construction of our database
- Instructions for preprocessing raw data across multiple datasets
- Differential Communication Analysis: One of the key capabilities of `community` is its ability to perform differential communication analysis between cohorts of case and control samples using scRNAseq data. This feature enables the identification of communication differences that may underlie various biological conditions or disease states.
- Compensatory Mechanism Analysis: The tool can also analyze compensatory mechanisms, that is, the ways in which cells or organisms compensate for changes or disruptions in a biological process.
- Speed and Efficiency: `community` is built for speed, making it suitable for large-scale scRNAseq datasets without high computational demands.
- Robustness: Our tool employs rigorous statistical methods along with multi-factor analysis to ensure the reliability of the results. This makes it less sensitive to outlier samples, providing more accurate differential communication analysis.
- Intuitive Visualization: The tool offers straightforward graphical outputs, simplifying the understanding of complex cell interactions.
- User-Friendly: With an easy-to-use workflow and detailed documentation, the tool is designed for users of varying technical backgrounds.
If you are not familiar with the code below, please follow the next steps to install. We have created a script for Windows users that automatically installs all the necessary components (`install_windows.bat`).
# install.packages("devtools")
devtools::install_github("SoloveyMaria/community")
To create an environment named 'community', and install Jupyter Notebook with the R kernel as well as devtools and all the other dependencies, follow the steps below. All the commands below run in terminal/cmd.
If you have `git` installed on your system, you can clone the repository directly by running the following command in your command prompt or Git Bash:
git clone https://github.com/SoloveyMaria/community.git
Note: Cloning requires Git. If Git is not installed, download this repository as a zip file, decompress it, and navigate to the extracted folder. Click here to download the zip file.
Once you have navigated to the cloned or downloaded repository folder (for example `C:\Users\UserName\Downloads\community`), you can use the provided `install_windows` script (you might need administrative rights) to automate the setup process. You can run this script with a double click. Running it will download and install Conda if it is not already installed on your system. Afterwards, it will create a new Conda environment and install all the dependencies required for the `community` tool. Upon completion, Jupyter Notebook will open automatically. If you already have all the necessary dependencies installed, the script will only launch Jupyter Notebook.
Please ensure that you have the necessary permissions to run the `install_windows` script if you choose to use it.
- Clone the repo with the command below. This will create a folder named "community" in your current directory.
Note: If you do not have Git installed on your system, please download and extract the zip file instead. Click here to download
```
git clone https://github.com/SoloveyMaria/community.git
```
- Navigate to the `community` folder by executing the following command.
cd community/
- If you don't have conda installed yet, install conda by running the command below
make install-conda
- Create a conda environment named "community" and install all necessary packages by using the following command:
make create-env
- Launch Jupyter to access the notebooks and generate graphs
make run-jupyter
- Go to http://localhost:8888 (a page should open automatically in your browser)
or
- Open the `src/calculate_communication.ipynb` notebook to run the demo workflow.
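The terminal steps above can be chained into one script. The following is a minimal sketch, assuming it is run from the root of the cloned `community` folder and that the `make` targets behave as described above. The `run` and `setup_community` helpers and the `DRY_RUN` variable are illustrative names, not part of the repository. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: chains the make targets listed above.
# DRY_RUN, run, and setup_community are illustrative names, not part of the repository.
# DRY_RUN defaults to 1 (print commands); set DRY_RUN=0 to execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_community() {
    # Install conda only when it is not already on the PATH.
    if ! command -v conda >/dev/null 2>&1; then
        run make install-conda || return 1
    fi
    # Create the "community" conda environment and install the packages.
    run make create-env || return 1
    # Start Jupyter (served at http://localhost:8888).
    run make run-jupyter
}

setup_community
```

Each step stops the sequence if it fails, so Jupyter is not launched against a half-built environment. Depending on how conda gets installed, a new terminal may be needed before the remaining steps can find it.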
We have converted the human gene symbols in the database to mouse gene symbols, using the human Entrez IDs together with the MGI mouse gene symbols and Entrez IDs of their one-to-one orthologs, as determined via NCBI's HomoloGene database and the Ensembl database (via biomaRt).
This database can be loaded with `data(LR_database_mouse)`; it is also provided as a .csv file.
In `community`, the database is managed through a comma-separated file with three mandatory columns (`Pair.Name`, `Ligand`, and `Receptor`). Users can easily incorporate their own pairs by editing the provided .csv file. For instructions on updating the database or customizing it using our workflow, detailed information is available here.
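As an illustration of the format described above, the sketch below builds a tiny stand-in database file, checks that its header contains the three mandatory columns, and appends a custom pair. The file name `my_LR_database.csv`, the gene names `LIGX`, `RECY`, `LIGZ`, and `RECW`, the `Pair.Name` naming pattern, and the column order are all assumptions made for the example. Check the provided .csv file for the actual conventions, which may also include additional columns.

```shell
#!/bin/sh
# Sketch only: file name, gene names, and the Pair.Name pattern are hypothetical.
db="${TMPDIR:-/tmp}/my_LR_database.csv"

# Stand-in for the provided database file (the real file may contain more columns).
printf '%s\n' 'Pair.Name,Ligand,Receptor' 'LIGX_RECY,LIGX,RECY' > "$db"

# Return 0 if the header of file $1 contains a column named exactly $2, else 1.
# Carriage returns and double quotes are stripped before comparing.
has_column() {
    head -n 1 "$1" | tr -d '\r"' | tr ',' '\n' | grep -qxF "$2"
}

missing=0
for col in Pair.Name Ligand Receptor; do
    if ! has_column "$db" "$col"; then
        echo "missing mandatory column: $col" >&2
        missing=1
    fi
done

# Append a custom ligand-receptor pair only when the header is valid.
if [ "$missing" -eq 0 ]; then
    printf '%s\n' 'LIGZ_RECW,LIGZ,RECW' >> "$db"
fi

cat "$db"
```

Checking the header before appending guards against adding rows to a file whose columns were renamed or reordered by a spreadsheet editor.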
Due to GitHub's file size limitations, if you are interested in using the precomputed results to reproduce the figures from the paper, you can easily download the data, including the weight matrix and interaction annotation table (also available as R objects), from the following Google Drive link:
You can download the preprocessed data by running the commands below. You can also visit https://zenodo.org/record/10619771 and download the files manually. To run the notebook workflow with the preprocessed data, the files should be under the /docs/showcase_notebooks directory.
- Download the preprocessed data by running the following commands in the terminal. Windows users can double-click `download_data.bat` to download all the preprocessed data needed to run the showcase notebooks. The downloaded data will be located at `docs/showcase_notebooks/$dataset/input_files/`.
- To download the Lasry dataset directly into the corresponding directory, run `make download-lasry`, or visit the Zenodo page here to download the files manually and place them into the related directory.
- For Simillie, run `make download-simillie` or visit the Zenodo link.
- For the integrated VanGalen-Oetjen dataset, run `make download-vangalen_oetjen` or visit the Zenodo link.
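Before launching a showcase notebook, it can help to confirm that the downloaded files ended up where the notebooks expect them. The sketch below checks the `docs/showcase_notebooks/$dataset/input_files/` layout described above. The `has_input_files` helper and the dataset folder name `example_dataset` are hypothetical, so substitute the actual folder names from the repository.

```shell
#!/bin/sh
# Sketch only: has_input_files and the folder name "example_dataset" are hypothetical.

# Return 0 if $1/docs/showcase_notebooks/$2/input_files/ exists
# and contains at least one entry, else 1.
has_input_files() {
    dir="$1/docs/showcase_notebooks/$2/input_files"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Run from the repository root; list the dataset folder names to check.
for dataset in example_dataset; do
    if has_input_files "." "$dataset"; then
        echo "$dataset: input files found"
    else
        echo "$dataset: input files missing"
    fi
done
```

The check only confirms that the folder is non-empty; it does not verify that every file a given notebook needs is present.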
For each dataset, we provide tailored preprocessing workflows. These workflows encompass:
- Initial Data Cleaning and Annotation
- Filtering
- Normalization
- Batch Correction (if applicable)
- Data Visualization
For a comprehensive overview of our preprocessing steps, please follow this link.
To learn how to use `community`, read one of the following vignettes:
The following vignette explains how to perform a basic communication, QC, and differential communication analysis. This includes guidelines on data preprocessing, running the main analysis functions, and interpreting the results. This analysis takes only a few minutes to run:
For comprehensive visualizations of the `community` output, see the following vignettes:
Q: I have X number of samples in a Seurat object. Which data slot should I use as the input count matrix?
A: If you're working with a Seurat object, you have multiple options for your input count matrix:
- Normalized Counts: The preferred choice, especially for setting meaningful expression levels (`threshold_expr`).
- Log-Normalized Counts: A viable alternative, but requires extra caution with `threshold_expr`.
- Integrated Counts: Can be used but may introduce artifacts, particularly with lowly expressed genes. Ideal if you're dealing with significant batch effects.
Q: My scRNA dataset has limited cell type diversity. Is it suitable for Community analysis?
A: Absolutely. `community` analysis is not solely dependent on the size of each cell type population. Here's what you should consider:
- Presence Across Samples: A cell type should ideally be present in almost all samples. Missing in one or two samples is generally acceptable.
- Minimum Cell Count: Decide on a threshold for the minimum number of cells of a particular type in a sample. The default in Community is six cells per type per sample.
For more details on how we filtered based on these criteria, refer to the Data Pre-Processing Notebook.
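The minimum cell count criterion can be checked outside of R on an exported metadata table. The sketch below is not the package's own filtering code: it assumes a hypothetical CSV with one row per cell and the columns `sample,cell_type`, and the helper name `low_count_groups` is made up for the example. It lists every sample and cell type combination that falls below six cells.

```shell
#!/bin/sh
# Sketch only: the metadata layout (one row per cell, columns sample,cell_type)
# and the helper name are hypothetical. MIN_CELLS mirrors the default of six
# cells per type per sample mentioned above.
MIN_CELLS=6

# Print "sample,cell_type,count" for every combination seen in file $1
# that has fewer than MIN_CELLS cells.
low_count_groups() {
    awk -F',' -v min="$MIN_CELLS" '
        NR > 1 { n[$1 "," $2]++ }   # skip the header, count cells per group
        END {
            for (key in n)
                if (n[key] < min) print key "," n[key]
        }
    ' "$1" | sort
}

# Toy metadata: sample S1 has 6 B cells but only 2 T cells.
meta="${TMPDIR:-/tmp}/cell_metadata_example.csv"
{
    echo "sample,cell_type"
    for i in 1 2 3 4 5 6; do echo "S1,B"; done
    for i in 1 2; do echo "S1,T"; done
} > "$meta"

low_count_groups "$meta"   # prints: S1,T,2
```

Combinations with zero cells never appear in the table, so this check does not report cell types that are entirely absent from a sample. Presence across samples has to be checked separately.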
Q: Do I need to differentiate between healthy and malignant B cells when annotating my samples?
A: No special annotation is required for malignant B cells. You can categorize them under the general "B cell" type. The algorithm will distinguish between the two populations through the "active fraction" component, effectively capturing the nuances.
We welcome contributions from the community! If you're interested in contributing to the project, there are a couple of ways you can get involved:
- Fork and Merge: Feel free to fork the repository, make your changes, and then submit a pull request to merge your changes back into the main branch.
- Open an Issue: If you have ideas for improvements, feature requests, or encounter any bugs, please open an issue on the repository. This helps us keep track of community feedback and prioritize development efforts.
Citation will be added soon.