- Genomic decomposition and reconstruction of non-tumor diploid subclones (2023)
- CLonal decomposition via Expectation-Maximization algorithm established in Non-Tumor setting
- Supports multiple diploid samples
- Biallelic (homozygous, 1/1) variants can degrade the performance of CLEMENT.
- python 3.6.x
- matplotlib 3.5.2
- seaborn 0.11.2
- numpy 1.21.5
- pandas 1.3.4
- scikit-learn 1.0.2
- scipy 1.7.3
- palettable 3.3.0
git clone https://github.com/Yonsei-TGIL/CLEMENT.git
cd CLEMENT
pip3 install .
pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git
- pip3 install CLEMENTDNA
1.0.11 (Jan 1st, 2024)
As of 1.0.4, CLEMENT only supports a standardized TSV input. Example input files are in the "example" directory.
- 1st column: mutation ID (CHR_POS is recommended)
- 2nd column: label (answer), if available. If the label is unknown, set it to 0
- 3rd column: Depth1,Alt1,Depth2,Alt2,...,Depth_n,Alt_n * must be comma-separated, with no spaces
- 4th column: BQ1,BQ2,...,BQ_n * must be comma-separated, with no spaces. If absent, CLEMENT sets the default BQ to 20.
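A minimal Python sketch of this layout, useful for preparing or sanity-checking an input file before running CLEMENT. The helpers `format_row` and `parse_row` are hypothetical and not part of CLEMENT; the sketch assumes the file has no header line and that BQ values may be non-integer.

```python
import csv
import io

def format_row(mutation_id, label, depth_alt_pairs, base_qualities=None):
    """Return the TSV fields for one variant (hypothetical helper, not CLEMENT's API)."""
    # 3rd column: Depth1,Alt1,...,Depth_n,Alt_n with no spaces
    counts = ",".join("{},{}".format(d, a) for d, a in depth_alt_pairs)
    fields = [mutation_id, str(label), counts]
    # 4th column is optional; CLEMENT falls back to BQ 20 when it is absent
    if base_qualities is not None:
        if len(base_qualities) != len(depth_alt_pairs):
            raise ValueError("need one BQ value per sample")
        fields.append(",".join(str(bq) for bq in base_qualities))
    return fields

def parse_row(fields, default_bq=20.0):
    """Split one line (already split on tabs) back into per-sample lists."""
    if any(" " in f for f in fields[2:]):
        raise ValueError("spaces are not permitted in the count/BQ columns")
    values = [int(x) for x in fields[2].split(",")]
    if len(values) % 2 != 0:
        raise ValueError("3rd column must hold Depth,Alt pairs")
    depths, alts = values[0::2], values[1::2]
    if len(fields) > 3:
        bqs = [float(x) for x in fields[3].split(",")]
    else:
        bqs = [default_bq] * len(depths)  # mirrors the documented default of 20
    return fields[0], int(fields[1]), depths, alts, bqs

# Write a two-sample example row to an in-memory TSV, then read it back
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(format_row("chr1_12345", 0, [(125, 30), (118, 0)], [20, 20]))
line = buf.getvalue().rstrip("\n")
print(parse_row(line.split("\t")))
```

Using `csv.writer` with a tab delimiter keeps each column intact; commas inside the 3rd and 4th columns are not treated as separators.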
CLEMENT [OPTIONS]
(Mandatory) These options specify the user's input and output
--INPUT_TSV Input file in TSV format. The tool automatically detects the number of samples
--CLEMENT_DIR Directory where the outputs of CLEMENT will be saved
These options control whether the user's input is downsized
--RANDOM_PICK Number of variants to randomly pick when downsizing the input. To use all variants without downsizing, set -1 (default: -1).
These options adjust the E-M algorithm parameters
--NUM_CLONE_TRIAL_START Minimum number of expected clusters (initial K) (default: 3)
--NUM_CLONE_TRIAL_END Maximum number of expected clusters (final K) (default: 5)
--TRIAL_NO Number of trials for each candidate cluster number. Values over 15 are not recommended (default: 5)
--FP_PRIOR Prior of false positive (FP). Recommendation: <= 0.1 (default: 0.01)
--TN_PRIOR Prior of true negative (TN). Recommendation: > 0.99 (default: 0.99)
--KMEANS_CLUSTERNO Number of initial K-means clusters. Recommendation: 5-8 for a single sample, 8-15 for multiple samples (default: 8)
--MIN_CLUSTER_SIZE Minimum acceptable cluster size. Recommendation: 1-3% of the total number of variants (default: 9)
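A short Python sketch of the arithmetic behind these settings. It assumes the K range from NUM_CLONE_TRIAL_START to NUM_CLONE_TRIAL_END is inclusive and that each candidate K gets TRIAL_NO trials; the helper names are hypothetical and not part of CLEMENT.

```python
def candidate_runs(k_start=3, k_end=5, trial_no=5):
    """Candidate K values and total trial count, assuming TRIAL_NO trials per K."""
    ks = list(range(k_start, k_end + 1))  # assumed inclusive: START..END
    return ks, len(ks) * trial_no

def min_cluster_size_range(n_variants, low_pct=1, high_pct=3):
    """Integer bounds from the 1-3% rule of thumb for --MIN_CLUSTER_SIZE."""
    # Integer arithmetic avoids float rounding; never suggest a size below 1
    return (max(1, n_variants * low_pct // 100),
            max(1, n_variants * high_pct // 100))

ks, total = candidate_runs()  # defaults: K = 3..5, TRIAL_NO = 5
print(ks, total)
print(min_cluster_size_range(500))  # e.g. after --RANDOM_PICK 500
```

Because the total number of trials grows linearly with both the width of the K range and TRIAL_NO, widening the K range is a cheaper way to explore than raising TRIAL_NO past the recommended 15.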
Other options
--MODE Selection of clustering method. "Hard": hard clustering only, "Both": both hard and soft (fuzzy) clustering (default: "Both")
--MAKEONE_STRICT 1: strict, 2: lenient, 3: most lenient (default: 1)
--SCORING True: compare the result with the answer set (the labels in the 2nd column), False: visualization only (default: False)
Miscellaneous
--FONT_FAMILY Font family used in the plots (default: "arial")
--VISUALIZATION Whether to produce an image at every E-M step (default: True)
--IMAGE_FORMAT Image format of the output plots (default: jpg)
--VERBOSE 0: no record, 1: simplified record, 2: verbose record (default: 2)
${CLEMENT_DIR}"/result"
- CLEMENT_decision CLEMENT's best recommendation among hard and soft clustering.
- CLEMENT_hard_1st CLEMENT's best decomposition by hard clustering.
- CLEMENT_hard.gapstatistics.txt Selection of the optimal K in hard clustering based on gap* statistics.
- CLEMENT_soft_1st CLEMENT's best decomposition by soft (fuzzy) clustering.
- membership.txt Membership assignment of all variants to each cluster.
- membership_count.txt Count matrix of the membership assignment to each cluster.
- mixture.txt Centroid of each cluster
DIR=[YOUR_DIRECTORY]
# Example 1
CLEMENT \
--INPUT_TSV ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1/1.txt" \
--CLEMENT_DIR ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1" \
--NUM_CLONE_TRIAL_START 1 \
--NUM_CLONE_TRIAL_END 5
# Example 2
CLEMENT \
--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt" \
--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4" \
--NUM_CLONE_TRIAL_START 2 \
--NUM_CLONE_TRIAL_END 6 \
--RANDOM_PICK 500
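If you drive CLEMENT from a Python script, a small wrapper can assemble the same command line. This is a minimal sketch, not part of CLEMENT: `build_clement_cmd`, `run_clement`, and `KNOWN_FLAGS` are hypothetical names, the flag set is copied from the options listed in this README, and values are passed as strings (e.g. "True"), assuming CLEMENT parses them from the command line as usual.

```python
import os
import shlex
import subprocess

# Optional flags as listed in this README; used to catch typos early
KNOWN_FLAGS = {
    "RANDOM_PICK", "NUM_CLONE_TRIAL_START", "NUM_CLONE_TRIAL_END", "TRIAL_NO",
    "FP_PRIOR", "TN_PRIOR", "KMEANS_CLUSTERNO", "MIN_CLUSTER_SIZE", "MODE",
    "MAKEONE_STRICT", "SCORING", "FONT_FAMILY", "VISUALIZATION",
    "IMAGE_FORMAT", "VERBOSE",
}

def build_clement_cmd(input_tsv, clement_dir, **options):
    """Assemble a CLEMENT argv list; keyword names map to upper-case flags."""
    cmd = ["CLEMENT", "--INPUT_TSV", input_tsv, "--CLEMENT_DIR", clement_dir]
    for name, value in options.items():
        flag = name.upper()
        if flag not in KNOWN_FLAGS:
            raise ValueError("unknown CLEMENT option: --" + flag)
        cmd += ["--" + flag, str(value)]
    return cmd

def run_clement(cmd):
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(cmd, check=True)

# Mirrors Example 2; DIR is a placeholder for your clone of the repository
DIR = "/path/to/CLEMENT"
cmd = build_clement_cmd(
    os.path.join(DIR, "example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt"),
    os.path.join(DIR, "example/2.CellData/MRS_2D/M1-8_M2-4"),
    num_clone_trial_start=2,
    num_clone_trial_end=6,
    random_pick=500,
)
print(" ".join(shlex.quote(part) for part in cmd))
# run_clement(cmd)  # uncomment once CLEMENT is installed and DIR is set
```

Passing a list to `subprocess.run` (rather than a shell string) sidesteps quoting problems with paths that contain spaces.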