questions about the output #140

ZYongQi · 2024-01-25T07:13:44Z

Hi, this is ZY. I used FREEC to call CNVs in the genome successfully, but I still have two questions.

The output looks like this:
ID=gene-POFUT2 1 1255210 1274664 1264440 1302816 0 loss
ID=gene-DYRK1A 1 7417053 7572985 7516776 7556136 8 gain
ID=gene-TTC3 1 7693717 7749619 7699800 7826736 10 gain
ID=gene-LOC117795648 1 7741425 7741527 7699800 7826736 10 gain
ID=gene-LOC117801378 1 7751983 7752392 7699800 7826736 10 gain
ID=gene-LOC100480655 1 7791807 7792944 7699800 7826736 10 gain
ID=gene-LOC117801382 1 7795812 7796932 7699800 7826736 10 gain
ID=gene-LOC117801055 1 7806381 7811440 7699800 7826736 10 gain
ID=gene-LOC117801383 1 7820518 7824867 7699800 7826736 10 gain
ID=gene-HLCS 1 7834905 8035784 7925136 7958592 0 loss
ID=gene-LOC117801776 1 44252319 44863795 44619480 44641128 0 loss

It contains the predicted copy number. What does it mean if this value equals 0?

A CNV is a region of the genome whose size ranges from approximately 1 kb to 3 Mb. How can I get gene copy numbers from CNVs?

Thank you for any advice. Best wishes!

valeu · 2024-01-25T09:57:53Z

Hello,

0 means that zero copies of DNA are predicted in this region.
I guess you need to look at the value just before 'gain' and 'loss'. Also visualize the normalized ratios in ratio.txt to make sure that the prediction is correct.

ZYongQi · 2024-01-26T07:51:33Z

> Hello,
>
> 0 means that zero copies of DNA are predicted in this region.
>
> I guess you need to look at the value just before 'gain' and 'loss'. Also visualize the normalized ratios in ratio.txt to make sure that the prediction is correct.

Thank you for your reply. I'll visualize the normalized ratios in ratio.txt right away. For now, please allow me to briefly introduce my "config.txt", because I am still confused about the "CNVs file".
This is part of my config file:

ploidy = 2
breakPointThreshold = 0.8
maxThreads = 16
minExpectedGC = 0.35
maxExpectedGC = 0.55
telocentromeric = 0
coefficientOfVariation = 0.062
degree = 3

I set coefficientOfVariation rather than a fixed window size, so that FREEC can choose an optimal window size for each sample. Will different window sizes influence the analysis if I try to combine the CNV output of different samples? Or would you suggest choosing a fixed window size, such as 100 bp? By the way, the value 0.062 comes from a similar study.
I tried to map the CNVs to genes like this:

GENE_ID CHROMOSOME GENE_START GENE_STOP CNV_START CNV_STOP CN TYPE
ID=gene-POFUT2 1 1255210 1274664 1264440 1302816 0 loss
ID=gene-DYRK1A 1 7417053 7572985 7516776 7556136 8 gain
ID=gene-TTC3 1 7693717 7749619 7699800 7826736 10 gain
ID=gene-LOC117795648 1 7741425 7741527 7699800 7826736 10 gain
ID=gene-LOC117801378 1 7751983 7752392 7699800 7826736 10 gain
ID=gene-LOC100480655 1 7791807 7792944 7699800 7826736 10 gain
ID=gene-LOC117801382 1 7795812 7796932 7699800 7826736 10 gain
ID=gene-LOC117801055 1 7806381 7811440 7699800 7826736 10 gain
ID=gene-LOC117801383 1 7820518 7824867 7699800 7826736 10 gain

I wonder about the relationship between CN and the gene location (start and stop). A CN of 10 means that 10 copies of DNA are predicted in the region. Does that mean one CNV repeated 10 times, or 10 different CNVs? If I want to count the number of gains and losses, do I need to multiply by 10?

valeu · 2024-01-26T09:48:07Z

coefficientOfVariation = 0.062 will give you a reasonable window size that will not result in too much noise or too many false predictions. If the window size calculated by FREEC is close to 100, just use window=100, which overrides coefficientOfVariation. Also, you can use a rule of thumb: 400 reads per window will result in low noise and good predictions.
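As a rough illustration of the 400-reads-per-window rule of thumb, the sketch below estimates a window size from the genome length and the number of mapped reads. It assumes reads are spread uniformly over the genome, which is a simplification. The function name `window_for_reads` and the example numbers are hypothetical and are not part of FREEC.

```python
def window_for_reads(genome_length, mapped_reads, reads_per_window=400):
    """Smallest window (in bp) expected to contain `reads_per_window` reads.

    Assumes uniform coverage: expected reads per bp = mapped_reads / genome_length.
    """
    if genome_length <= 0 or mapped_reads <= 0:
        raise ValueError("genome_length and mapped_reads must be positive")
    # Integer ceiling division avoids floating-point rounding on large genomes.
    numerator = reads_per_window * genome_length
    return (numerator + mapped_reads - 1) // mapped_reads


# Hypothetical example: a 2.4 Gb genome with 600 million mapped reads.
print(window_for_reads(2_400_000_000, 600_000_000))  # 1600
```

If the estimate comes out far above 100 bp, a fixed window of 100 bp would hold far fewer than 400 reads under the same assumption.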

valeu · 2024-01-26T09:48:47Z

Regarding the annotation of genes - I don't think that there is an official FREEC script to do so. How do you get this file with gene IDs?

ZYongQi · 2024-01-26T13:05:18Z

> Regarding the annotation of genes - I don't think that there is an official FREEC script to do so. How do you get this file with gene IDs?

I made the annotation myself with a Perl script, based on the positions of the predicted CNVs in the FREEC output file.

To be specific, I first took the position (start-end) of each gene from the NCBI .gff file. Second, I looked for genes that overlap CNV regions using the following criterion: **cnv_start <= gene_stop && cnv_stop >= gene_start**. This gives a list of genes whose positions (start-end) overlap CNVs. Finally, I merged the two files. Are there any problems with this procedure?
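The overlap criterion described above can be sketched in Python as follows. This is not the original Perl script. The tuple layouts, the function names, and the extra `overlap_fraction` helper are assumptions for illustration, and coordinates are assumed to be 1-based and inclusive. The sketch also compares chromosomes explicitly, which the criterion as written leaves implicit.

```python
def overlaps(gene_start, gene_stop, cnv_start, cnv_stop):
    """Criterion from the thread: cnv_start <= gene_stop && cnv_stop >= gene_start."""
    return cnv_start <= gene_stop and cnv_stop >= gene_start


def overlap_fraction(gene_start, gene_stop, cnv_start, cnv_stop):
    """Fraction of the gene covered by the CNV (0.0 if they do not overlap)."""
    covered = min(gene_stop, cnv_stop) - max(gene_start, cnv_start) + 1
    return max(covered, 0) / (gene_stop - gene_start + 1)


def annotate(genes, cnvs):
    """Pair each gene with every CNV on the same chromosome that overlaps it.

    genes: iterable of (gene_id, chrom, start, stop)
    cnvs:  iterable of (chrom, start, stop, copy_number, cnv_type)
    """
    rows = []
    for gene_id, g_chrom, g_start, g_stop in genes:
        for c_chrom, c_start, c_stop, cn, cnv_type in cnvs:
            # Coordinates are only comparable within one chromosome.
            if g_chrom == c_chrom and overlaps(g_start, g_stop, c_start, c_stop):
                rows.append((gene_id, g_chrom, g_start, g_stop,
                             c_start, c_stop, cn, cnv_type))
    return rows


# Two genes and two CNVs taken from the output shown earlier in the thread.
genes = [("ID=gene-POFUT2", "1", 1255210, 1274664),
         ("ID=gene-HLCS", "1", 7834905, 8035784)]
cnvs = [("1", 1264440, 1302816, 0, "loss"),
        ("1", 7699800, 7826736, 10, "gain")]

for row in annotate(genes, cnvs):
    print(*row)
```

Reporting the overlap fraction alongside each hit may help distinguish genes fully contained in a CNV from genes that are only partly covered, such as POFUT2 in this example.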

By the way, I have the ratio.txt file, but I wonder how the ratio value is calculated. Should I filter out ratio values that don't meet a certain threshold? And why is the copy number in ratio.txt always 2?

I would appreciate any advice. Best wishes!

valeu · 2024-02-01T17:24:55Z

The copy number in ratio.txt for the control sample should be 2 if you use a control. For the donor sample, it can be 2 almost everywhere if it is not a cancer sample. In any case, I suggest visualizing the output (ratio.txt) to make sure you can trust the predictions of FREEC (using, for example, the R script included in the package).
The ratios are normalized read count values. 1 means no change; -1 means data not available.
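To make the meaning of these values concrete, here is a minimal Python sketch that reads a ratio table, skips windows marked -1, and rescales the rest by the ploidy so that a ratio of 1 corresponds to 2 copies in a diploid sample. The column names `Chromosome`, `Start` and `Ratio`, the tab delimiter, and the sample rows are assumptions for illustration, so check them against your own ratio.txt. Multiplying the ratio by the ploidy is one common way to read it, not necessarily how FREEC derives its CopyNumber column.

```python
import csv
import io


def approx_copy_number(ratio, ploidy=2):
    """Scale a normalized ratio to an approximate copy number.

    A ratio of 1 means no change (copy number equals ploidy);
    -1 marks a window with no data and is returned as None.
    """
    if ratio == -1:
        return None
    return ratio * ploidy


def read_ratios(handle, ploidy=2):
    """Yield (chromosome, start, approximate copy number) per window."""
    for row in csv.DictReader(handle, delimiter="\t"):
        ratio = float(row["Ratio"])
        yield row["Chromosome"], int(row["Start"]), approx_copy_number(ratio, ploidy)


# Hypothetical rows, not taken from a real FREEC run.
sample = io.StringIO(
    "Chromosome\tStart\tRatio\n"
    "1\t1\t1.0\n"
    "1\t1001\t-1\n"
    "1\t2001\t0.5\n"
)
for chrom, start, cn in read_ratios(sample):
    print(chrom, start, "no data" if cn is None else cn)
```

Plotting these values along each chromosome, next to the predicted copy number, shows whether a called gain or loss is supported by a clear shift in the ratios.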

ZYongQi · 2024-07-10T02:14:44Z

Hi, this is ZY. We summarized the quantity and distribution of CNVs and CNV regions, and I took your advice to visualize the ratio.txt file. But I still have doubts.

R script: FREEC_ratio2Absolute.R. One of the outputs shows:

Chromosome Start End Num_Probes Segment_Mean
NC_048218.1 1 1264440 1285 -0.0513244
NC_048218.1 1264441 1302816 39 -3.715107
NC_048218.1 1302817 3479424 2212 -0.05671026
NC_048218.1 3479425 3504024 25 -4.576851
NC_048218.1 3504025 3536496 33 0.01631089

What criteria should we use to filter the results: the number of probes or a specific Segment_Mean?
By the way, why are some Segment_Mean values equal to -Inf?
