empty output #12

Open
samuelCollombet opened this issue May 7, 2018 · 7 comments

Comments

@samuelCollombet

Hi,
I am trying to run Armatus on a sparse matrix, but I seem to get no domains at all, even when trying different matrices. I believe there is a problem with my run.
Could I send you my matrix so you can test it? And could I get an example of a sparse matrix on which you have tested Armatus, to see whether the problem comes from my installation?

Thanks,
Samuel

@nsauerwald

Hi Samuel,
Please send me your matrix ([email protected]), and I will try to figure out the issue.
In the meantime, any of the sparse Hi-C matrices from the 2012 Dixon publication (available here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35156), or from the 2014 Rao publication (available here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525) should run correctly with Armatus (make sure to use the -R flag for the Rao dataset, as the file format is different).
Best,
Natalie

@hgu0717

hgu0717 commented Jul 11, 2019

Hey,

I've got the same problem. Just to be clear, is the input for Rao's dataset the sparse matrix or the domain list? The sparse matrix (3 columns) would make sense to me; however, the Armatus introduction seems to suggest using the domain list with -R.

@nsauerwald

The input should be the sparse matrix, as you said. If you use the -R flag, the software will automatically look for both the ".RAWobserved" and the ".KRnorm" files, and normalize the Hi-C data before finding TADs. If you just want to run Armatus on the raw Rao data, use the -N flag to skip the normalization step. For example, if you wanted to find TADs on the normalized 5kb Hi-C matrix, on chromosome 1 of the GM12878 data, use "-R -i GM12878_combined/5kb_resolution_intrachromosomal/chr1/MAPQGE30/chr1_5kb".
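The normalization that -R is described as doing can be sketched roughly as follows. This is a minimal Python sketch, not Armatus's own code. It assumes the Rao et al. convention that each .RAWobserved line is `pos_i pos_j count` with positions given in base pairs as bin starts, that the .KRnorm file holds one value per bin in bin order (NaN for bins that could not be normalized), and that a normalized count is the raw count divided by the norm values of both of its bins. The helper names `rao_paths`, `read_krnorm`, and `kr_normalize` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def rao_paths(prefix: str) -> Tuple[str, str]:
    # Hypothetical helper: the two files a prefix such as
    # ".../chr1/MAPQGE30/chr1_5kb" is expected to point to.
    return prefix + ".RAWobserved", prefix + ".KRnorm"


def read_krnorm(path: str) -> List[float]:
    # One norm value per bin, in bin order; "NaN" lines parse to float("nan").
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


def kr_normalize(raw_lines: Iterable[str], norm: List[float],
                 resolution: int) -> List[Tuple[int, int, float]]:
    """Divide each raw count by the KR norm values of its two bins."""
    out = []
    for line in raw_lines:
        if not line.strip():
            continue
        i_str, j_str, count_str = line.split()
        i, j, count = int(i_str), int(j_str), float(count_str)
        # Positions are bin starts in bp, so integer division gives the bin index.
        ni, nj = norm[i // resolution], norm[j // resolution]
        # Skip entries whose bins have no usable normalization factor.
        if math.isnan(ni) or math.isnan(nj) or ni == 0 or nj == 0:
            continue
        out.append((i, j, count / (ni * nj)))
    return out


if __name__ == "__main__":
    raw = ["0\t0\t10", "0\t5000\t4", "5000\t5000\t8"]
    print(kr_normalize(raw, [2.0, 1.0], 5000))
    # [(0, 0, 2.5), (0, 5000, 2.0), (5000, 5000, 8.0)]
```

Skipping NaN bins rather than keeping them as zeros is a design choice of this sketch; Armatus may handle those bins differently.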

@hgu0717

hgu0717 commented Jul 19, 2019

Thanks, Natalie.

Here is another basic question for you: if my data is a sparse matrix, is the matrix supposed to be normalized or not?

Thanks.

@nsauerwald

Armatus will run on either normalized or unnormalized data, so the choice of whether to normalize (and which normalization method to use) is up to you based on your application.

@agolicz

agolicz commented Aug 7, 2019

Hi,
I seem to be having a similar problem. The sizes of all my output files are zero. I am using an iced matrix produced by HiC-Pro and split by chromosome using its utility script split_sparse.py.
My first guess would be that there is some problem with formatting of the input.

Any chance you could help?

I've put some sample files on OSF: https://osf.io/xh4rt/files/

All the best,
Agnieszka

This is the command:
armatus -m -r 5000 -N -S -c ${i} -i $INPUT -g 1.0 -s 0.05 -o armatus.domains/chr${i}

And that's the head of the input file:
head NP1_5000_iced_Chr1.matrix
1 1 88.000000
1 2 84.000000
1 3 55.000000
1 4 28.000000
1 5 30.000000
1 6 22.000000
1 7 19.000000
1 8 7.000000
1 9 12.000000
1 10 4.000000

Some lines from std.out

Reading input from NP1_5000_iced_Chr10.matrix.
Building matrix for chromosome 10 at resolution 5000bp with 1 rows.
Initializing matrix to zero elements
10.9114%
21.8229%
32.7343%
43.6457%
54.5572%
65.4686%
76.38%
87.2915%
98.2029%
MatrixParser read matrix of size: 1 x 1
gamma=0
OPTIMAL SCORE: 0
begin computeTopK()
In topK()
The 0th-best solution had score 0
gamma=0.05
OPTIMAL SCORE: 0
begin computeTopK()
In topK()
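One way to check which units the first two columns of an input like the one above are in is sketched below. This is a heuristic for illustration, not part of Armatus or HiC-Pro: it assumes whitespace-separated `i j value` lines and flags the file as using bin indices when any position is not a multiple of the chosen resolution. The function names `parse_sparse` and `looks_like_bin_indices` are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_sparse(lines: Iterable[str]) -> List[Tuple[int, int, float]]:
    # Parse whitespace-separated "i j value" lines, skipping blank or malformed ones.
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        entries.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return entries


def looks_like_bin_indices(entries: List[Tuple[int, int, float]],
                           resolution: int) -> bool:
    """Heuristic: bp positions at this resolution are multiples of it,
    while bin indices such as 1, 2, 3 usually are not."""
    if not entries:
        raise ValueError("no entries to inspect")
    return any(i % resolution or j % resolution for i, j, _ in entries)


if __name__ == "__main__":
    head = ["1 1 88.000000", "1 2 84.000000", "1 3 55.000000"]
    print(looks_like_bin_indices(parse_sparse(head), 5000))  # True
```

A result of True for the first lines of a file suggests the positions are bin indices rather than base-pair coordinates at that resolution.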

@nsauerwald

I think the issue is that your data is already divided by the resolution: notice that the positions in your file are values like 1 and 2 instead of 5000 and 10000. When you specify that the resolution of your data is 5000, the algorithm divides these positions by the resolution again and gets values like 1/5000, which is consistent with the 1 x 1 matrix reported in your log. Try using the same command but without the "-r 5000" flag, as shown below.

armatus -m -N -S -c ${i} -i $INPUT -g 1.0 -s 0.05 -o armatus.domains/chr${i}
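The arithmetic behind this explanation can be illustrated with a short sketch. It assumes, for illustration only, that a position is mapped to a matrix row by integer division by the -r value; this mapping is not taken from the Armatus source, and `coord_to_row` and `n_distinct_rows` are hypothetical names.

```python
from typing import Iterable


def coord_to_row(position: int, resolution: int) -> int:
    # Assumed mapping: genomic position (bp) -> matrix row via integer division.
    return position // resolution


def n_distinct_rows(positions: Iterable[int], resolution: int) -> int:
    # How many different matrix rows a set of positions lands on.
    return len({coord_to_row(p, resolution) for p in positions})


if __name__ == "__main__":
    bins = range(1, 11)  # bin indices, as in the head of the input file above
    print([coord_to_row(b, 5000) for b in bins])         # every index lands on row 0
    print([coord_to_row(b * 5000, 5000) for b in bins])  # bp positions keep rows 1..10
    print(n_distinct_rows(bins, 5000))                   # 1
```

Under this assumption, positions that are already bin indices collapse onto one row when divided by 5000, while the same bins expressed in base pairs keep their separate rows.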
