xenome classify hangs #9
Far too many ideas. Need more information to narrow it down. First off, please confirm that all the unit tests succeeded on this platform. Secondly, can you confirm that you tried running it more than once and got the same behaviour? If the bug is intermittent, that narrows down the possibilities. Thirdly, a little bit of information. Could you please show me the output of:
Finally, let's try to create a cut-down test case. Could you please try this?
Then run If that hangs too (should be much quicker), then please send us the cut-down input files. Either attach them to the ticket, or (if you can't let the public see them) email them to me. |
Unit tests were fine when I compiled gossamer. I ran several samples yesterday and all showed the same behaviour. Here's the ls output:
and
Building the index using 8 cores took almost a day but it did finish OK. I ran a test using the 1000 first reads as you suggested and it finished OK - odd. Could this have something to do with how things are parallelised and communication between the threads? Here's the command I used:
|
As this is all public data, you can download the fastq.gz files from https://www.ebi.ac.uk/ena/data/view/SRR1176814 if you're interested. It looks like all the output fastq files are correctly produced, though, so I was able to pull together the stats I needed for my comparison in v2 of https://f1000research.com/articles/5-2741/v1 (the results are very similar to our alignment-based algorithm). |
Same issue here, also tried to use a small fastq and still did not work. Thanks |
Thanks for that. The information from ls also ruled out the old gzipped-file-is-an-exact-multiple-of-the-io-buffer-size issue that we found in a very old version of the gzip filter. I suspect it's the job manager. We'll take a look. |
Did you find any clue yet? |
Same issue here, xenome classify hangs after the work is finished. |
I just got back from holidays. Picking this up again. |
Deguerre, |
I see the same problem with my data: xenome classify just hangs after all output files are generated. Does anyone have updates on the topic? Thanks |
Anything? I'm also seeing the same problem here. |
Yes, I get the same issue. It did not happen when I tested xenome on a small test sample of reads (~4000), but when I ran it on all my samples (tens of millions of reads per sample) then it hangs after completion (or what looks like completion). |
Just as an update, this is turning out to be a very nasty problem caused by a mismatch between two different threading models. We decided that for the open source release we should use the standard C++ threading system rather than our previous solution which we couldn't easily maintain. The hanging is caused by some of the old code relying on some detail in the previous model that nobody can remember because it was written so long ago. Only the kmer set construction in Xenome seems to be affected. Everything else seems to work. None of the Gossamer authors are being paid to work on this, so we have to work on it around our day jobs. As you all probably have worked out, the problem only happens on large examples, which means each individual test takes a while. I am only speaking on behalf of myself, but I'm sure the other authors agree that we're very sorry about this, and we all want to get this finished as quickly as possible so everyone can use it. Please bear with us. |
Thanks for working toward fixing this! |
I am facing the same issue. Has this bug been fixed now? |
I've been running xenome classify for a week on macOS Sierra (10.12.6). |
I'm also experiencing the same issue and looking forward to the next fix. In the meantime I'm attempting to whittle down the original read files to see how large a file will still work without hanging. I'm down to 0.5% (yes, half of a percent) of the original; this equates to 2 read files of around 150 MB uncompressed, and it still hangs. Just curious: what is the largest file anyone has been able to run without hanging, and on what setup? |
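For anyone cutting inputs down like this, here is a minimal sketch of taking the first N reads from a FASTQ. It assumes plain four-line records (no wrapped sequence lines); the function name `first_reads` and the file names in the comments are placeholders, not anything from Xenome. For paired-end data, use the same N on both files so the mates stay in sync.

```bash
#!/usr/bin/env bash
# Sketch only: first_reads is a hypothetical helper, not part of Xenome.

# first_reads N: copy the first N FASTQ records from stdin to stdout.
# Assumes exactly 4 lines per record.
first_reads() {
    local n=$1
    head -n "$(( n * 4 ))"
}

# Example (file names are placeholders):
#   gzip -dc sample_1.fastq.gz | first_reads 1000 > small_1.fastq
#   gzip -dc sample_2.fastq.gz | first_reads 1000 > small_2.fastq
```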
Same issue. I've been using Xenome for a patient xenograft sample. The algorithm works well, but I've been running it through bsub and the job seems to run forever, even though the output files have been written and look finished. For example, with a 100,000-read fastq as input, indexing takes about a day, and writing the different .fastq files is relatively quick after that. I checked the files: the read counts of all five (ambiguous, neither, both, mouse, and human) add up to 100,000 almost immediately after the files are created (I would say 20 minutes at most), but for some reason the program keeps running. After two days, the jobs are still "running". To be honest, I don't mind much, and I'm willing to write code around it that checks that the files add up to the original read count, as long as the output is accurate. Can anyone confirm that their output is accurate? For mine, Xenome seems to reach the target accuracy, so I'm assuming that once xenome has "finished" the output is accurate and complete. |
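A minimal sketch of the read-count check described above, assuming uncompressed FASTQ files with exactly four lines per read. The helper names `count_reads` and `reads_add_up` are made up for illustration. Note that matching counts only show that every read was written somewhere, not that the classification itself is correct.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Xenome.

# count_reads FILE...: print the total number of reads in plain FASTQ files,
# assuming exactly 4 lines per read.
count_reads() {
    local total=0 lines f
    for f in "$@"; do
        lines=$(wc -l < "$f")
        total=$(( total + lines / 4 ))
    done
    echo "$total"
}

# reads_add_up INPUT OUTPUT...: succeed if the outputs sum to the input count.
reads_add_up() {
    local input=$1; shift
    [[ $(count_reads "$input") -eq $(count_reads "$@") ]]
}

# Example (file names are placeholders):
#   reads_add_up input.fastq human.fastq mouse.fastq both.fastq \
#       neither.fastq ambiguous.fastq && echo "read counts match"
```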
I can confirm this too. Classify hangs after running for about 20 minutes. In my case the logs show that after processing ~10 million reads classify does not advance. Interestingly, the number of input fastq reads matches the number of output reads. I have to kill each process after 30 minutes. The program does its job but cannot finish. Indexing also seemed to take indefinitely long. I played with the -M and -T parameters and got it to work in about 8 hours. Although I could go over 124 GB of memory with 16 threads on the cluster, I used 64 GB and 12 threads to make it work. This choice is arbitrary, with no explanation. I wish the documentation were more detailed and clearer. But xenome works in gossamer, so I can move on to the next step. |
(My apologies for now deleted post, I used -I io -i for classify and fastq data - was not reporting error and just hanging) I could now run a classify job but after processing the 2000000 reads it does not exit. |
I found a work around:
Hope it helps. |
It doesn't exit at all. We've stopped using it since it didn't work for us. |
true but it does the job in our case, what you lack are a happy end of the run and the stats. |
Hello, I'm facing a similar issue. I'm running xenome classify on my computer but it hangs or takes a really long time to run.
It seems to be stuck. The process is visible in top but shows up as sleeping.
I'd really appreciate your help on this! |
I do not think we will ever get help on this... |
Hello, Thank you for your reply. But unfortunately my reads don't add up. I tried running it again using different -M and -T settings and it seems to be running though very slowly!!! |
I would use N threads and at least 4 GB of RAM per thread. Better 6 to 8 GB if you have homo+mus data. Also try with 10M reads first to see if that works. Good luck
|
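Does anybody know of a workaround? I installed Xenome in a Docker container (Ubuntu 16.04) and indexing worked fine but the classify step hangs without any particular message just like it did for all of you. I tried with FASTQ files containing either 25k or 2 million reads and both failed. Any alternative tool to use or other ideas? |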
I have an older version of xenome that does not have this issue. Let me
know if you need it, I can share.
On Wed, Jul 4, 2018 at 12:07 PM romanhaa wrote:
Does anybody know of a workaround? I installed Xenome in a Docker
container and indexing worked fine but the classify step hangs without
any particular message just like it did for all of you. I tried with FASTQ
files containing either 25k or 2 million reads and both failed. Any
alternative tool to use or other ideas?
|
@kannabirannandakumar yes that would be fantastic! |
@kannabirannandakumar I need the older version too, can you share it? |
A colleague provided me a pre-compiled version that works on my system. I have no idea where it comes from and what is different, but you can download it from this link: https://drive.google.com/file/d/1AAmFKT5huWJ6H_8liFsuqFLRSdJBq10b/view?usp=sharing Credit goes to the authors (obviously). |
What is different is that it was compiled with an archaic version of the C++ standard and an older version of Boost. There is an incompatibility with modern C++ that we haven't had a chance to fix yet. All we know so far is that it's a very subtle threading semantics issue and it isn't anything simple. As one of the authors, I have no problem with people sharing the old compiled binary until we fix it given that NICTA is no more, especially if you do it by directing people here to the explanation. That version is, after all, the version about which all published claims are made! Having said that, I don't own it and I don't speak for Data61 (the current owners). |
I've worked around this issue by essentially checking for the program to complete and then killing it. The best way I've found to check for completion is the cessation of writing to the output fastqs. I put it in its own folder and run the following script in the background to check modification times on the files it is writing to. If none of them are written to in 60 seconds it kills xenome and moves on.
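What follows is not the poster's script, only a minimal sketch of the same idea under a few assumptions: GNU `date` (for `date -r FILE`), a known PID for the xenome process, and a known list of output files. The function names `all_idle` and `kill_when_idle` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of Xenome.

# all_idle SECS FILE...: succeed if every file exists and was last modified
# more than SECS seconds ago.
all_idle() {
    local idle_secs=$1; shift
    local now mod f
    now=$(date +%s)
    for f in "$@"; do
        [[ -e $f ]] || return 1           # output not created yet
        mod=$(date -r "$f" +%s)           # GNU date; on macOS use: stat -f %m
        (( now - mod > idle_secs )) || return 1
    done
    return 0
}

# kill_when_idle PID IDLE_SECS POLL_SECS FILE...: poll until all files have
# been idle for IDLE_SECS, then terminate PID. Stops early if PID exits.
kill_when_idle() {
    local pid=$1 idle_secs=$2 poll_secs=$3; shift 3
    while kill -0 "$pid" 2>/dev/null; do
        if all_idle "$idle_secs" "$@"; then
            kill "$pid"
            break
        fi
        sleep "$poll_secs"
    done
}

# Example (command and file names are placeholders):
#   xenome classify ... &
#   kill_when_idle $! 60 10 human.fastq mouse.fastq both.fastq \
#       neither.fastq ambiguous.fastq
```

The idle window is a judgment call: too short and a slow disk or a long pause between writes could trigger the kill before the outputs are complete, so it is worth pairing this with a read-count check on the outputs.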
|
I experienced something similar to this when using fastq.gz as the input. Then, I decompressed the fastq files and used the resulting .fastq files as the input. That worked. |
I have the same issue. I'm trying to run xenome on a test paired fastq dataset. The "classify" job has been running for over 13 days and is still going. |
I am very sorry to disturb you, but the link you provided is not working, and I would like the old version of xenome that you shared. |
I need the older version of xenome that does not have this issue, it would be nice if docker was available |
You can extract the binary from the following Docker image: https://hub.docker.com/r/romanhaa/xenocell I don't remember the path off the top of my head but it should be easy to find. |
This issue remains current. The solution provided by @jeffpkamp at #9 (comment) is a workaround. To make it work for multiple samples, I modified it as:
# https://hub.docker.com/r/repbioinfo/xenome.2017.01/tags
# singularity pull repbioinfo/xenome.2017.01
SIF=/home/user/data/TestData/xenome.2017.01_latest.sif
THREADS=8
HS=/home/user/data/ExtData/UCSC/hg38/hg38.fa
MM=/home/user/data/ExtData/UCSC/mm39/mm39.fa
# Index genomes, one time
# singularity exec ${SIF} xenome index -T ${THREADS} -P idx -H ${MM} -G ${HS}
DIRIN=/home/user/data/WorkData/RNA-seq/00_raw
DIROUT=/home/user/data/WorkData/RNA-seq/00_raw_xenome
mkdir -p "${DIROUT}"
# Single-end
for file in $(find "${DIRIN}" -type f -name "*.fastq.gz" | grep -v Undetermined); do
    # Process individual samples
    SAMPLE=$(basename "${file}" .fastq.gz)
    # Run xenome in the background, note "&"
    singularity exec "${SIF}" xenome classify -T ${THREADS} -P idx --host-name mouse --graft-name human -i "${file}" &
    # Monitor the file update times
    sleep 1
    while true
    do
        # Count the output files that have not been modified for 120 seconds
        n=0
        now=$(date +%s)
        for x in *fastq
        do
            # Skip the unexpanded pattern if no output file exists yet
            [[ -e $x ]] || continue
            mod=$(date +%s -r "$x")
            if (( now - mod > 120 )); then ((n=n+1)); fi
        done
        echo "$n files not updated"
        # More than 4 idle files means all five xenome outputs have stopped growing
        if [[ $n -gt 4 ]]
        then break
        else sleep 10
        fi
    done
    # Terminate xenome and only then run the next commands
    killall -r xenome
    # Move the default files to sample-specific names in a subfolder
    mv human.fastq "${DIROUT}/${SAMPLE}_human.fastq"
    mv mouse.fastq "${DIROUT}/${SAMPLE}_mouse.fastq"
    mv ambiguous.fastq "${DIROUT}/${SAMPLE}_ambiguous.fastq"
    mv both.fastq "${DIROUT}/${SAMPLE}_both.fastq"
    # Wait 2 seconds and only then resume the loop
    sleep 2
done
gzip "${DIROUT}"/*.fastq
|
I recommend that everyone uses one of the above workarounds. The Xenome authors don't get paid to work on this, so our time is limited. We have decided that rather than fixing this version, releasing a new version, with updates to suit the realities of modern hardware and modern C++, would be a better use of that limited time. The fix to this bug is to replace one of the offending algorithms, rather than trying to fix this branch. I am looking for beta testers. If you've seriously used Xenome, and would be interested in helping out, we would appreciate it. |
Re: beta testers - I'll be interested. We analyze a lot of PDX data, filter mouse reads by aligning to the combined genome, but would rather use and cite Xenome. |
Excellent! What's the best way to contact you that isn't quite this public? EDIT: Your email at vcu dot edu? |
That's correct. |
Hi @mdozmorov and @Deguerre, just curious: are there any updates on this? |
I know that @Deguerre mentioned that xenome authors do not get paid to work on these and their time is limited. It is a very fair and valid comment. With this in mind, one thing to note:
|
Interesting tool, the paper introduces others. I worked with https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh, but it is slow and also not maintained. |
Hi @mdozmorov, this is not the official BBMap repo. The best way to get help from Brian Bushnell is through seqanswers. |
Thanks, @splaisan, SourceForge and the BBtools website show active development. I hope Xenome development will resume as well. |
FWIW, I tried |
Hi there,
I'm testing xenome classify on an AWS instance (latest Ubuntu) and it hangs after about 50 minutes. The command I used for launching the process is
This is where it stops:
The sample (SRR1176814) has 47312349 reads so it looks like it's getting to the end but then nothing happens. The process is still visible in top:
output_stats_SRR1176814.txt is empty.
Any ideas?