Mob_Recon fails with compressed input #153

Open
jvfe opened this issue Nov 9, 2023 · 1 comment

jvfe commented Nov 9, 2023

Hi,

I'm using mob_recon (v3.1.7) on some assemblies and I've noticed that it fails when using a gzip-compressed file and succeeds when using the same file, but decompressed. It looks to be some error related to utf-8 encoding.

Is this expected and is there any way to circumvent this other than decompressing my assemblies? I have over 8000 assemblies so I'm hoping to avoid having to decompress all of them.

Command used

mob_recon --infile SAMD00000756.contigs.fa.gz --num_threads 6 \
--sample_id SAMD00000756 --unicycler_contigs \
--outdir SAMD00000756_mob_recon --debug \
--run_overhang

Error log

2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: MOB-recon version 3.1.7  [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:981]
2023-11-09 16:27:35,689 mob_suite.mob_recon DEBUG: Debug log reporting set on successfully [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:982]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program blastn at /home/jvfe/miniconda3/envs/mobsuite/bin/blastn [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program makeblastdb at /home/jvfe/miniconda3/envs/mobsuite/bin/makeblastdb [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program tblastn at /home/jvfe/miniconda3/envs/mobsuite/bin/tblastn [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: Processing fasta file SAMD00000756.contigs.fa.gz [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1008]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: Analysis directory SAMD00000756_mob_recon [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1009]
2023-11-09 16:27:40,596 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from SAMD00000756.contigs.fa.gz to SAMD00000756_mob_recon/__tmp/fixed.input.fasta [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1104]
Traceback (most recent call last):
  File "/home/jvfe/miniconda3/envs/mobsuite/bin/mob_recon", line 10, in <module>
    sys.exit(main())
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py", line 1105, in main
    id_mapping = fix_fasta_header(input_fasta, fixed_fasta)
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py", line 820, in fix_fasta_header
    for record in SeqIO.parse(handle, "fasta"):
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 238, in iterate
    for title, sequence in SimpleFastaParser(handle):
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 50, in SimpleFastaParser
    for line in handle:
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Running gunzip SAMD00000756.contigs.fa.gz and then re-running the command above works as expected.

I've attached the assembly below.
SAMD00000756.contigs.fa.gz
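
A note on the traceback above: every gzip stream begins with the two bytes 0x1f 0x8b (RFC 1952). 0x1f is a valid single-byte UTF-8 character, but 0x8b cannot start a UTF-8 sequence, which matches the "byte 0x8b in position 1: invalid start byte" message and suggests the compressed file was opened as plain text. A minimal sketch of detecting this and opening either kind of file as text follows. The helper names `looks_gzipped` and `open_fasta_text` are hypothetical and are not part of MOB-Suite or Biopython.

```python
import gzip

# First two bytes of every gzip stream, per RFC 1952.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_fasta_text(path):
    """Open a FASTA file in text mode, whether or not it is gzip-compressed.

    Hypothetical helper for illustration only; MOB-Suite does not provide it.
    """
    if looks_gzipped(path):
        # "rt" makes gzip decompress and decode to str on the fly.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

A text handle returned this way could be passed to a parser that accepts open handles, such as Biopython's `SeqIO.parse(handle, "fasta")`, though this does not change how mob_recon itself reads its input.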

Collaborator

kbessonov1984 commented Nov 15, 2023

Hello,
MOB-Suite tools do not support compressed inputs at the moment. mob_recon fails because it expects a plain-text FASTA file and instead receives a gzip-compressed one. I know that gzip-compressed genomes take significantly less space, and supporting compressed inputs would be a convenient feature, but it is low priority for us. Let's keep this issue open as a reminder for us and as a feature request.

For now, please decompress inputs before running MOB-Suite tools. If space is a limitation, you can temporarily decompress the inputs, run the MOB-Suite tools, and then delete the decompressed files. You can write a simple Bash or Python script, or implement this as a Nextflow pipeline.
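
The decompress, run, delete workaround described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `run_on_gzipped` and its `runner` parameter are hypothetical, and the only mob_recon flags used are `--infile` and `--outdir`, taken from the command in the original report. The decompressed copy lives in a temporary directory that is removed automatically, so only one uncompressed assembly exists on disk at a time.

```python
import gzip
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_gzipped(gz_path, outdir, extra_args=(), runner=subprocess.run):
    """Decompress one .gz assembly to a temp dir, run mob_recon, clean up.

    Hypothetical helper, not part of MOB-Suite. `runner` is injectable so
    the command can be inspected without mob_recon being installed.
    """
    gz_path = Path(gz_path)
    with tempfile.TemporaryDirectory() as tmp:
        # Assumes the input name ends in ".gz"; "x.fa.gz" becomes "x.fa".
        fasta = Path(tmp) / gz_path.with_suffix("").name
        with gzip.open(gz_path, "rb") as src, open(fasta, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so memory use stays low
        cmd = ["mob_recon", "--infile", str(fasta),
               "--outdir", str(outdir), *extra_args]
        # check=True raises CalledProcessError if mob_recon exits non-zero.
        return runner(cmd, check=True)
    # Leaving the "with" block deletes the temporary decompressed copy.
```

For many assemblies, a loop such as `for gz in Path("assemblies").glob("*.fa.gz"): run_on_gzipped(gz, gz.name[:-3] + "_mob_recon")` would process them one by one; the directory name `assemblies` here is a placeholder.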
