I'm using mob_recon (v3.1.7) on some assemblies and I've noticed that it fails when using a gzip-compressed file and succeeds when using the same file, but decompressed. It looks to be some error related to utf-8 encoding.
Is this expected and is there any way to circumvent this other than decompressing my assemblies? I have over 8000 assemblies so I'm hoping to avoid having to decompress all of them.
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: MOB-recon version 3.1.7 [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:981]
2023-11-09 16:27:35,689 mob_suite.mob_recon DEBUG: Debug log reporting set on successfully [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:982]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program blastn at /home/jvfe/miniconda3/envs/mobsuite/bin/blastn [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program makeblastdb at /home/jvfe/miniconda3/envs/mobsuite/bin/makeblastdb [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: SUCCESS: Found program tblastn at /home/jvfe/miniconda3/envs/mobsuite/bin/tblastn [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py:592]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: Processing fasta file SAMD00000756.contigs.fa.gz [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1008]
2023-11-09 16:27:35,689 mob_suite.mob_recon INFO: Analysis directory SAMD00000756_mob_recon [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1009]
2023-11-09 16:27:40,596 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from SAMD00000756.contigs.fa.gz to SAMD00000756_mob_recon/__tmp/fixed.input.fasta [in /home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py:1104]
Traceback (most recent call last):
  File "/home/jvfe/miniconda3/envs/mobsuite/bin/mob_recon", line 10, in <module>
    sys.exit(main())
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/mob_recon.py", line 1105, in main
    id_mapping = fix_fasta_header(input_fasta, fixed_fasta)
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/mob_suite/utils.py", line 820, in fix_fasta_header
    for record in SeqIO.parse(handle, "fasta"):
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 238, in iterate
    for title, sequence in SimpleFastaParser(handle):
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 50, in SimpleFastaParser
    for line in handle:
  File "/home/jvfe/miniconda3/envs/mobsuite/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Running gunzip SAMD00000756.contigs.fa.gz and then re-running the command above works as expected.
Hello,
MOB-Suite tools do not support compressed inputs at the moment. mob_recon fails because it expects a plain-text FASTA file but receives a gzip-compressed file instead. I know that gzip-compressed genomes take significantly less space, and supporting compressed inputs would be a convenient feature, but it is low priority for us. Let's keep this issue open as a reminder and as a feature request.
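To illustrate why the error points at byte 0x8b in position 1: every gzip file starts with the two magic bytes 0x1f 0x8b, and 0x8b is not a valid first byte of a UTF-8 character. The sketch below is only an illustration and is not MOB-Suite code; the helper names `is_gzipped` and `demo` and the toy file are made up for this example.

```python
import gzip
import os
import tempfile

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write a toy gzipped FASTA, then read it as plain text and via gzip."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.fa.gz")
        with gzip.open(path, "wt") as out:
            out.write(">contig1\nACGT\n")

        # Reading the compressed bytes as UTF-8 text fails at the second byte.
        plain_read_error = None
        try:
            with open(path, "r", encoding="utf-8") as handle:
                handle.read()
        except UnicodeDecodeError as err:
            plain_read_error = err

        # Opening through gzip in text mode decompresses on the fly.
        with gzip.open(path, "rt") as handle:
            text = handle.read()

        return is_gzipped(path), plain_read_error, text
```

A check like `is_gzipped` could also be used in a wrapper script to decide which inputs need decompressing first.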
For now, please decompress inputs before running MOB-Suite tools. If space is a limitation, you can temporarily decompress each input, run the MOB-Suite tools, and then delete the decompressed file. You can write a simple Bash or Python script, or implement it as a Nextflow pipeline.
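A minimal Python sketch of that decompress, run, delete loop might look like the following. It is untested against real data: the function name `run_on_gzipped` and its parameters are hypothetical, and the default command assumes the `mob_recon --infile ... --outdir ...` form, so please check `mob_recon --help` for your installed version.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def default_cmd(fasta, outdir):
    # Assumed mob_recon invocation; verify the flags with `mob_recon --help`.
    return ["mob_recon", "--infile", fasta, "--outdir", outdir]


def run_on_gzipped(gz_path, outdir, build_cmd=default_cmd, runner=subprocess.run):
    """Decompress gz_path into a temporary directory, run the tool on the
    plain FASTA, and let the temporary copy be deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.basename(gz_path)
        if name.endswith(".gz"):
            name = name[:-3]
        fasta = os.path.join(tmp, name)

        # Stream the decompressed bytes to disk without loading the whole file.
        with gzip.open(gz_path, "rb") as src, open(fasta, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # check=True raises CalledProcessError if the tool exits non-zero.
        return runner(build_cmd(fasta, outdir), check=True)
    # The temporary directory and the decompressed FASTA are removed here.
```

Calling `run_on_gzipped("SAMD00000756.contigs.fa.gz", "SAMD00000756_mob_recon")` in a loop over all assemblies keeps only one decompressed file on disk at a time.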