dorado for m6A analysis #1170

baibhav-bioinfo · 2024-12-10T15:57:55Z

Hello,
Appreciation for developing the valuable and diverse software.

I am trying to use Dorado for m6A analysis (using DRS data), from site identification to Diff Methylated Rate analysis between conditions with replicates (using modkit).

(1) i wanted to know if
dorado basecaller [email protected] /path/to/pod5
--modified-bases-models [email protected]_m6A_DRACH@v1/ --device cude:all --reference ref.fasta > calls.bam

this is correct approach for alignment along with basecalling.

also as other tools (eg. m6Anet) uses transcriptome rather than genome. which one would be more suitable to align with?.

(2) The resultant bed file (after modkit pileup) contains position with Nmod=0. Why are those in the output when my argument is to detect the m6A mod sites? do i need to filter out those, keeping only Nmod>=1?
also the file have Nother_mod, what are those and why are we getting other mods?
is there any option to get only relevant Nmod rows, in my case A?

malton-ont · 2024-12-11T08:59:53Z

Hi @baibhav-bioinfo,

Yes. See the docs here. Regarding which reference to use, bioinformatics questions are better posted on the Nanopore community forums. I'm not familiar with m6Anet.
This question would be better posted on the modkit github page, but at a guess I'd say you are using a model that only calls m6A in a DRACH motif, so perhaps modkit is showing Nmod=0 for A bases not in that context?

baibhav-bioinfo · 2024-12-11T14:52:18Z

Okay @malton-ont thanks for the response.
I will see the docs for more information and post in nanopore community for more answers.
Thanks for your insight Regarding the Nmod=0 rows, it makes sense.

baibhav-bioinfo · 2024-12-13T23:18:18Z

hello,
I was able to run the Dorado basecall + methylation call with my DRS reads. I ran the command both in m6A_DRACH context and m6A_all context, using the following commands

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > sample.bam

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A@v1/ --device cuda:all > sample.bam

for the DRACH motif context i got 60,000 sites per sample (10 million reads with ~1000nt per read)
but for all context run, i got >1.6 million m6A sites predicted.

Did something wrong happened in my ALL context run?
Is that number of sites possible?

Please suggest.

malton-ont · 2024-12-16T09:10:19Z

Hi @baibhav-bioinfo,

By default dorado outputs predictions in all-context for any site with a >5% chance of being a modification. You can adjust this using the --modified-bases-threshold parameter if you prefer to be more conservative, or I believe modkit has its own filtering functionality which you can ask about on their github page.

baibhav-bioinfo · 2025-01-06T15:27:46Z

hi
i ran the dorado command with "--modified-bases-threshold" with different threshold values
with 0.90 i got no m6A sites predicted, then i lowered threshold to 0.45, but got no sites too

with 0.25 i got again >1 million sites.

is there anything wrong i am doing?
following is the command i am using

$dorado basecaller [email protected] pod5/ --modified-bases-models
[email protected]_m6A@v1/ --modified-bases-threshold 0.90 --device cuda:all >
c6_r1_sup_m6A_90.bam

malton-ont · 2025-01-06T15:32:48Z

There doesn't look to be anything wrong with your command as far as I can see.

Is 1.6 million sites not reasonable? That's ~30x the number you see for the DRACH context - could it just be that only 1/30 of the A-bases are in a DRACH context?

baibhav-bioinfo · 2025-01-06T15:44:36Z

but all the literature and research articles mentions majority of the m6A sites are in DRACH motif, with very few exceptions.

maybe i will need to see other filters, using modkit. I will read more and let you know if find anything.

Thankyou so much for the prompt responses.

malton-ont · 2025-01-06T15:51:10Z

The all-context model reports modifications at all A-bases, regardless of the context, so I'd expect more sites to be recorded. From your reading it sounds like these should have a relatively low probability outside the DRACH context, which tracks with you seeing few sites when applying a high threshold value. Beyond this, I think you'll get better answers elsewhere, as these are more bioinformatics questions than issues with dorado itself.

baibhav-bioinfo · 2025-01-06T15:54:12Z

You are right, i will write on nanopore community for better explanations.

baibhav-bioinfo · 2025-01-22T14:53:54Z

hello, as you mentioned in the previous chats the default threshold for m6A site detection in "all context" is 5% or 0.05.
can we know what is the default threshold for m6A detection in DRACH context?

malton-ont · 2025-01-22T14:55:58Z

@baibhav-bioinfo,

No thresholds are applied in the case of context-based models. This is because there is no way to distinguish via the MM/ML tags between bases that are skipped because they did not match the context and bases that would be skipped because they did not meet the threshold.

baibhav-bioinfo · 2025-01-22T15:23:41Z

thanks @malton-ont, i will try to understand what you just said.

There is also one question, might be trivial. These are the commands i ran for both DRACH and all context
(1) First ran dorado for basecall the sequences and methylation (without reference)
input: pod5 files
output: bam files
dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > c6_r1_sup_.bam
(2) then i ran the "dorado aligner" for aligning with reference genome with "-x splice" option for DRS reads
dorado aligner sb_genome.mmi c6_r1_sup_.bam --output-dir aligned_output_dorado_sup --mm2-opts "-x splice -k 14 --junc-bed SbicolorRTx430_552_v2.1.gene.bed"
input: bam files
output: aligned bam files
(3) Using modkit pileup to get bedmethyl files
input: aligned bam files
output: bed files
modkit pileup --threads 96 c6_r2_sup_.bam c6_r2_sup_DRACH_pileup.bed --log-filepath c6_r2_sup_DRACH_pileup.log

I inspected the bam files from dorado. the bam files just from step 1 have the MM/ML tags (which tells us about the m6A sites detected), but the aligned bam files from step 2 doesnt have it. That makes me little confused where did the modification information go.

malton-ont · 2025-01-22T15:27:28Z

@baibhav-bioinfo,

dorado aligner strips modification information from non-primary alignments unless soft-clipping is used, as hard-clipping would make the tags and the sequence incompatible. You can turn on soft-clipping by adding -Y to your --mm2-opts.

baibhav-bioinfo · 2025-01-27T15:07:04Z

Thanks @malton-ont , now I can see what is happening there during alignment.
I was searching for any research papers which have used Dorado for m6A analysis for using it as reference guide in my case. If there are any in your knowledge, please let me know.

malton-ont · 2025-02-04T15:12:57Z

@baibhav-bioinfo,

You'll likely get a better response to paper requests on the Nanopore community forums.

malton-ont closed this as completed Feb 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dorado for m6A analysis #1170

dorado for m6A analysis #1170

baibhav-bioinfo commented Dec 10, 2024 •

edited

Loading

malton-ont commented Dec 11, 2024

baibhav-bioinfo commented Dec 11, 2024

baibhav-bioinfo commented Dec 13, 2024

malton-ont commented Dec 16, 2024

baibhav-bioinfo commented Jan 6, 2025

malton-ont commented Jan 6, 2025

baibhav-bioinfo commented Jan 6, 2025

malton-ont commented Jan 6, 2025

baibhav-bioinfo commented Jan 6, 2025

baibhav-bioinfo commented Jan 22, 2025

malton-ont commented Jan 22, 2025 •

edited

Loading

baibhav-bioinfo commented Jan 22, 2025 •

edited

Loading

malton-ont commented Jan 22, 2025 •

edited

Loading

baibhav-bioinfo commented Jan 27, 2025

malton-ont commented Feb 4, 2025

dorado for m6A analysis #1170

dorado for m6A analysis #1170

Comments

baibhav-bioinfo commented Dec 10, 2024 • edited Loading

malton-ont commented Dec 11, 2024

baibhav-bioinfo commented Dec 11, 2024

baibhav-bioinfo commented Dec 13, 2024

malton-ont commented Dec 16, 2024

baibhav-bioinfo commented Jan 6, 2025

malton-ont commented Jan 6, 2025

baibhav-bioinfo commented Jan 6, 2025

malton-ont commented Jan 6, 2025

baibhav-bioinfo commented Jan 6, 2025

baibhav-bioinfo commented Jan 22, 2025

malton-ont commented Jan 22, 2025 • edited Loading

baibhav-bioinfo commented Jan 22, 2025 • edited Loading

malton-ont commented Jan 22, 2025 • edited Loading

baibhav-bioinfo commented Jan 27, 2025

malton-ont commented Feb 4, 2025

baibhav-bioinfo commented Dec 10, 2024 •

edited

Loading

malton-ont commented Jan 22, 2025 •

edited

Loading

baibhav-bioinfo commented Jan 22, 2025 •

edited

Loading

malton-ont commented Jan 22, 2025 •

edited

Loading