Ideas
Serghei Mangul edited this page Aug 5, 2017
·
4 revisions
- We decided not to merge contigs
- Top priority for Mohamed: finish the plots. (1) Sort genomes by coverage and report the total number of reads; (2) color reads according to fidelity
- Use mock datasets and subsample them to obtain genomes covered by only a few reads
- Prepare the database by mapping bacterial substrings (sliding window) onto the fungi. Also take the entire RefSeq, excluding fungi, and map it onto the fungi to mask the fungal genomes.
- If a read maps entirely to a masked region, ignore it; if it spans both masked and non-masked regions, keep it only if at least 30 bp(?) overlap the non-masked region
- Does it make sense to do this masking inside the database? In between virus, fungi, and plasmids?
- Maybe consider the lowest common ancestor (LCA) instead of just assigning multi-mapped reads (maybe for a future release, when we do bacteria)
- With stringent masking we can trust even several reads and detect rare organisms. Is this not available now?
- Make an interactive graph
- Properties of the graph: take only reads that are UNIQ, have a certain fidelity, etc.
- Explore all technical parameters
- Report % genome coverage separately for UNIQ, multi-mapped within, and multi-mapped across
- Formulate Uniformity of coverage
- Fidelity of reads
- UNIQ, multi-mapped within
- PE information
- Anything else?
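
The read-filtering rule for masked regions noted above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the half-open interval representation, and the treatment of reads that touch no masked region (kept unconditionally) are hypothetical choices, and the 30 bp threshold is the tentative value from the notes.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates (assumed)

MIN_UNMASKED_BP = 30  # tentative threshold from the notes ("30bp(?)")


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def masked_overlap(read_start: int, read_end: int, masked: List[Interval]) -> int:
    """Number of read bases that fall inside masked regions."""
    total = 0
    for m_start, m_end in merge_intervals(masked):
        overlap = min(read_end, m_end) - max(read_start, m_start)
        if overlap > 0:
            total += overlap
    return total


def keep_read(read_start: int, read_end: int, masked: List[Interval],
              min_unmasked: int = MIN_UNMASKED_BP) -> bool:
    """Apply the masking rule to one aligned read."""
    read_len = read_end - read_start
    in_mask = masked_overlap(read_start, read_end, masked)
    if in_mask == 0:
        return True   # read does not touch a masked region (assumption: keep)
    if in_mask == read_len:
        return False  # read lies entirely in a masked region: ignore it
    # Read spans masked and non-masked sequence: require enough unmasked bases.
    return (read_len - in_mask) >= min_unmasked
```

For example, with a single masked region `(100, 200)`, a read at `(50, 150)` has 50 unmasked bases and is kept, while a read at `(80, 180)` has only 20 and is dropped.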
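
Reporting % genome coverage separately per read category could look something like the sketch below. The category labels (`"UNIQ"`, `"multi_within"`, `"multi_across"`), the `(start, end, category)` tuple layout, and the function name are assumptions made for illustration; coverage here means the fraction of genome positions covered by at least one read of that category.

```python
from typing import Dict, List, Tuple

# (start, end, category) with half-open coordinates; this layout is an assumption.
Alignment = Tuple[int, int, str]


def percent_covered_by_category(genome_length: int,
                                alignments: List[Alignment]) -> Dict[str, float]:
    """Return, per category, the % of genome positions covered by >= 1 read."""
    by_category: Dict[str, List[Tuple[int, int]]] = {}
    for start, end, category in alignments:
        by_category.setdefault(category, []).append((start, end))

    result: Dict[str, float] = {}
    for category, intervals in by_category.items():
        covered = 0
        cur_start, cur_end = None, None
        # Sweep over sorted intervals, merging overlaps so bases count once.
        for start, end in sorted(intervals):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
        result[category] = 100.0 * covered / genome_length
    return result
```

A call such as `percent_covered_by_category(1000, [(0, 100, "UNIQ"), (50, 150, "UNIQ")])` counts the 50 overlapping bases only once.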