Serghei Mangul edited this page Aug 5, 2017 · 4 revisions

Update 08/05/17

  • We decided not to merge contigs
  • Top priority for Mohamed: finish the plots: (1) sort genomes by coverage and report the total number of reads; (2) color reads according to fidelity
  • Use mock datasets and subsample them to obtain genomes covered by only a few reads
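For the plotting task, here is a minimal sketch of sorting genomes by coverage and reporting read totals. It assumes that "coverage" means breadth of coverage (the fraction of genome positions hit by at least one read), that reads are 0-based half-open `(start, end, fidelity)` intervals, and that fidelity is a value in [0, 1]. `GenomeHits`, `summarize`, and `fidelity_color` are hypothetical names, and the color thresholds are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GenomeHits:
    """Reads assigned to one genome; intervals are 0-based, half-open [start, end)."""
    name: str
    length: int  # genome length in bp
    reads: list[tuple[int, int, float]] = field(default_factory=list)  # (start, end, fidelity)


def breadth_of_coverage(g: GenomeHits) -> float:
    """Fraction of genome positions covered by at least one read (overlaps merged)."""
    covered = 0
    last_end = 0
    for start, end, _fidelity in sorted(g.reads):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / g.length


def summarize(genomes: list[GenomeHits]) -> list[tuple[str, float, int]]:
    """(name, breadth, total reads), sorted by breadth of coverage, highest first."""
    rows = [(g.name, breadth_of_coverage(g), len(g.reads)) for g in genomes]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fidelity_color(fidelity: float) -> str:
    """Map a fidelity in [0, 1] to a plot color; the thresholds are placeholders."""
    if fidelity >= 0.99:
        return "green"
    if fidelity >= 0.95:
        return "orange"
    return "red"
```

For example, a 1,000 bp genome with reads at [0, 100) and [50, 150) has a breadth of 0.15 and sorts below a 200 bp genome with a single read at [0, 100), which has a breadth of 0.5.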
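Subsampling a mock dataset could look like the sketch below. It assumes that each mock read is a `(read_id, source_genome)` pair, since a mock dataset records the true source of every read. It also draws a fixed number of reads uniformly without replacement, using a seed so runs are reproducible. The function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter


def subsample_reads(reads: list[tuple[str, str]], n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Draw n reads uniformly at random without replacement; the seed makes it reproducible."""
    rng = random.Random(seed)
    return rng.sample(reads, n)


def low_coverage_genomes(reads: list[tuple[str, str]], max_reads: int) -> list[str]:
    """Source genomes that contribute at least one but no more than max_reads reads."""
    counts = Counter(genome for _read_id, genome in reads)
    return sorted(genome for genome, count in counts.items() if count <= max_reads)
```

With 1,000 reads from one genome and 20 from another, drawing 100 reads leaves the second genome with about two reads on average. Repeating the draw with different seeds shows how often such a genome is still detected.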

Previous

  1. Prepare the database by mapping bacterial substrings (sliding window) onto the fungi. Also map the entire reference, apart from the fungi, onto the fungi to mask the fungal genomes.
  2. If a read maps entirely within a masked region, ignore it; if it spans both masked and non-masked regions, keep it only if at least 30 bp(?) overlap the non-masked region
  3. Does it make sense to also apply this masking within the database, i.e., between viruses, fungi, and plasmids?
  4. Consider lowest common ancestor (LCA) assignment instead of simply assigning multi-mapped reads (possibly for a future release, when we add bacteria)
  5. With stringent masking we could trust even a few reads and detect rare organisms. Is this currently unavailable?
  6. Make an interactive graph
  7. Graph properties: include only reads that are unique (UNIQ), above a certain fidelity, etc.
  8. Explore all technical parameters
  9. Report % genome coverage separately for UNIQ reads, reads multi-mapped within, and reads multi-mapped across
  • Formulate uniformity of coverage
  • Fidelity of reads
  • UNIQ, multi-mapped within
  • PE information
  • Anything else?
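The masked-region rule in item 2 can be sketched as follows. The sketch assumes 0-based half-open intervals, masked intervals that do not overlap each other, and `min_unmasked=30` as the tentative "30bp(?)" threshold. Reads that never touch a mask are kept, as the notes imply. The function names are hypothetical, and the linear scan over intervals is for clarity only. A sorted interval list with `bisect` would scale better.

```python
from __future__ import annotations


def masked_overlap(start: int, end: int, masked: list[tuple[int, int]]) -> int:
    """Bases of the read [start, end) inside masked intervals (half-open, non-overlapping)."""
    return sum(max(0, min(end, m_end) - max(start, m_start)) for m_start, m_end in masked)


def keep_read(start: int, end: int, masked: list[tuple[int, int]], min_unmasked: int = 30) -> bool:
    """Filtering rule from the notes; min_unmasked=30 is the tentative '30bp(?)' value."""
    read_len = end - start
    overlap = masked_overlap(start, end, masked)
    if overlap == 0:
        return True  # read lies entirely in non-masked sequence
    if overlap >= read_len:
        return False  # read lies entirely in a masked region
    return read_len - overlap >= min_unmasked  # spanning read: needs enough non-masked bases
```

With a mask at [100, 200), a read at [60, 160) keeps 40 non-masked bases and passes. A read at [80, 180) keeps only 20 and is dropped.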
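For item 4, LCA assignment places a multi-mapped read at the deepest taxon shared by every reference it hits. The sketch below assumes a taxonomy stored as a hypothetical `child -> parent` dictionary, where the root is the node with no parent entry. A real pipeline would load this from a taxonomy database.

```python
from __future__ import annotations


def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Path from taxon up to the root; the root is the node with no parent entry."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa: list[str], parent: dict[str, str]) -> str:
    """Lowest common ancestor of all taxa that a multi-mapped read hits."""
    if not taxa:
        raise ValueError("need at least one taxon")
    shared = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon, parent))
    # Walk up from the first taxon; the first shared node is the deepest one.
    for node in lineage(taxa[0], parent):
        if node in shared:
            return node
    raise ValueError("taxa do not share a root")
```

In a toy taxonomy, a read hitting both *Escherichia coli* and *Salmonella enterica* would be assigned to their shared family rather than arbitrarily to one species. A read with a single hit stays at that species.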
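Item 7 and the "Fidelity of reads" bullet could be combined in a read filter for the graph. In this sketch, fidelity is assumed to be one minus the edit distance (for example, the SAM `NM` tag) divided by the aligned length. This is only one possible definition, and the notes do not fix one. `Alignment`, the category labels, and the 0.95 threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alignment:
    read_id: str
    aligned_length: int  # aligned bases on the read
    edit_distance: int  # e.g. the SAM NM tag
    category: str  # hypothetical labels: "UNIQ", "multi_within", "multi_across"


def fidelity(aln: Alignment) -> float:
    """One possible definition: fraction of aligned bases that are not edits."""
    return 1.0 - aln.edit_distance / aln.aligned_length


def reads_for_graph(alignments: list[Alignment], min_fidelity: float = 0.95) -> list[Alignment]:
    """Keep only unique reads at or above the fidelity threshold (placeholder value)."""
    return [a for a in alignments if a.category == "UNIQ" and fidelity(a) >= min_fidelity]
```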
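Item 9 asks for coverage reported separately per read class. The sketch below computes percent genome coverage for each class independently. It uses hypothetical labels `"UNIQ"`, `"multi_within"`, and `"multi_across"`, and reads given as `(start, end, category)` half-open intervals on one genome. The boolean array is simple and fine for small (for example, viral) genomes, but it uses memory proportional to genome length.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical category labels for the three read classes in the notes.
CATEGORIES = ("UNIQ", "multi_within", "multi_across")


def percent_covered(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """% of genome positions covered by at least one interval [start, end)."""
    covered = [False] * genome_length
    for start, end in intervals:
        for pos in range(max(0, start), min(end, genome_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / genome_length


def coverage_by_category(reads: list[tuple[int, int, str]], genome_length: int) -> dict[str, float]:
    """reads are (start, end, category); each category is reported on its own."""
    by_category: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for start, end, category in reads:
        by_category[category].append((start, end))
    return {cat: percent_covered(by_category[cat], genome_length) for cat in CATEGORIES}
```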
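The notes leave "uniformity of coverage" to be formulated. One common choice, assumed here rather than taken from the notes, is the coefficient of variation (standard deviation divided by mean) of per-base depth. Under this measure, 0 means perfectly even coverage, and larger values mean reads pile up in a few places. The function names are hypothetical.

```python
from __future__ import annotations

import statistics


def depth_profile(intervals: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Per-base read depth from read intervals [start, end)."""
    depth = [0] * genome_length
    for start, end in intervals:
        for pos in range(max(0, start), min(end, genome_length)):
            depth[pos] += 1
    return depth


def coverage_cv(depth: list[int]) -> float:
    """Coefficient of variation of depth; 0 means perfectly uniform coverage."""
    mean = statistics.fmean(depth)
    if mean == 0:
        return float("inf")  # no reads: uniformity is undefined
    return statistics.pstdev(depth) / mean
```

Two reads spanning a whole 10 bp genome give a CV of 0. A single read covering only its first half gives a CV of 1.0.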