move things around in docs
rob-p committed Nov 28, 2024
1 parent 9f938bb commit e73f3fd
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions docs/index.md
@@ -152,9 +152,6 @@ $ oarfish -j 16 --reads sample2_reads.fq.gz --reference transcripts.mmi --seq-te

As with alignment-based mode, these commands will produce several output files, as described [below](index.md#output).

#### Input formats

`oarfish` is capable of taking input in `FASTA` format, `FASTQ` format, or unaligned `BAM` (`uBAM`) format. When you pass the raw reads to `oarfish` via the `--reads` flag, `oarfish` will attempt to infer the type of the input by looking at the file suffix. If it matches one of `.fa`, `.fasta`, `.FA`, `.FASTA`, `.fq`, `.fastq`, `.FQ`, `.FASTQ`, `.fa.gz`, `.fasta.gz`, `.FA.GZ`, `.FASTA.GZ`, `.fq.gz`, `.fastq.gz`, `.FQ.GZ`, or `.FASTQ.GZ`, then the input file will be assumed to be an (appropriately compressed) `FASTA` or `FASTQ` format file. Otherwise, if it ends in `.bam`, `.ubam`, `.BAM`, or `.UBAM`, it will be assumed to be in `uBAM` format. If the format cannot be inferred via the file suffix (e.g. if the file is being provided via process substitution), then an attempt will be made to parse it as a (possibly compressed) `FASTA`/`FASTQ` format file.

## Input to `oarfish`

@@ -181,6 +178,10 @@ Given these inputs, `oarfish` will either load the pre-built `minimap2` index, o
the reads to this index using [`minimap2-rs`](https://github.com/jguhlin/minimap2-rs). Optionally, the maximum multimapping rate (i.e. the number of secondary alignments
corresponding to the `minimap2` parameter `-N`) can be specified with the command line parameter `--best-n`. The default value of this parameter is 100.

#### Read-based input formats

`oarfish` is capable of taking input in `FASTA` format, `FASTQ` format, or unaligned `BAM` (`uBAM`) format. When you pass the raw reads to `oarfish` via the `--reads` flag, `oarfish` will attempt to infer the type of the input by looking at the file suffix. If it matches one of `.fa`, `.fasta`, `.FA`, `.FASTA`, `.fq`, `.fastq`, `.FQ`, `.FASTQ`, `.fa.gz`, `.fasta.gz`, `.FA.GZ`, `.FASTA.GZ`, `.fq.gz`, `.fastq.gz`, `.FQ.GZ`, or `.FASTQ.GZ`, then the input file will be assumed to be an (appropriately compressed) `FASTA` or `FASTQ` format file. Otherwise, if it ends in `.bam`, `.ubam`, `.BAM`, or `.UBAM`, it will be assumed to be in `uBAM` format. If the format cannot be inferred via the file suffix (e.g. if the file is being provided via process substitution), then an attempt will be made to parse it as a (possibly compressed) `FASTA`/`FASTQ` format file.
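
The suffix rules above can be sketched as a small shell function. This is only an illustration of the described behavior, not `oarfish`'s actual implementation; the function name `infer_read_format` and the labels it prints (`fastx`, `ubam`, `fastx-fallback`) are hypothetical.

```shell
# Hypothetical helper mirroring the suffix rules described above.
# It only inspects the file name; it never opens the file.
infer_read_format() {
  case "$1" in
    # Plain or gzip-compressed FASTA/FASTQ suffixes
    *.fa|*.fasta|*.FA|*.FASTA|*.fq|*.fastq|*.FQ|*.FASTQ) echo "fastx" ;;
    *.fa.gz|*.fasta.gz|*.FA.GZ|*.FASTA.GZ) echo "fastx" ;;
    *.fq.gz|*.fastq.gz|*.FQ.GZ|*.FASTQ.GZ) echo "fastx" ;;
    # Unaligned BAM suffixes
    *.bam|*.ubam|*.BAM|*.UBAM) echo "ubam" ;;
    # Unknown suffix (e.g. process substitution): try FASTA/FASTQ parsing
    *) echo "fastx-fallback" ;;
  esac
}

infer_read_format sample2_reads.fq.gz
infer_read_format reads.ubam
infer_read_format /dev/fd/63
```

The last call stands in for a path produced by process substitution, which carries no recognizable suffix and therefore falls through to the `FASTA`/`FASTQ` parsing attempt.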

### Alignment-based input

In alignment-based mode, `oarfish` processes pre-computed alignments of the reads to the transcriptome. The input should be a `bam` format file, with reads aligned using [`minimap2`](https://github.com/lh3/minimap2) against the _transcriptome_. That is, `oarfish` does not currently handle spliced alignment to the genome. Further, the output alignments should be name sorted (the default order produced by `minimap2` should be fine). _Specifically_, `oarfish` relies on the existence of the `AS` tag in the `bam` records that encodes the alignment score in order to obtain the score for each alignment (which is used in probabilistic read assignment), and the score of the best alignment, overall, for each read.

### Choosing `minimap2` alignment options

Since the purpose of `oarfish` is to estimate transcript abundance from a collection of alignments to the target transcriptome, it is important that the alignments are generated in a fashion that is compatible with this goal. Primarily, this means that the aligner should be configured to report as many optimal (and near-optimal) alignments as exist, so that `oarfish` can observe all of this information and determine how to allocate reads to transcripts. We recommend using the following options with `minimap2` when aligning data for later processing by `oarfish`:

* For ONT data (either dRNA or cDNA): please use the flags `--eqx -N 100 -ax map-ont`
* For PacBio data: please use the flags `--eqx -N 100 -ax pacbio`

**Note (1)**: It may be worthwhile using an even larger `N` value (e.g. the [TranSigner manuscript](https://www.biorxiv.org/content/10.1101/2024.04.13.589356v1.full) recommends `-N 181`). A larger value should not diminish the accuracy of `oarfish`, but it may make alignment take longer and produce a larger `bam` file.
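
As a rough sketch of how the recommended flags fit into a full command line, the helper below prints (but does not run) a `minimap2` invocation for a given technology. The function name `minimap2_cmd`, the technology labels `ont` and `pacbio`, and the file names are all hypothetical; the flag sets are copied verbatim from the recommendations above, so check the preset names against `minimap2 --help` for your installed version.

```shell
# Hypothetical helper: prints a minimap2 command using the flags
# recommended above. It does not execute minimap2.
# Arguments: technology label, transcriptome reference, reads file.
minimap2_cmd() {
  case "$1" in
    ont)    preset="map-ont" ;;  # ONT dRNA or cDNA
    pacbio) preset="pacbio" ;;   # preset name taken verbatim from the text above
    *) echo "unknown technology: $1" >&2; return 1 ;;
  esac
  echo "minimap2 --eqx -N 100 -ax $preset $2 $3"
}

minimap2_cmd ont transcripts.fa sample_reads.fq.gz
```

With `-a`, `minimap2` writes SAM to standard output, so the printed command still needs a conversion step (for example piping into `samtools view -b`) to produce the `bam` file that alignment-based mode expects.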