Running Fusera

Saul A Kravitz edited this page Jun 20, 2018 · 14 revisions

Access the help with fusera help:

A FUSE interface to the NCBI Sequence Read Archive (SRA)

Usage:
  fusera [command]

Available Commands:
  help        Help about any command
  mount       Mount a running instance of Fusera to a folder.
  unmount     Unmount a running instance of Fusera.
  version     Print the version number of Fusera

Flags:
  -d, --debug   Enable debug output.
  -h, --help    help for fusera

Use "fusera [command] --help" for more information about a command.

The 'mount' command builds a filesystem presenting the files associated with a collection of SRA accession numbers.

$ fusera help mount
Mount a running instance of Fusera to a folder.

Usage:
  fusera mount [flags] /path/to/mountpoint

Flags:
  -a, --accession string   A list of accessions to mount or path to cart file. ["SRR123,SRR456" | local/cart/file | https://<bucket>.<region>.s3.amazonaws.com/<cart/file>].
                           Environment Variable: [$DBGAP_ACCESSION]
      --aws-batch int      ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using an AWS location.
                           Environment Variable: [$DBGAP_AWS-BATCH] (default 50)
      --eager              ADVANCED: Have fusera request that urls be signed by the API on start up.
                           Environment Variable: [$DBGAP_EAGER]
  -e, --endpoint string    ADVANCED: Change the endpoint used to communicate with SDL API.
                           Environment Variable: [$DBGAP_ENDPOINT] (default "https://www.ncbi.nlm.nih.gov/Traces/sdl/1/retrieve")
  -f, --filetype string    comma separated list of the only file types to copy.
                           Environment Varible: [$DBGAP_FILETYPE]
      --gcp-batch int      ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using a GCP location.
                           Environment Variable: [$DBGAP_GCP-BATCH] (default 25)
  -h, --help               help for mount
  -l, --location string    Cloud provider and region where files should be located: [cloud.region].
                           Environment Variable: [$DBGAP_LOCATION]
  -n, --ngc string         A path to an ngc file used to authorize access to accessions in DBGaP: [local/ngc/file | https://<bucket>.<region>.s3.amazonaws.com/<ngc/file>].
                           Environment Variable: [$DBGAP_NGC]

Global Flags:
  -d, --debug   Enable debug output.

A simple run of Fusera:

$ fusera mount --ngc ~/file.ngc --accession "SRR123,SRR456" --location s3.us-east-1 ~/studies

NOTE: Fusera must keep running for the mounted file system to work. This command therefore does not return to the terminal prompt until fusera is quit (CTRL-C) or unmounted from another shell (for example, with fusera unmount ~/studies). The tips and tricks below describe how to run fusera in the background, which keeps the invoking terminal session usable.
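When the accessions are kept in a file rather than typed by hand, the comma-separated value that --accession expects can be assembled in the shell. The following is a minimal sketch, not part of fusera: join_accessions is a hypothetical helper, and accessions.txt is an assumed file holding one accession per line with no blank lines.

```shell
# join_accessions FILE
# Hypothetical helper: prints the lines of FILE as a single
# comma-separated string, e.g. "SRR123,SRR456".
join_accessions() {
    paste -s -d, "$1"
}

# Example (accessions.txt is an assumed file, one accession per line):
#   fusera mount --ngc ~/file.ngc \
#       --accession "$(join_accessions accessions.txt)" \
#       --location s3.us-east-1 ~/studies
```

Blank lines in the file would produce empty entries in the list, so the file should be cleaned up first if that is a possibility.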

Tips and Tricks

Shortening the call length

All of these flags have equivalent environment variables ($DBGAP_NGC, $DBGAP_ACCESSION, $DBGAP_LOCATION, etc.), which can be handy when automating fusera across multiple machines, or when you find yourself repeatedly invoking fusera with the same flags. With all the environment variables set, a call to fusera could look like this:

$ fusera mount ~/studies
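For that short call to work, the variables have to be exported beforehand, for example in the current shell or in a shell profile. A minimal sketch, reusing the placeholder values from the simple run above (the ngc path, accessions, and location are examples, not real data):

```shell
# Placeholder values copied from the simple run above; replace with your own.
export DBGAP_NGC="$HOME/file.ngc"
export DBGAP_ACCESSION="SRR123,SRR456"
export DBGAP_LOCATION="s3.us-east-1"

# With these exported, only the mountpoint remains on the command line:
#   fusera mount ~/studies
```

Because the variables are exported, any fusera process started from this shell afterwards inherits them.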

Another way to simplify the use of fusera is to run it on a compute instance on either AWS or GCP. When fusera is not given a location through the flag or the environment variable, it tries to determine which cloud platform and region it is running in, and uses the location it finds.

Running fusera in the background

If you want to run fusera in the background, you can do so with shell commands. Example:

$ fusera mount ~/tmp > output.log  2>&1 &
[1] 12464
$ disown %1

Breakdown:
> output.log
Redirects stdout to a file named output.log. If you don't want the output, use > /dev/null instead.
2>&1
Redirects stderr to the same destination as stdout, so it is also captured in output.log (or /dev/null).
&
Runs the process in the background so the shell remains usable.
[1] 12464
An example of the printout that appears after entering the whole command. The numbers will most likely differ from this example. It means that this is the first ([1]) job started in the background from this terminal and that its process id is 12464. The job number in the brackets is what gets passed to the disown command described next.
disown %1
Keeps fusera running even if the terminal is closed. This example passes %1 because a 1 was in the brackets of the output after executing the fusera command above. If a different number is displayed in the brackets, use that number instead.

Using fusera's unmount command on the folder fusera is mounted to will kill the process, as long as nothing is using the file system at that time.
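The background pattern above can also be wrapped in a small helper. The sketch below is an illustration, not part of fusera: start_in_background is a hypothetical function, and it uses nohup in place of disown so that it also works in shells that lack the disown builtin.

```shell
# start_in_background LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background, immune to hangups,
# sends both stdout and stderr to LOGFILE, and prints the process id.
start_in_background() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"
}

# Example with fusera (assumes the DBGAP_* environment variables are set):
#   pid=$(start_in_background output.log fusera mount ~/tmp)
#   fusera unmount ~/tmp    # later, once nothing is using the file system
```

The printed process id plays the same role as the 12464 in the printout above: it identifies the background fusera process if it needs to be inspected later.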

Advice

The <mountpoint> must be an existing, empty directory, to which the user has read and write permissions.

It is recommended that the mountpoint be a directory owned by the user. System directories such as /mnt, /dev, and /tmp have special uses in unix systems, so creating the mountpoint inside them should be avoided.
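These requirements can be checked before calling fusera. The following is a minimal sketch, not part of fusera: check_mountpoint is a hypothetical helper that tests whether a directory exists, is readable and writable by the current user, and is empty.

```shell
# check_mountpoint DIR
# Hypothetical helper: succeeds only if DIR is an existing directory that
# the current user can read and write, and that contains no entries
# (hidden files included).
check_mountpoint() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not an existing directory: $dir" >&2
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "missing read or write permission: $dir" >&2
        return 1
    fi
    if [ -n "$(ls -A "$dir")" ]; then
        echo "directory is not empty: $dir" >&2
        return 1
    fi
    return 0
}

# Example: create a mountpoint under the home directory and verify it.
#   mkdir -p "$HOME/studies" && check_mountpoint "$HOME/studies"
```

Creating the directory under $HOME follows the advice above, since it is then owned by the user who runs fusera.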

Because of the nature of FUSE systems, only the user who ran fusera will be able to read the mounted files. This can be changed by editing the FUSE configuration file on the machine (typically /etc/fuse.conf) to enable user_allow_other, the setting that permits the allow_other mount option. Be warned that there are security implications to consider: https://github.com/libfuse/libfuse#security-implications.
