Running Fusera
Access the help with fusera help:
$ fusera help
A FUSE interface to the NCBI Sequence Read Archive (SRA)
Usage:
fusera [command]
Available Commands:
help Help about any command
mount Mount a running instance of Fusera to a folder.
unmount Unmount a running instance of Fusera.
version Print the version number of Fusera
Flags:
-d, --debug Enable debug output.
-h, --help help for fusera
Use "fusera [command] --help" for more information about a command.
The 'mount' command builds a filesystem presenting the files associated with a collection of SRA accession numbers.
$ fusera help mount
Mount a running instance of Fusera to a folder.
Usage:
fusera mount [flags] /path/to/mountpoint
Flags:
-a, --accession string A list of accessions to mount or path to cart file. ["SRR123,SRR456" | local/cart/file | https://<bucket>.<region>.s3.amazonaws.com/<cart/file>].
Environment Variable: [$DBGAP_ACCESSION]
--aws-batch int ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using an AWS location.
Environment Variable: [$DBGAP_AWS-BATCH] (default 50)
--eager ADVANCED: Have fusera request that urls be signed by the API on start up.
Environment Variable: [$DBGAP_EAGER]
-e, --endpoint string ADVANCED: Change the endpoint used to communicate with SDL API.
Environment Variable: [$DBGAP_ENDPOINT] (default "https://www.ncbi.nlm.nih.gov/Traces/sdl/1/retrieve")
-f, --filetype string comma separated list of the only file types to copy.
Environment Variable: [$DBGAP_FILETYPE]
--gcp-batch int ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using a GCP location.
Environment Variable: [$DBGAP_GCP-BATCH] (default 25)
-h, --help help for mount
-l, --location string Cloud provider and region where files should be located: [cloud.region].
Environment Variable: [$DBGAP_LOCATION]
-n, --ngc string A path to an ngc file used to authorize access to accessions in DBGaP: [local/ngc/file | https://<bucket>.<region>.s3.amazonaws.com/<ngc/file>].
Environment Variable: [$DBGAP_NGC]
Global Flags:
-d, --debug Enable debug output.
A simple run of Fusera:
$ fusera mount --ngc ~/file.ngc --accession "SRR123,SRR456" --location s3.us-east-1 ~/studies
NOTE: Fusera needs to keep running in order to operate. This command will therefore not return you to a terminal prompt until fusera is quit (CTRL-C) or unmounted by a command in another shell (such as fusera unmount ~/studies). The tips and tricks below describe a way to run fusera in the background, so you can keep using the terminal session that invoked it.
All of these flags have equivalent environment variables ($DBGAP_NGC, $DBGAP_ACCESSION, $DBGAP_LOCATION, etc.). These can be more convenient when automating fusera across multiple machines, or when you find yourself consistently invoking fusera with the same flags. With the values supplied through environment variables, a call to fusera can be as short as:
$ fusera mount ~/studies
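As a sketch, the flag-based command from the simple run above can be expressed with environment variables as follows. It assumes bash and reuses that example's placeholder values (~/file.ngc, SRR123,SRR456, s3.us-east-1); substitute your own. The check for fusera on the PATH is an addition of this sketch, not something fusera requires.

```shell
#!/usr/bin/env bash
# Environment-variable form of:
#   fusera mount --ngc ~/file.ngc --accession "SRR123,SRR456" --location s3.us-east-1 ~/studies
# The values are the placeholders from that example; replace them with your own.
export DBGAP_NGC="$HOME/file.ngc"
export DBGAP_ACCESSION="SRR123,SRR456"
export DBGAP_LOCATION="s3.us-east-1"

# fusera mount keeps running until it is quit or unmounted,
# so only start it when fusera is actually installed.
if command -v fusera >/dev/null 2>&1; then
  fusera mount "$HOME/studies"
else
  echo "fusera not found on PATH; variables exported, nothing mounted" >&2
fi
```

Because the variables are exported, any fusera process started later from the same shell session inherits them.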
Fusera is also easier to use on a compute instance on AWS or GCP. If fusera is not given a location through the flag or the environment variable, it tries known methods of determining where it is running on that cloud platform and uses the location it finds.
If you want to run fusera in the background, you can do so with shell commands. For example:
$ fusera mount ~/tmp > output.log 2>&1 &
[1] 12464
$ disown %1
Breakdown:
> output.log
This redirects stdout to a file named output.log. If you don't want the output, use > /dev/null instead.
2>&1
This redirects stderr to wherever stdout is going, so error output also ends up in output.log (or /dev/null).
&
This runs the process in the background so you can continue using the shell.
[1] 12464
This is an example of what the shell prints after you enter the whole command. The numbers will most likely differ on your machine, which doesn't matter. The [1] is the job number: this is the first command started in the background from this terminal. The 12464 is its process ID. The job number is what gets passed to the disown command described below.
disown %1
This keeps fusera running even if the terminal is closed. This example passes %1 because 1 was in the brackets of the output after running the fusera command above. If a different number is displayed, use that number instead.
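Bash applies redirections from left to right, so the order of > output.log and 2>&1 matters. The sketch below uses a hypothetical function, noisy (a stand-in, not part of fusera), that writes one line to stdout and one to stderr. It shows that only the order used in the fusera command sends both streams to the file.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a program that writes to both streams.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# stdout goes to the file first, then stderr is pointed at the same place:
# both lines end up in both.log.
noisy > both.log 2>&1

# Reversed order: stderr is pointed at the *current* stdout (the terminal)
# before stdout moves to the file, so "to stderr" still prints on the
# terminal and only "to stdout" lands in stdout-only.log.
noisy 2>&1 > stdout-only.log
```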
Using fusera's unmount command on the folder fusera is mounted to will kill the process, as long as nothing is using the file system at that time.
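When fusera is launched from a script instead of by hand, the job number does not need to be read off the screen. Bash stores the process ID of the most recent background command in $!, and its disown builtin accepts a process ID as well as a %N job spec. The helper below, run_detached, is a hypothetical convenience function (not part of fusera). It is a sketch that wraps the same redirect, background, and disown steps, then prints the process ID.

```shell
#!/usr/bin/env bash
# run_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of fusera): starts COMMAND in the background
# with stdout and stderr both sent to LOGFILE, removes it from the shell's job
# table with disown, and prints its process ID.
run_detached() {
  local log=$1
  shift
  "$@" > "$log" 2>&1 &
  local pid=$!   # PID of the background command just started
  disown "$pid"  # bash's disown accepts a PID as well as a %N job spec
  echo "$pid"
}

# Example with fusera (left commented so the sketch runs without fusera):
# pid=$(run_detached output.log fusera mount ~/tmp)
```

For example, pid=$(run_detached output.log fusera mount ~/tmp) starts fusera detached and keeps its process ID in pid. Stopping it cleanly is still done with fusera unmount ~/tmp.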
The <mountpoint> must be an existing, empty directory to which the user has read and write permissions.
It is recommended that the mountpoint be a directory owned by the user. System directories such as /mnt, /dev, and /tmp have special uses in Unix systems, so avoid creating the mountpoint in them.
Because of how FUSE works, by default only the user who ran fusera can read the mounted files. You can change this by editing a configuration file on the machine to enable FUSE's allow_other option. Be warned that this has security implications to consider: https://github.com/libfuse/libfuse#security-implications.