FSAudit: analyze and visualize filesystem usage

These notes describe the multi-volume version of FSAudit.

Earlier multi-volume work is on MGI at /gscuser/mwyczalk/projects/FSAudit/FSAudit.dev/FSAudit/multi-run

Overview

Processing proceeds in these steps:

1. Evaluate volume. Traverse the entire volume (essentially `find | stat`) to obtain information about all files in a specified filesystem. Writes the rawstat file.
   - May be run with sudo to provide complete information for all files regardless of permissions.
2. Process stats. Secondary analysis of the rawstat data. Writes the filestat file.
3. Summarize stats. Merge the filestat data by owner and extension. Writes the summary file.
4. Plot stats. Generate visualization figures.
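The "Process stats" step turns each rawstat record into a filestat record by splitting the full path into directory, file name and extension. The sketch below illustrates that conversion. The function names are hypothetical, and deriving the extension with `os.path.splitext` (which yields values like `.chr20`, matching the extension filter under Handy analysis) is an assumption, not necessarily what FSAudit's own scripts do.

```python
import gzip
import os

# Hypothetical column list; matches the filestat header documented under "Output files".
FILESTAT_COLUMNS = ["dirname", "filename", "ext", "file_type", "file_size",
                    "owner_name", "time_mod", "hard_links"]


def rawstat_to_filestat(fields):
    """Convert one rawstat record (list of 6 fields) into a filestat record.

    Assumption: the extension is the suffix from os.path.splitext, including
    the leading dot (".chr20"), and empty when the name has no suffix.
    File names are assumed to contain no tab characters.
    """
    file_name, file_type, file_size, owner_name, time_mod, hard_links = fields
    dirname, filename = os.path.split(file_name)
    ext = os.path.splitext(filename)[1]
    return [dirname, filename, ext, file_type, file_size,
            owner_name, time_mod, hard_links]


def process_stats(rawstat_path, filestat_path):
    """Read a gzipped, tab-separated rawstat file and write a gzipped filestat file."""
    # surrogateescape round-trips file names that are not valid UTF-8
    with gzip.open(rawstat_path, "rt", encoding="utf-8", errors="surrogateescape") as fin, \
         gzip.open(filestat_path, "wt", encoding="utf-8", errors="surrogateescape") as fout:
        fout.write("# " + "\t".join(FILESTAT_COLUMNS) + "\n")
        for line in fin:
            if line.startswith("#"):
                continue  # skip the rawstat header line
            fields = line.rstrip("\n").split("\t")
            fout.write("\t".join(rawstat_to_filestat(fields)) + "\n")
```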

Installation

This package requires Python 3, GNU parallel, and the R packages plyr and ggplot2.

These dependencies can be managed with Conda.

VolumeList

The VolumeList file (default config/VolumeList.dat) contains the following two fields for every volume to be audited, tab-separated:

VOLUME_NAME: Short name of system and volume, used for filenames
VOLUME: Base path to be analyzed

Example VolumeList.dat file:

MGI.gc2500  /gscmnt/gc2500/dinglab
MGI.gc2508  /gscmnt/gc2508/dinglab
MGI.gc2509  /gscmnt/gc2509/dinglab
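A minimal sketch of reading a VolumeList file into (VOLUME_NAME, VOLUME) pairs. The function names are hypothetical, and skipping blank lines and lines starting with `#` is an assumption; the format above only specifies two tab-separated fields per volume.

```python
def parse_volume_list(lines):
    """Return (VOLUME_NAME, VOLUME) pairs from VolumeList lines.

    Blank lines are skipped. Treating lines that start with '#' as comments
    is an assumption, not documented FSAudit behavior.
    """
    volumes = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}")
        volumes.append((fields[0], fields[1]))
    return volumes


def read_volume_list(path="config/VolumeList.dat"):
    """Parse a VolumeList file (default path as documented above)."""
    with open(path, encoding="utf-8") as f:
        return parse_volume_list(f)
```

Splitting strictly on tabs, rather than on any whitespace, keeps volume paths that contain spaces intact.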

Output files

*.rawstat.gz
# file_name     file_type       file_size       owner_name      time_mod        hard_links

*.filestat.gz
# dirname       filename        ext     file_type       file_size       owner_name      time_mod        hard_links

*.summary.dat
ext owner_name  count   cumulative_size
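The summary file aggregates filestat records by extension and owner. The sketch below shows one way to compute the `count` and `cumulative_size` columns. The function names are hypothetical, and counting every record regardless of `file_type` is an assumption; FSAudit may filter (for example, to regular files) before summarizing.

```python
from collections import defaultdict

SUMMARY_HEADER = "ext\towner_name\tcount\tcumulative_size"


def summarize(filestat_rows):
    """Group filestat rows by (ext, owner_name), counting records and summing sizes.

    Each row is a list of filestat fields: dirname, filename, ext, file_type,
    file_size, owner_name, time_mod, hard_links.
    """
    totals = defaultdict(lambda: [0, 0])  # (ext, owner) -> [count, total bytes]
    for row in filestat_rows:
        ext, file_size, owner = row[2], int(row[4]), row[5]
        totals[(ext, owner)][0] += 1
        totals[(ext, owner)][1] += file_size
    return totals


def format_summary(totals):
    """Render totals as tab-separated summary lines, header first, sorted by key."""
    lines = [SUMMARY_HEADER]
    for (ext, owner), (count, size) in sorted(totals.items()):
        lines.append(f"{ext}\t{owner}\t{count}\t{size}")
    return lines
```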

Run notes

Create config/VolumeList.dat

Optionally, start a tmux session with tmux new -s FSAudit. This is useful because runs are time-consuming.
If on MGI, run 0_start_MGI_docker.sh

To debug and test processing, run in dry-run mode on only the first volume: bash 1_start_runs.sh -d1 shows the call to process_FS.sh, and bash 1_start_runs.sh -dd1 shows the processing of individual steps.

To run all volumes with four jobs at a time:

bash 1_start_runs.sh -J 4

MGI-specific

On MGI, use the conda environment p3R (see the Conda cheat sheet):

conda activate p3R

Output

Confirm this...

The following plots are generated

All output is written to ./dat, ./logs, and ./img.

Handy analysis

Get details for a given extension and user:

zcat /gscmnt/gc3020/dinglab/mwyczalk/gc2737.20190612.filestat.gz | awk -v FS="\t" '{if ($3 == ".chr20" && $6 == "rmashl") print}'
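The same query can be done from Python, which is handy when the results feed further analysis. This is a sketch with hypothetical function names; it assumes, as the awk command does, that filestat columns are tab-separated, with the extension in column 3 and the owner in column 6 (awk is 1-based, so these are indices 2 and 5 here).

```python
import gzip


def matches(line, ext, owner):
    """True if a filestat line has the given extension ($3) and owner ($6)."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 6 and fields[2] == ext and fields[5] == owner


def filter_filestat(path, ext, owner):
    """Yield matching lines from a gzipped filestat file, like zcat | awk."""
    # errors="replace" keeps the filter from failing on non-UTF-8 file names
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if matches(line, ext, owner):
                yield line.rstrip("\n")
```

Usage: `for rec in filter_filestat("/gscmnt/gc3020/dinglab/mwyczalk/gc2737.20190612.filestat.gz", ".chr20", "rmashl"): print(rec)`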

Background

Relevant `stat` options

From man stat

   --printf=FORMAT
          like --format, but interpret backslash escapes, and do not output a mandatory trailing newline; if you want a newline, include \n in FORMAT

Fields wanted, in this order:

       %n     file name
       %F     file type
       %s     total size, in bytes
       %U     user name of owner
       %y     time of last modification, human-readable
       %h     number of hard links
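For readers who want to reproduce these six fields without `find` and `stat`, here is a rough Python analogue of the traversal in the "Evaluate volume" step. It is a sketch under stated assumptions, not FSAudit's implementation: the function names are hypothetical, the file-type strings cover only the common `%F` values printed by GNU stat, and the timestamp has microsecond rather than nanosecond precision.

```python
import os
import pwd
import stat
from datetime import datetime


def stat_file_type(st):
    """Approximate the %F strings printed by GNU stat for common file types."""
    mode = st.st_mode
    if stat.S_ISREG(mode):
        return "regular empty file" if st.st_size == 0 else "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special file"
    if stat.S_ISBLK(mode):
        return "block special file"
    return "unknown"


def stat_record(path):
    """Return [%n, %F, %s, %U, %y, %h] for path, without following symlinks."""
    st = os.lstat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = "UNKNOWN"  # uid with no passwd entry; GNU stat prints UNKNOWN too
    mtime = datetime.fromtimestamp(st.st_mtime).astimezone()
    return [path, stat_file_type(st), str(st.st_size), owner,
            mtime.strftime("%Y-%m-%d %H:%M:%S.%f %z"), str(st.st_nlink)]


def iter_paths(volume):
    """Yield volume and every path below it, similar to `find volume`."""
    yield volume
    for dirpath, dirnames, filenames in os.walk(volume):  # does not follow symlinked dirs
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def walk_volume(volume):
    """Yield one rawstat-style record per path, skipping paths that cannot be stat'ed."""
    for path in iter_paths(volume):
        try:
            yield stat_record(path)
        except OSError:
            continue  # vanished during the walk or permission denied
```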

Installation

This package requires Python 3. The R packages plyr and ggplot2 must be installed. GNU parallel is also required.

Debug

This requires Python 3. Running under Python 2 yields errors like this:

TypeError: open() got an unexpected keyword argument 'encoding'
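One way to turn this confusing error into a clear message is a version guard at the top of a Python entry point. This is a suggested addition, not something the FSAudit scripts are known to contain. It deliberately avoids Python-3-only syntax (such as f-strings) so that Python 2 can still parse and run the check.

```python
import sys

# Fail early with an explicit message instead of a TypeError from open(encoding=...).
if sys.version_info[0] < 3:
    sys.exit("FSAudit requires Python 3 (found Python %d.%d)" % sys.version_info[:2])
```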

ding-lab/FSAudit
