This release contains a number of small bug fixes and changes to defaults that generally yield slight improvements. However, the reason for the minor (not simply patch) version bump is that this release introduces one major new indexing feature, which requires bumping the index version. This means that existing indices will need to be rebuilt for the new version.
This version introduces the option to use an [external-memory perfect hash construction algorithm](https://github.com/ot/emphf) to construct the hash function (as opposed to the default Google dense hash), which requires less memory. This behavior is invoked by passing the `--perfectHash` flag to the `sailfish index` command. Because the perfect hash function is built in external memory, constructing the hash with this data structure is slower. We don't have longitudinal benchmarks, but populating the perfect hash is somewhere between 2x and 5x slower than populating the dense hash. However, constructing the hash itself requires less memory (less RAM, anyway), and once constructed, the perfect hash is considerably smaller. Typically, quantification on an index built using a perfect hash requires only ~50% of the memory required when using a dense hash.
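These notes don't describe how emphf works internally, so what follows is only a rough, hedged illustration of why a perfect hash index can be smaller than a dense hash table. A hash table such as Google's dense hash keeps every key (here, every k-mer) in its slots so that lookups can compare against it. A minimal perfect hash instead stores a small amount of per-bucket metadata that sends each of the n build keys to a distinct slot in 0..n-1, and it never stores the keys. The Python sketch uses the simple hash-and-displace scheme. That is a different algorithm from the one emphf implements, and the sketch runs entirely in RAM rather than in external memory. `HashDisplaceMPHF`, `_hash`, and `avg_bucket_size` are hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import product


def _hash(seed: int, key: str) -> int:
    """Deterministic 64-bit hash of `key` under an integer `seed` (BLAKE2b)."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class HashDisplaceMPHF:
    """Toy minimal perfect hash (hash-and-displace, NOT emphf's algorithm).

    Maps n distinct keys one-to-one onto 0..n-1. Only one integer seed per
    bucket is kept; the keys themselves are not stored.
    """

    def __init__(self, keys, avg_bucket_size: int = 4, max_seed: int = 1_000_000):
        keys = list(keys)
        if not keys or len(set(keys)) != len(keys):
            raise ValueError("keys must be non-empty and distinct")
        self.n = len(keys)
        self.num_buckets = max(1, self.n // avg_bucket_size)

        # Step 1: split keys into small buckets with a fixed first-level hash.
        buckets = defaultdict(list)
        for k in keys:
            buckets[_hash(0, k) % self.num_buckets].append(k)

        # Step 2: for each bucket (largest first, while many slots are free),
        # search for a seed that sends all its keys to distinct, unused slots.
        self.seeds = [0] * self.num_buckets
        taken = [False] * self.n
        for b, members in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
            for seed in range(1, max_seed):
                slots = [_hash(seed, k) % self.n for k in members]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    for s in slots:
                        taken[s] = True
                    self.seeds[b] = seed  # the only per-bucket state we keep
                    break
            else:
                raise RuntimeError("no seed found; try a smaller avg_bucket_size")

    def __call__(self, key: str) -> int:
        """Slot of `key`; for keys outside the build set the slot is arbitrary."""
        b = _hash(0, key) % self.num_buckets
        return _hash(self.seeds[b], key) % self.n


# Usage: all 256 distinct DNA 4-mers as keys.
kmers = ["".join(p) for p in product("ACGT", repeat=4)]
mphf = HashDisplaceMPHF(kmers)
assert sorted(mphf(k) for k in kmers) == list(range(len(kmers)))
print(f"{len(mphf.seeds)} seeds stored for {mphf.n} keys")  # prints "64 seeds stored for 256 keys"
```

The keys are not stored, so a minimal perfect hash returns some slot even for a key that was never inserted. Any index queried with k-mers that may be absent therefore needs a separate membership check. These notes don't say how Sailfish handles that.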
The difference in mapping speed between the two index types is minimal. Because the perfect hash is smaller, it can usually be loaded from disk more quickly (this benefit is most noticeable when the index itself is built on a very large set of transcripts). The choice of index should have no effect on downstream quantification results. The primary motivation for this feature is to allow indices for large de novo transcriptomes to be built in less RAM. So, the default recommendation (and behavior) is to use the dense hash unless you run into memory problems building the index; in that case, you can pass the `--perfectHash` flag to try to limit memory usage.