Improved performance for MinHash and MinHashLSH
- Performance improvement for MinHash's update method.
- Make MinHash updates 4.5X faster by using the `update_batch` method for bulk updates on MinHash. [See API doc](http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
- Further performance gain from bulk generation of MinHash objects using `MinHash.bulk` or `MinHash.generator`. See API doc and pull request.
- Optional compression for the MinHash LSH index by hashing the bucket key produced by `MinHashLSH._H`. See pull request. This saves memory/storage space used by the index.
Thank you @Sinusoidal36!