TLI is a testbed for comparing (learned) indexes on various datasets and workloads. It is composed of three components: workload generation, hyper-parameter tuning, and performance evaluation. We developed this system from the well-known SOSD framework, and we use perf and pmu-tools to measure micro-architectural metrics.
One dependency that should be emphasized is Intel MKL, which is required when testing the performance of XIndex and SIndex. The detailed installation steps can be found here.
In general, the dependencies can be installed with the following steps.
$ cd /tmp
$ wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
$ apt-get update
$ apt-get install -y intel-mkl-2019.0-045
$ apt -y install zstd python3-pip m4 cmake clang libboost-all-dev
$ pip3 install --user numpy scipy
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
After the installation, the following two lines in CMakeLists.txt may require modification.
set(MKL_LINK_DIRECTORY "/opt/intel/mkl/lib/intel64")
set(MKL_INCLUDE_DIRECTORY "/opt/intel/mkl/include")
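If MKL was installed via Intel oneAPI rather than the standalone 2019 package shown above, the library lives under a different prefix. A sketch of the adjusted lines, assuming a default oneAPI layout (the `latest` version directory is an assumption; check the actual path on your machine):

```cmake
# Assumes a oneAPI-style install; "latest" is typically a symlink
# to the installed MKL version -- verify it exists on your system.
set(MKL_LINK_DIRECTORY "/opt/intel/oneapi/mkl/latest/lib/intel64")
set(MKL_INCLUDE_DIRECTORY "/opt/intel/oneapi/mkl/latest/include")
```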
We provide a number of scripts to automate things. Each is located in the scripts
directory, but should be executed from the repository root.
./scripts/download.sh downloads and stores required data from the Internet.
./scripts/build_rmis.sh compiles and builds the RMIs for each dataset. If you run into the error message "error: no override and no default toolchain set", try running "rustup install stable".
./scripts/download_rmis.sh will download pre-built RMIs instead, which may be faster. You'll still need to run build_rmis.sh if you want to measure build times on your platform.
./scripts/prepare.sh constructs the single-thread workloads and compiles the testbed; ./scripts/prepare_multithread.sh does the same for concurrency workloads.
./scripts/execute.sh, execute_latency.sh, execute_errors.sh, and execute_perf.sh execute the testbed on single-thread workloads, storing the results in results; ./scripts/execute_multithread.sh does the same for concurrency workloads.
Build times can be long, as we make aggressive use of templates to ensure we do not accidentally measure vtable lookup time.
The results in results/through-results are obtained from single-thread workloads, those in results/multithread-results from concurrency workloads, and those in results/string-results from string indexes. They are shown in the following format.
(index name) (bulk loading time) (index size) (throughput) (hyper-parameters)
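As a minimal sketch, a line in this format can be split into its fields like so. The field names and types below are my own assumptions (the format specification above does not name types), and everything after the first four fields is treated as hyper-parameters:

```python
def parse_throughput_line(line):
    """Parse one whitespace-separated result line of the form:
    (index name) (bulk loading time) (index size) (throughput) (hyper-parameters)

    Fields beyond the first four are collected as hyper-parameters.
    """
    fields = line.split()
    return {
        "index": fields[0],
        "build_time": float(fields[1]),   # bulk loading time
        "size": int(fields[2]),           # index size
        "throughput": float(fields[3]),
        "hyper_params": fields[4:],       # remaining fields, kept as strings
    }

# Hypothetical example line; real values depend on your run.
row = parse_throughput_line("BTree 1.23 4096000 1.5e7 64")
```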
The results in results/latency-results are obtained by measuring latencies in single-thread workloads, and are shown in the following format.
(index name) (bulk loading time) (index size) (average, P50, P99, P99.9, max, standard deviation of latency) (hyper-parameters)
The results in results/errors-results are obtained by measuring position searches, and are shown in the following format.
(index name) (bulk loading time) (index size) (average, P50, P99, P99.9, max, standard deviation of latency) (average position search overhead) (position search latency per operation) (average prediction error) (hyper-parameters)
The filenames of the CSVs in results mainly comply with the following rule.
{dataset}_ops_{operation count}_{range query ratio}_{negative lookup ratio}_{insert ratio}_({insert pattern}_)({hotspot ratio}_)({thread number}_)(mix_)({loaded block number}_)({bulk-loaded data size}_)results_table.csv
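Since the parenthesized segments are optional and vary by workload, a full decoder is workload-dependent. A hedged sketch that extracts only the fixed leading fields, leaving the optional middle segments as an opaque remainder (the group names are my own):

```python
import re

# Matches the mandatory prefix of a results filename:
# {dataset}_ops_{operation count}_{range ratio}_{negative ratio}_{insert ratio}_...
# Optional segments (insert pattern, hotspot ratio, thread number, ...)
# are captured verbatim in "rest" rather than decoded individually.
FNAME_RE = re.compile(
    r"^(?P<dataset>.+?)_ops_(?P<ops>\d+)_"
    r"(?P<range_ratio>[\d.]+)_(?P<neg_ratio>[\d.]+)_(?P<insert_ratio>[\d.]+)_"
    r"(?P<rest>.*)results_table\.csv$"
)

def decode_filename(name):
    m = FNAME_RE.match(name)
    return m.groupdict() if m else None

# Hypothetical filename; actual values depend on your workload settings.
info = decode_filename("books_ops_10000000_0_0.5_0_results_table.csv")
```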
The results in results/perf-results are obtained by measuring micro-architectural metrics.