Benchmark
- Platform: AWS t3.medium (vCPU x 2, Memory 4 GiB)
- Storage: EBS volume gp2 / 200 GiB (600 IOPS)
$ cat /etc/issue
Ubuntu 18.04.1 LTS \n \l
$ uname -r -v -m -o
5.3.0-1019-aws #21~18.04.1-Ubuntu SMP Mon May 11 12:33:03 UTC 2020 x86_64 GNU/Linux
$ sed --version
sed (GNU sed) 4.4
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ zsh --version
zsh 5.4.2 (x86_64-ubuntu-linux-gnu)
`teip` was built with `cargo build --release --target x86_64-unknown-linux-musl`.
$ teip --version
teip: 1.2.0
$ ldd $(which teip)
not a dynamic executable
- Dummy `/var/log/secure`
- Approx 100 MiB (104857674 bytes)
- 1,078,333 lines
- Includes 761,231 IP addresses
- Extracted before benchmarking
$ wc test_secure
1078333 13068857 104857674 test_secure
$ cat test_secure
May 26 03:19:26 localhost sshd[17872]: Received disconnect from 192.0.2.152 port 29864:11: [preauth]
May 26 03:19:26 localhost sshd[17872]: Disconnected from 192.0.2.78 port 29864 [preauth]
May 26 03:21:10 localhost sshd[17927]: Invalid user amavis1 from 192.0.2.148 port 53364
May 26 03:21:10 localhost sshd[17927]: input_userauth_request: invalid user amavis1 [preauth]
May 26 03:21:10 localhost sshd[17927]: Received disconnect from 192.0.2.189 port 53364:11: Bye Bye [preauth]
...
$ grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' test_secure | wc -l
761231
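Note that the pattern in the `grep` command above matches any four dot-separated groups of one to three digits; it does not check that each group is at most 255. A minimal sketch on made-up input lines (not taken from `test_secure`), where `extract_ips` is a hypothetical helper name:

```shell
# extract_ips is a hypothetical helper: print every substring of stdin
# that matches the benchmark's IP address pattern, one per line.
extract_ips() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

# Made-up sample lines, for illustration only.
printf '%s\n' \
  'Disconnected from 192.0.2.78 port 29864 [preauth]' \
  'no address on this line' \
  'out of range but still matched: 999.999.999.999' |
  extract_ips
```

Both `192.0.2.78` and `999.999.999.999` are printed, so the count of 761,231 is the number of pattern matches rather than the number of validated addresses.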
The benchmark measures the time taken to mask every IP address in the file.
- Replace every IP address in the file with `@@@.@@@.@@@.@@@`, like this.
May 26 03:19:26 localhost sshd[17872]: Received disconnect from @@@.@@@.@@@.@@@ port 29864:11: [preauth]
May 26 03:19:26 localhost sshd[17872]: Disconnected from @@@.@@@.@@@.@@@ port 29864 [preauth]
...
- Print the result to `/dev/null` during the benchmark
- Clear the page cache beforehand
- The regular expression `([0-9]{1,3}\.){3}[0-9]{1,3}` is used to match IP addresses
- Input is given by the redirection `< test_secure` on Zsh
- The `time` and `pv` commands are used to measure the actual processing time
- Run each case three times and calculate the mean
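The procedure above can be approximated with a small loop. This is only a sketch, not the procedure used for the numbers on this page (those come from Zsh's `time` together with `pv`). It assumes Linux and GNU `date` (for `%N`); `run_three` and `DROP_CACHES` are hypothetical names.

```shell
# run_three is a hypothetical helper: run a command three times and print
# the wall-clock time of each run in seconds.
# Set DROP_CACHES=1 to clear the Linux page cache before each run (needs sudo).
run_three() {
  for i in 1 2 3; do
    if [ "${DROP_CACHES:-0}" = 1 ]; then
      sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    fi
    start=$(date +%s.%N)   # assumes GNU date
    "$@" > /dev/null       # the result is discarded, as in the benchmark
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
  done
}
```

For example, `DROP_CACHES=1 run_three sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' test_secure` would print three timings for the sed case. The file is passed as an argument here because standard input redirected into the function would be consumed by the first run.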
Here are the cases for benchmarking.
- (1) awk(gsub)
$ awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure
- (2) sed(s//)
$ sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure
- (3) teip + awk(gsub)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure
- (4) teip + sed(s//)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure
However, these may be unfair benchmarks for `teip`, because the last two cases apply the same regular expression twice: once in `teip` and once in the target command.
The following two cases apply the regular expression only once per execution.
The target commands just print `@@@.@@@.@@@.@@@`.
- (5) teip + awk(only print)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure
- (6) teip + sed(i text)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure
- Check that all the results are the same before the benchmark
$ awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure > by_awk
$ sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure > by_sed
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure > by_teip_awk
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure > by_teip_sed
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure > by_teip_awk2
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure > by_teip_sed2
$ md5sum by_*
f6a06ada3478e650a01731325f262508 by_awk
f6a06ada3478e650a01731325f262508 by_sed
f6a06ada3478e650a01731325f262508 by_teip_awk
f6a06ada3478e650a01731325f262508 by_teip_awk2
f6a06ada3478e650a01731325f262508 by_teip_sed
f6a06ada3478e650a01731325f262508 by_teip_sed2
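The digests above are compared by eye. As a sketch, the same check can be scripted so that it fails when any digest differs. `all_same_md5` is a hypothetical helper name, and GNU `md5sum` is assumed.

```shell
# all_same_md5 is a hypothetical helper: succeed only if every file given
# as an argument has the same MD5 digest.
all_same_md5() {
  # Keep only the digest column, collapse duplicates, and count what is left.
  [ "$(md5sum "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}
```

With the files created above, `all_same_md5 by_* && echo "all outputs match"` would report success only when the six outputs are byte-for-byte identical.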
case | 1st(sec) | 2nd(sec) | 3rd(sec) | mean(sec) | MiB/sec |
---|---|---|---|---|---|
awk(gsub) | 8.753 | 8.204 | 8.212 | 8.390 | 11.919 |
sed(s//) | 5.430 | 5.436 | 5.312 | 5.393 | 18.544 |
teip + awk(gsub) | 4.248 | 4.383 | 4.288 | 4.306 | 23.222 |
teip + sed(s//) | 3.871 | 3.886 | 3.628 | 3.795 | 26.350 |
teip + awk(only print) | 2.099 | 2.303 | 1.916 | 2.106 | 47.483 |
teip + sed(i text) | 1.798 | 1.831 | 1.878 | 1.836 | 54.476 |
- The mean is rounded to three decimal places.
- MiB/sec = 104857674 / 2^20 / mean (computed from the unrounded mean)
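The mean and MiB/sec columns can be reproduced from the three timings of a case. A minimal sketch, assuming a POSIX `awk`; `bench_summary` is a hypothetical helper name, not part of `teip`:

```shell
# bench_summary is a hypothetical helper (not part of teip).
# $1 = input size in bytes, $2..$4 = elapsed seconds of the three runs.
# Prints the mean and MiB/sec, both rounded to three decimal places.
bench_summary() {
  awk -v bytes="$1" -v a="$2" -v b="$3" -v c="$4" 'BEGIN {
    mean = (a + b + c) / 3
    printf "%.3f %.3f\n", mean, bytes / 2^20 / mean
  }'
}

# Timings of the awk(gsub) case from the table above.
bench_summary 104857674 8.753 8.204 8.212
```

For the awk(gsub) timings this prints `8.390 11.919`, matching the first row of the table.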
Here are the details.
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
103MiB 0:00:08 [11.8MiB/s] [ <=> ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' 8.41s user 0.17s system 98% cpu 8.753 total
pv > /dev/null 0.08s user 0.32s system 4% cpu 8.752 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
103MiB 0:00:08 [12.6MiB/s] [ <=> ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' 7.96s user 0.19s system 99% cpu 8.204 total
pv > /dev/null 0.05s user 0.30s system 4% cpu 8.203 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
103MiB 0:00:08 [12.6MiB/s] [ <=> ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' 7.94s user 0.17s system 98% cpu 8.212 total
pv > /dev/null 0.03s user 0.19s system 2% cpu 8.210 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:05 [19.0MiB/s] [ <=> ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure 5.21s user 0.19s system 99% cpu 5.430 total
pv > /dev/null 0.06s user 0.35s system 7% cpu 5.428 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:05 [19.0MiB/s] [ <=> ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure 5.26s user 0.16s system 99% cpu 5.436 total
pv > /dev/null 0.08s user 0.35s system 7% cpu 5.436 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:05 [19.5MiB/s] [ <=> ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure 5.11s user 0.20s system 99% cpu 5.312 total
pv > /dev/null 0.12s user 0.23s system 6% cpu 5.312 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
103MiB 0:00:04 [24.4MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk < test_secure 3.11s user 0.21s system 78% cpu 4.248 total
pv > /dev/null 0.02s user 0.10s system 2% cpu 4.247 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
103MiB 0:00:04 [23.6MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk < test_secure 3.25s user 0.20s system 78% cpu 4.383 total
pv > /dev/null 0.01s user 0.08s system 1% cpu 4.382 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
103MiB 0:00:04 [24.1MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk < test_secure 3.08s user 0.23s system 77% cpu 4.288 total
pv > /dev/null 0.02s user 0.09s system 2% cpu 4.288 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:03 [27.1MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r < test_secure 3.26s user 0.22s system 89% cpu 3.871 total
pv > /dev/null 0.02s user 0.09s system 2% cpu 3.869 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:03 [26.6MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r < test_secure 3.45s user 0.16s system 92% cpu 3.886 total
pv > /dev/null 0.03s user 0.10s system 3% cpu 3.886 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
103MiB 0:00:03 [28.5MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r < test_secure 3.31s user 0.17s system 96% cpu 3.628 total
pv > /dev/null 0.02s user 0.10s system 3% cpu 3.628 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
103MiB 0:00:02 [49.4MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < 2.64s user 0.23s system 136% cpu 2.099 total
pv > /dev/null 0.03s user 0.07s system 4% cpu 2.099 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
103MiB 0:00:02 [44.9MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < 3.09s user 0.18s system 141% cpu 2.303 total
pv > /dev/null 0.04s user 0.07s system 4% cpu 2.303 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
103MiB 0:00:01 [54.1MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < 2.59s user 0.26s system 148% cpu 1.916 total
pv > /dev/null 0.02s user 0.10s system 5% cpu 1.916 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
103MiB 0:00:01 [57.6MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < 2.46s user 0.21s system 148% cpu 1.798 total
pv > /dev/null 0.02s user 0.09s system 6% cpu 1.797 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
103MiB 0:00:01 [56.7MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < 2.52s user 0.19s system 147% cpu 1.831 total
pv > /dev/null 0.03s user 0.09s system 6% cpu 1.830 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
103MiB 0:00:01 [55.2MiB/s] [ <=> ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < 2.46s user 0.27s system 145% cpu 1.878 total
pv > /dev/null 0.04s user 0.08s system 6% cpu 1.878 total