The destroy_data_then_repair nemesis takes ~8 hours to delete 51900 sstables. #9595

Open
vponomaryov opened this issue Dec 19, 2024 · 0 comments
Packages

Scylla version: 2024.2.0-20241118.614d56348f46 with build-id e67376d9ddfea081a3bab398f4581ecdde59911d

Kernel Version: 5.15.0-1072-aws

Issue description

The destroy_data_then_repair nemesis was triggered in a test that creates and populates 5000 tables.
As part of this nemesis, the scylla service was stopped and 50% (51900) of the sstables were deleted.
The problem is that the deletion alone took about 8 hours (05:43 to 13:32 in the log below):

2024-12-10 04:11:25,484 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.4.209>: Running command "sudo systemctl stop scylla-server.service"...
...
2024-12-10 05:43:20,464 f:nemesis.py      l:1175 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: SStables amount to destroy (50 percent of all SStables): 51900
...
2024-12-10 05:43:20,969 f:nemesis.py      l:1190 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/feeds/table1040_field4_table1040_index-18614371b64911efa67f40a70c16fdee/me-3gly_08fl_1zsvk2s1uyyslpee6h-big-Data.db were destroyed
...
2024-12-10 13:32:08,892 f:nemesis.py      l:1190 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/feeds/table1291-1d809a90b64911efa67f40a70c16fdee/me-3gly_07c5_5v4b42s1uyyslpee6h-big-Data.db were destroyed
...
2024-12-10 13:32:09,010 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.4.209>: Running command "sudo systemctl start scylla-server.service"...
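
To put the log timestamps in perspective, here is a small sketch that computes the elapsed deletion time and the average time per sstable. It assumes that the two "were destroyed" lines quoted above are the first and the last deletions, and that each such line corresponds to one of the 51900 sstables. The constant and function names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
FIRST_DESTROYED = "2024-12-10 05:43:20,969"
LAST_DESTROYED = "2024-12-10 13:32:08,892"
SSTABLES_DESTROYED = 51900  # from the "SStables amount to destroy" log line

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


total = elapsed_seconds(FIRST_DESTROYED, LAST_DESTROYED)
per_sstable = total / SSTABLES_DESTROYED

print(f"deletion phase: {total / 3600:.2f} hours")
print(f"average per sstable: {per_sstable:.2f} seconds")
```

Under those assumptions the deletion phase lasted about 7.8 hours, which is roughly 0.54 seconds per sstable on average.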

Impact

A significant amount of test time is wasted on an action that could be done much faster.
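
One possible contributor, which the log excerpt alone does not confirm, is that a separate remote command is issued for every file. A minimal sketch, assuming that is the case, of grouping the paths into a small number of `rm` invocations follows. The helper name `batched_rm_commands`, the `max_chars` limit, and the example paths are hypothetical and are not taken from the nemesis code.

```python
import shlex
from typing import Iterable, Iterator, List


def batched_rm_commands(paths: Iterable[str], max_chars: int = 100_000) -> Iterator[str]:
    """Yield 'sudo rm -f -- ...' commands, each at most max_chars long.

    Hypothetical helper: it only builds command strings and does not run them.
    """
    prefix = "sudo rm -f --"
    batch: List[str] = []
    size = len(prefix)
    for path in paths:
        quoted = shlex.quote(path)  # protect against spaces and shell metacharacters
        # Flush the current batch if adding this path would exceed the limit.
        if batch and size + 1 + len(quoted) > max_chars:
            yield " ".join([prefix, *batch])
            batch, size = [], len(prefix)
        batch.append(quoted)
        size += 1 + len(quoted)
    if batch:
        yield " ".join([prefix, *batch])


# Illustrative paths only; real sstable paths look like the ones in the log above.
example_paths = [f"/var/lib/scylla/data/ks/tbl/me-{i}-big-Data.db" for i in range(51900)]
commands = list(batched_rm_commands(example_paths))
print(f"{len(example_paths)} paths -> {len(commands)} commands")
```

With a batch size like this, the number of remote round trips drops from tens of thousands to a few dozen. The right limit depends on the remote shell's maximum argument length, so `max_chars` is only a placeholder here.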

How frequently does it reproduce?

1/1

Installation details

Cluster size: 1 nodes (i4i.8xlarge)

Scylla Nodes used in this run:

  • longevity-5000-tables-dev-db-node-577988a0-6 (3.249.0.189 | 10.4.4.209) (shards: 30)
  • longevity-5000-tables-dev-db-node-577988a0-5 (54.75.190.27 | 10.4.5.109) (shards: 30)
  • longevity-5000-tables-dev-db-node-577988a0-4 (63.33.66.182 | 10.4.5.238) (shards: 30)
  • longevity-5000-tables-dev-db-node-577988a0-3 (34.243.59.47 | 10.4.7.22) (shards: 30)
  • longevity-5000-tables-dev-db-node-577988a0-2 (34.254.96.143 | 10.4.6.248) (shards: 30)
  • longevity-5000-tables-dev-db-node-577988a0-1 (54.77.124.97 | 10.4.4.86) (shards: 30)

OS / Image: ami-0698e16ac1b56a821 (aws: undefined_region)

Test: vp-scale-5000-tables-test
Test id: 577988a0-bc60-4abe-b176-dd4bea6b8666
Test name: scylla-staging/valerii/vp-scale-5000-tables-test
Test method: longevity_test.LongevityTest.test_user_batch_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 577988a0-bc60-4abe-b176-dd4bea6b8666
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 577988a0-bc60-4abe-b176-dd4bea6b8666

Logs:

Jenkins job URL
Argus

@vponomaryov vponomaryov added the Bug Something isn't working right label Dec 19, 2024