Commit: Add citus runbooks

Signed-off-by: Jesse Nelson <[email protected]>

Showing 11 changed files with 1,025 additions and 34 deletions.
@@ -0,0 +1,20 @@
# Change Machine Type for Citus Node Pool(s)

## Problem

Need to change the machine type for Citus node pool(s).

## Prerequisites

- Have `jq` installed
- The kubectl context is set to the cluster you want to change the machine type for
- All bash commands assume your working directory is `docs/runbook/scripts`

## Solution

1. Follow the steps to [create a disk snapshot for Citus cluster](./create-disk-snapshot-for-citus-cluster.md)
   to back up the current cluster data
2. Run
   ```bash
   ./change-machine-type.sh
   ```
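After the script completes, it's worth spot-checking that the pools actually picked up the new machine type. A minimal sketch, with an illustrative pool, cluster, and region (substitute your own):

```bash
# Show the machine type configured on a node pool (names are illustrative)
gcloud container node-pools describe citus-worker \
  --cluster my-cluster --region us-central1 \
  --format='value(config.machineType)'

# Cross-check against the live Kubernetes nodes
kubectl get nodes -L node.kubernetes.io/instance-type
```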
@@ -0,0 +1,19 @@
# Create Disk Snapshot for Citus Cluster

## Problem

Need to create disk snapshots for Citus cluster(s).

## Prerequisites

- Have access to a running Citus cluster deployed by the `hedera-mirror` chart
- Have `jq` installed
- All bash commands assume your working directory is `docs/runbook/scripts`
- The kubectl context is set to the cluster you want to create snapshots from

## Solution

Run the script and follow along with all prompts:
```bash
./volume-snapshot.sh
```
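Once the script finishes, confirming the snapshots exist is a quick sanity check. A minimal sketch, assuming the snapshot names contain the Citus disk prefix (the filter is illustrative):

```bash
# List matching disk snapshots with their source disk, size, and creation time
gcloud compute snapshots list \
  --filter="name~citus" \
  --format='table(name,sourceDisk,diskSizeGb,creationTimestamp)'
```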
@@ -0,0 +1,77 @@
## Problem

The PVC for a shard is running out of space and needs to be increased beyond the current capacity of the disk.

## Prerequisites

- Have `jq` installed
- The kubectl context is set to the cluster containing the disks you want to resize

## Solution

1. Identify the worker (and/or coordinator) PVC(s) that need to be resized
   ```bash
   kubectl get pv -o \
     custom-columns='PVC_NAME:.spec.claimRef.name,PV_NAME:.metadata.name,CAPACITY:.spec.capacity.storage,NODE_ID:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]' \
     --sort-by=.spec.capacity.storage
   ```
   Example output
   ```text
   PVC_NAME                                                                                PV_NAME                                    CAPACITY   NODE_ID
   sentinel-data-mirror-redis-node-1                                                       pvc-9d9da6c6-f6e4-45a3-91cf-61d47e50dcd9   1Gi        us-central1-f
   sentinel-data-mirror-redis-node-2                                                       pvc-4b25a1b7-c34a-4d1f-8af3-bfcfe908bd99   1Gi        us-central1-c
   sentinel-data-mirror-redis-node-0                                                       pvc-531e97b6-d4d6-4023-a2dc-847a2fac75dd   1Gi        us-central1-b
   redis-data-mirror-redis-node-0                                                          pvc-7638c7ba-2ffe-4bb7-a09d-995e4d09b3a4   8Gi        us-central1-b
   redis-data-mirror-redis-node-1                                                          pvc-edc9ed5a-03b4-48eb-86b0-49def5c1af1f   8Gi        us-central1-f
   redis-data-mirror-redis-node-2                                                          pvc-638cab0b-ed6c-49b0-a61b-6893a5f3415f   8Gi        us-central1-c
   prometheus-mirror-prometheus-prometheus-db-prometheus-mirror-prometheus-prometheus-0   pvc-4745d425-fb1d-4af3-85c6-272cff98dcb8   100Gi      us-central1-b
   storage-mirror-loki-0                                                                   pvc-768215f5-30e3-4253-95ea-a82fb733207e   250Gi      us-central1-b
   mirror-citus-coord-data-mirror-citus-coord-0                                            pvc-6501aa41-f238-447b-b21b-7d91a36b8f02   256Gi      coordinator-us-central1-c-0
   mirror-citus-coord-data-mirror-citus-coord-1                                            pvc-78ef76d9-ea31-49b3-a9b6-559a3ec5cd9f   256Gi      coordinator-us-central1-b-0
   mirror-citus-shard2-data-mirror-citus-shard2-0                                          pvc-49d46894-51a0-4a97-b2da-e9c003e382f2   3200Gi     worker-us-central1-b-0
   mirror-citus-shard0-data-mirror-citus-shard0-0                                          pvc-5dd58b07-db59-4c3a-882f-dcd7467dfd49   10000Gi    worker-us-central1-c-0
   mirror-citus-shard1-data-mirror-citus-shard1-0                                          pvc-f9b980a9-0771-4222-9034-bd44279ddde8   12000Gi    worker-us-central1-f-0
   ```
2. Using the `NODE_ID` from the previous step, increase the disk size for all disks that need it
   (steps 2-4 are combined into a single sketch after this list), where
   ```text
   diskPrefix - value of zfs.init.diskPrefix in values.yaml
   diskName   - {diskPrefix}-{nodeId}-zfs
   zone       - extracted from the nodeId
   diskSize   - the new size of the disk in GB
   ```
   ```bash
   gcloud compute disks resize "{diskName}" --size="{diskSize}" --zone="{zone}"
   ```
3. Restart the ZFS init pods
   ```bash
   kubectl rollout restart daemonset -n common mirror-zfs-init
   ```
4. Verify the pool size has been increased
   ```bash
   kubectl get pods -n common -l component=openebs-zfs-node -o json |
     jq -r '.items[].metadata.name' |
     xargs -I % kubectl exec -c openebs-zfs-plugin -n common % -- zfs list
   ```
5. Update the `hedera-mirror` chart's `values.yaml` to reflect the new disk size
   ```yaml
   stackgres:
     coordinator:
       persistentVolume:
         size: 256Gi
     worker:
       overrides:
         - index: 0
           pods:
             persistentVolume:
               size: 12000Gi
         - index: 1
           pods:
             persistentVolume:
               size: 14000Gi
         - index: 2
           pods:
             persistentVolume:
               size: 3200Gi
   ```
6. Deploy the changes. Be sure to leave wiggle room for ZFS rounding; see the
   [zfs-localpv FAQ](https://github.com/openebs/zfs-localpv/blob/develop/docs/faq.md#7-why-the-zfs-volume-size-is-different-than-the-reqeusted-size-in-pvc).
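Steps 2 through 4 lend themselves to a single shell session. A minimal sketch, where `NODE_ID`, the `mirror` disk prefix, and the new size are illustrative values for a worker node and must be substituted with your own:

```bash
# Illustrative values -- substitute your own NODE_ID, disk prefix, and size
NODE_ID="worker-us-central1-f-0"   # from the NODE_ID column in step 1
DISK_PREFIX="mirror"               # value of zfs.init.diskPrefix in values.yaml
ZONE="${NODE_ID#worker-}"          # strip the pool prefix: us-central1-f-0
ZONE="${ZONE%-*}"                  # strip the trailing index: us-central1-f

# Step 2: grow the backing GCP disk (size in GB)
gcloud compute disks resize "${DISK_PREFIX}-${NODE_ID}-zfs" --size="14000" --zone="${ZONE}"

# Step 3: restart the ZFS init pods so the pool picks up the new capacity
kubectl rollout restart daemonset -n common mirror-zfs-init

# Step 4: confirm the pool grew
kubectl get pods -n common -l component=openebs-zfs-node -o json |
  jq -r '.items[].metadata.name' |
  xargs -I % kubectl exec -c openebs-zfs-plugin -n common % -- zfs list
```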
@@ -0,0 +1,26 @@
# Restore Citus Data From Disk Snapshots

## Problem

Need to restore a Citus cluster from disk snapshots.

## Prerequisites

- Snapshots of disks were created by following the [create snapshot](create-disk-snapshot-for-citus-cluster.md) runbook
- Have `jq` and `ksd` (Kubernetes secret decoder) installed
- The snapshots are from a compatible version of `postgres`
- The `target cluster` has a running Citus cluster deployed with the `hedera-mirror` chart
- The `target cluster` you are restoring to doesn't have any PVCs with a size larger than the size of the PVC in the
  snapshot. You can't decrease the size of a PVC. If needed, you can delete the existing cluster in the `target cluster`
  and redeploy the `hedera-mirror` chart with the default disk sizes.
- If you have multiple Citus clusters in the `target cluster`, you will need to restore all of them
- All bash commands assume your working directory is `docs/runbook/scripts`
- Only a single Citus cluster is installed per namespace
- The kubectl context is set to the cluster you want to restore snapshots to

## Steps

Run the script and follow along with all prompts:
```bash
./restore-volume-snapshot.sh
```
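Before kicking off the restore, it only takes a moment to verify the kubectl-context prerequisite and see which namespaces actually contain a Citus (StackGres) cluster:

```bash
# Confirm kubectl points at the intended target cluster
kubectl config current-context

# List the sharded Citus clusters the restore will operate on
kubectl get sgshardedclusters.stackgres.io -A
```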
@@ -0,0 +1,70 @@
#!/usr/bin/env bash

set -euo pipefail

source ./utils.sh

# Prompt for and validate the target GCP project
GCP_PROJECT="$(readUserInput "Enter GCP Project for target: ")"
if [[ -z "${GCP_PROJECT}" ]]; then
  log "GCP_PROJECT is not set and is required. Exiting"
  exit 1
else
  gcloud projects describe "${GCP_PROJECT}" > /dev/null
fi

# Prompt for and validate the target cluster's region
GCP_K8S_CLUSTER_REGION="$(readUserInput "Enter target cluster region: ")"
if [[ -z "${GCP_K8S_CLUSTER_REGION}" ]]; then
  log "GCP_K8S_CLUSTER_REGION is not set and is required. Exiting"
  exit 1
else
  gcloud compute regions describe "${GCP_K8S_CLUSTER_REGION}" --project "${GCP_PROJECT}" > /dev/null
fi

# Prompt for and validate the target cluster's name
GCP_K8S_CLUSTER_NAME="$(readUserInput "Enter target cluster name: ")"
if [[ -z "${GCP_K8S_CLUSTER_NAME}" ]]; then
  log "GCP_K8S_CLUSTER_NAME is not set and is required. Exiting"
  exit 1
else
  gcloud container clusters describe --project "${GCP_PROJECT}" \
    --region="${GCP_K8S_CLUSTER_REGION}" \
    "${GCP_K8S_CLUSTER_NAME}" > /dev/null
fi

# Prompt for the machine type to move the node pools to
MACHINE_TYPE="$(readUserInput "Enter new machine type: ")"
if [[ -z "${MACHINE_TYPE}" ]]; then
  log "MACHINE_TYPE is not set and is required. Exiting"
  exit 1
fi

# Prompt for the node pools to update and verify the machine type is
# available in every zone each pool runs in
AVAILABLE_POOLS="$(gcloud container node-pools list --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --region="${GCP_K8S_CLUSTER_REGION}" --format="json(name)" | jq -r '.[].name' | tr '\n' ' ')"
POOLS_TO_UPDATE_INPUT="$(readUserInput "Enter the node pools (${AVAILABLE_POOLS}) to update (space-separated): ")"
if [[ -z "${POOLS_TO_UPDATE_INPUT}" ]]; then
  log "POOLS_TO_UPDATE_INPUT is not set and is required. Exiting"
  exit 1
else
  IFS=', ' read -r -a POOLS_TO_UPDATE <<< "${POOLS_TO_UPDATE_INPUT}"
  for pool in "${POOLS_TO_UPDATE[@]}"; do
    POOL_LOCATIONS=($(gcloud container node-pools describe "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --region="${GCP_K8S_CLUSTER_REGION}" --format="json" | jq -r '.locations[]'))
    for location in "${POOL_LOCATIONS[@]}"; do
      gcloud compute machine-types describe "${MACHINE_TYPE}" --project="${GCP_PROJECT}" --zone="${location}" > /dev/null
    done
  done
fi

# Stop traffic to and pause Citus in every namespace with a sharded cluster
NAMESPACES=($(kubectl get sgshardedclusters.stackgres.io -A -o jsonpath='{.items[*].metadata.namespace}'))
for namespace in "${NAMESPACES[@]}"; do
  unrouteTraffic "${namespace}"
  pauseCitus "${namespace}"
done

# Scale the Citus node pools to 0, update each pool's machine type, then scale back to 1
resizeCitusNodePools 0
for pool in "${POOLS_TO_UPDATE[@]}"; do
  gcloud container node-pools update "${pool}" --project="${GCP_PROJECT}" --cluster="${GCP_K8S_CLUSTER_NAME}" --location="${GCP_K8S_CLUSTER_REGION}" --machine-type="${MACHINE_TYPE}"
done
resizeCitusNodePools 1

# Resume Citus and restore traffic in each namespace
for namespace in "${NAMESPACES[@]}"; do
  unpauseCitus "${namespace}"
  routeTraffic "${namespace}"
done
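The script leans on helpers from `./utils.sh` (`readUserInput`, `log`, `unrouteTraffic`, `pauseCitus`, `resizeCitusNodePools`, `unpauseCitus`, `routeTraffic`), which is not shown in this commit. For orientation only, a hypothetical sketch of the two simplest helpers; the real implementations live in `docs/runbook/scripts/utils.sh` and may differ:

```bash
# Hypothetical sketches -- utils.sh itself is not part of this excerpt
function log() {
  # Timestamped message to stdout
  echo "$(date -u +'%Y-%m-%dT%H:%M:%SZ') ${1}"
}

function readUserInput() {
  # `read -p` writes the prompt to stderr, so command substitution
  # captures only the user's answer echoed below
  read -r -p "${1}" input
  echo "${input}"
}
```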