Describe a topic you want to learn (required)
Properly handling disasters beyond the simple case where Kubernetes restarts the pod and everything works.
Handling Pulsar disk fill
Even though Pulsar has a disk quota check, we've had situations where our ingestion rate was so fast that we filled the disk before the check could kick in to protect Pulsar. The cluster itself became unresponsive and we had to destroy it.
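For context, my current understanding is that the disk check is driven by a few bookkeeper.conf settings; the values below are what I believe the defaults are, shown purely as a sketch (please verify against your BookKeeper version):

```
# bookkeeper.conf (illustrative values, not a recommendation)

# Fraction of disk usage at which a ledger directory is treated as full and
# the bookie goes read-only (when readOnlyModeEnabled=true).
diskUsageThreshold=0.95

# Fraction at which warnings start being logged, before the hard threshold.
diskUsageWarnThreshold=0.90

# How often the disk checker runs, in milliseconds. A very high ingestion
# rate can fill the disk between two consecutive checks.
diskCheckInterval=10000

# Let the bookie keep serving reads instead of failing when disks fill up.
readOnlyModeEnabled=true
```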
Handling removing a bookie from the cluster, either gracefully or by force.
We had a situation where we were trying to move Pulsar from one AWS AZ to another and were having issues with EBS volumes. We tried deleting the PV/PVC from Kubernetes, thinking that when we spun the pod up again it would simply provision a new PV/PVC and the autorecovery process would take over.
Instead we were greeted with an error message saying this was not a new bookie and its directory should not have been empty. I think we were supposed to decommission the bookie cleanly first, but we didn't know about that step.
After we deleted the volume, I jumped onto another bookie and tried to use ./bookkeeper shell to list the bookie ID of the deleted bookie and got a NullPointerException. We had to nuke the cluster.
To this day I still don't know what the correct approach is for removing a bookie, or what to do when a bookie cannot be removed cleanly.
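For what it's worth, my rough understanding from the docs (a sketch, not a verified runbook) is that the clean and unclean paths both go through the bookkeeper shell, something like this:

```
# See which bookies the cluster currently knows about (read-write and read-only)
bin/bookkeeper shell listbookies -rw
bin/bookkeeper shell listbookies -ro

# Clean removal: stop the bookie process on that host first, then run this on
# the same host; it waits until the bookie's ledgers have been re-replicated.
bin/bookkeeper shell decommissionbookie

# Unclean removal: if the bookie (or its volume) is already gone, run this from
# any surviving bookie to force re-replication of the lost bookie's ledgers.
# <bookie-id> is the host:port identifier shown by listbookies.
bin/bookkeeper shell recover <bookie-id>
```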
Checking on Auto Recovery status
Aside from checking the logs in the autorecovery pod, I don't know how to properly see when replication is below the expected count, what the recovery process actually is, or which partitions are affected.
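The only CLI-level visibility I've found so far (again, just my understanding, not something I've exercised under a real failure) is via the bookkeeper shell:

```
# Which bookie currently holds the auditor role that drives auto recovery
bin/bookkeeper shell whoisauditor

# Ledgers currently marked as under-replicated and pending re-replication
bin/bookkeeper shell listunderreplicated
```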
Why do you want to learn this topic? (required)
Understanding these scenarios is important for having enough faith in a cluster to keep using it. When bad things happen you need to understand what is going on and know how to gracefully get yourself out of the situation.
Reference (optional)
https://pulsar.apache.org/docs/en/administration-zk-bk/#decommissioning-bookies-cleanly