Problem:
With our Bounded Staleness consistency setting, the system normally reads data from two secondary replicas and chooses the most recent version. It then verifies consistency by sending "head requests" to all secondaries (and to the primary if the replica set has fewer than 4 replicas).
Recently, during a deployment, one secondary became unavailable due to the update, and another crashed. This left us with a replica set of only three (two secondaries and the primary).
The system attempted to read from the two remaining secondaries, but one was unreachable because of the crash. This triggered a validation check, which would normally fall back to reading from the primary when no data was retrieved from the secondaries. In this case, however, the validation logic prevented the read from the primary because the replica set size was 3 (it expected at least 2 responses for a quorum).
The validation failure raised an exception and triggered retries, but the retries did not resolve the issue.
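To make the failure mode concrete, here is a minimal Python model of the behavior described above. It is only a sketch, not the actual client code: the `Replica` class, the `lsn` version field, `READ_QUORUM`, and the function names are all hypothetical, and the quorum of 2 is taken from the description in this issue.

```python
from dataclasses import dataclass

READ_QUORUM = 2  # assumed: two responses are required, as described in this issue


@dataclass
class Replica:
    name: str
    is_primary: bool
    reachable: bool
    lsn: int  # hypothetical version number of the data held by this replica


class QuorumNotMet(Exception):
    """Raised when too few replicas answered the read."""


def read_secondaries_only(replicas):
    # Model of the current behavior: only secondaries are consulted.
    responses = [r for r in replicas if not r.is_primary and r.reachable]
    if len(responses) < READ_QUORUM:
        raise QuorumNotMet(f"got {len(responses)} of {READ_QUORUM} responses")
    # Choose the most recent version among the responses.
    return max(responses, key=lambda r: r.lsn)


def read_with_retries(replicas, attempts=3):
    # Retrying does not help while the crashed secondary stays down.
    for _ in range(attempts):
        try:
            return read_secondaries_only(replicas)
        except QuorumNotMet:
            continue
    return None


# The situation from this issue: primary plus two secondaries, one crashed.
degraded_set = [
    Replica("primary", is_primary=True, reachable=True, lsn=10),
    Replica("secondary-1", is_primary=False, reachable=True, lsn=9),
    Replica("secondary-2", is_primary=False, reachable=False, lsn=9),
]
```

In this model, `read_with_retries(degraded_set)` returns `None` on every attempt, even though the primary is reachable and holds the data.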
Proposed Solution:
To avoid this issue, we propose modifying the system's behavior when the replica set size is reduced and one secondary is unavailable. Instead of requiring a quorum from the remaining secondaries, we would include the primary in the selection process. This would allow the system to read from all available replicas and establish consistency. This change would ensure the system remains operational even during similar failures.
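A rough sketch of the proposed selection logic, in Python for illustration. This is not the actual implementation: `Replica`, `lsn`, `select_read_candidates`, and `read_with_primary_fallback` are hypothetical names, and the quorum of 2 is an assumption based on the description above.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    is_primary: bool
    reachable: bool
    lsn: int  # hypothetical version number of the data held by this replica


def select_read_candidates(replicas, read_quorum=2):
    # Prefer secondaries, as the read path does today.
    secondaries = [r for r in replicas if not r.is_primary and r.reachable]
    if len(secondaries) >= read_quorum:
        return secondaries[:read_quorum]
    # Proposed change: when secondaries alone cannot form a quorum,
    # add the primary to the candidates instead of failing.
    primary = [r for r in replicas if r.is_primary and r.reachable]
    return secondaries + primary


def read_with_primary_fallback(replicas, read_quorum=2):
    candidates = select_read_candidates(replicas, read_quorum)
    if len(candidates) < read_quorum:
        raise RuntimeError(f"got {len(candidates)} of {read_quorum} responses")
    # Choose the most recent version among the responses.
    return max(candidates, key=lambda r: r.lsn)


# The situation from this issue: primary plus two secondaries, one crashed.
degraded_set = [
    Replica("primary", is_primary=True, reachable=True, lsn=10),
    Replica("secondary-1", is_primary=False, reachable=True, lsn=9),
    Replica("secondary-2", is_primary=False, reachable=False, lsn=9),
]
result = read_with_primary_fallback(degraded_set)
```

With the degraded replica set from this issue, the read succeeds using `secondary-1` and the primary. When enough secondaries are reachable, the primary is not contacted, so the normal read path is unchanged.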