Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #943: don't try to take 2 snapshots at a time #1081

Closed
wants to merge 1 commit into from
Closed

Conversation

dgnorton
Copy link
Contributor

The TakeSnapshot() function in _vendor/raft/server.go was failing at s.stateMachine.Save() and exiting without resetting s.pendingShapshot = nil. That left the raft server in an invalid state.

s.stateMachine.Save() was failing with the error message "gob: encodeReflectValue: nil element", which comes from the Save() function in cluster/cluster_configuration.go. This is a separate issue.

@dgnorton dgnorton self-assigned this Oct 30, 2014
@otoolep
Copy link
Contributor

otoolep commented Oct 30, 2014

Interesting. This PR:

goraft/raft#242

is pretty close to the same thing.

@@ -331,6 +331,8 @@ const (
)

func (s *RaftServer) ForceLogCompaction() error {
s.mutex.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the lock, influxdb attempts to take multiple snapshots at the same time. If you comment those lines out, rebuild influxdb, and then run it & influxdb_stress (described above) at the same time...you'll see the messages in the log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it, I was thinking that might be it from the title of the PR. I guess I might have put the lock inside TakeSnapshot(). This is kind of a matter of taste though -- depends on one thinks about TakeSnapshot().

@dgnorton
Copy link
Contributor Author

@otoolep it was related. There were actually two spots that needed to be reset to nil.

@jvshahid
Copy link
Contributor

Do you know why this gob: encodeReflectValue: nil element is happening ?

@dgnorton
Copy link
Contributor Author

@jvshahid no but I think it's a separate issue.

@jvshahid
Copy link
Contributor

Agree, I was just wondering why it's failing to save the state in the first place. I'll merge this in a bit.

@dgnorton dgnorton closed this in 4b59ddd Oct 31, 2014
@dgnorton dgnorton removed the review label Oct 31, 2014
@dgnorton dgnorton deleted the fix-943 branch December 9, 2014 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants