Validator Guides
A doublesign is the creation of at least one pair of fork events in the DAG. Check out the fork rules.
The creator of fork events is called a cheater.
A fork does no harm to other users unless validators holding 1/3W (one third of the total validator weight W) cheat simultaneously.
Punishment of a detected cheater is fully controlled by the SFC contract. Note that the governance contract can upgrade the SFC contract at any time, without a hard fork.
The SFC slashes 100% of the stake of the cheating validator and its delegators, unless this is overridden by the governance contract.
Note that exactly the same security rules apply to delegators, without exception. This is necessary because delegators increase the validator's weight in the consensus algorithm. If delegators weren't subject to the same punishment, a validator could delegate to itself to minimize its punishment.
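A worked illustration of the self-delegation argument, in hypothetical notation that is not taken from the SFC contract: let s be the validator's own stake and d the stake delegated to it, so its consensus weight is w = s + d, and let P be the slashed amount under 100% slashing.

```latex
\begin{aligned}
w &= s + d && \text{weight in consensus} \\
P_{\text{all}} &= s + d = w && \text{delegators are slashed too} \\
P_{\text{self}} &= s && \text{hypothetical rule: delegators exempt}
\end{aligned}
```

For instance, with s = 10 and d = 990 (arbitrary units, where d is the validator's own money delegated from a second account), w = 1000. Under the hypothetical exemption the validator would risk only P_self = 10, i.e. 1% of its weight, while slashing delegators as well makes the penalty equal to the full weight no matter how the stake is split.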
A fork is created when a validator doesn't use its last event as the self-parent of every new event it creates.
This means that a validator must always have its previous event in the DB, which leaves two ways to doublesign by accident:
- 2+ instances of the same validator are running simultaneously (and hence may create events simultaneously), and all heuristics failed to detect that another instance is running.
- After a re-download or a loss of data, the validator doesn't have all of its previous events, and all heuristics failed to detect that the node hasn't synced up.
Therefore, on migration or upgrade, validators should ensure that the new node has downloaded all previous events of the previous node.
On migration to another server, follow these steps:
- Stop the previous node.
- Ensure the node has stopped and wasn't restarted!
- Copy the datadir to the new server (optional, but it ensures that previous events are in the DB).
- Back up the keystore (copy the datadir/keystore directory). Erase the keystore and password from the old server!
- Follow the steps in "How to ensure previous events are downloaded".
- Start the node as a non-validator on the new server. Ensure it connects to the network and everything works as expected.
- Restart the node as a validator on the new server (only one instance!).
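Both the migration steps above and the upgrade steps ask for a keystore backup. A minimal sketch, assuming the node's data directory and a backup directory are passed in as arguments (the function name and both paths are hypothetical placeholders for your own setup):

```bash
#!/usr/bin/env bash
# Minimal sketch: archive <datadir>/keystore into a timestamped tarball.
# backup_keystore and the example paths are hypothetical; adapt them.

backup_keystore() {
  local datadir="$1" backup_dir="$2"
  local keystore="$datadir/keystore"

  # Refuse to continue if there is no keystore to back up.
  if [ ! -d "$keystore" ]; then
    echo "keystore not found: $keystore" >&2
    return 1
  fi

  mkdir -p "$backup_dir" || return 1
  local archive
  archive="$backup_dir/keystore-$(date +%Y%m%d-%H%M%S).tar.gz"

  # -C enters the datadir, so the archive holds a relative "keystore/" path.
  tar -czf "$archive" -C "$datadir" keystore || return 1

  # Print the archive path so the caller can verify or copy it elsewhere.
  echo "$archive"
}

# Usage (hypothetical paths):
#   backup_keystore /var/lib/opera /root/opera-backups
```

Copy the resulting archive to a separate machine before erasing the keystore from the old server.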
On node upgrade, follow these steps to prevent a doublesign:
- Stop the node.
- Ensure the node has stopped and wasn't restarted!
- Back up the keystore (copy the datadir/keystore directory).
- Rebuild or replace the go-opera executable (follow the "Node update" instructions).
- Follow the steps in "How to ensure previous events are downloaded".
- Start the node as a non-validator. Ensure it connects to the network and everything works as expected.
- Restart the node as a validator (only one instance!).
Ensure the node has synced up (do any 2 of the following):
- Launch a non-validator node first. Check that its last block is >= the last block shown in the explorer.
- Launch a non-validator node first with the --exitwhensynced flag. Wait until it has stopped.
- Find the ID of your validator's previous (last) event by grepping the logs. Launch a non-validator node first and wait until it has synced up. Ensure it has received that last event by grepping the logs. Alternatively, you may request events by ID using the API (enable --rpc first).
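The first check can be partly scripted. A minimal sketch, assuming the non-validator node has its HTTP RPC enabled and answers the standard Ethereum JSON-RPC method eth_blockNumber; the endpoint http://localhost:18545 and all function names are assumptions, and the explorer's last block number is typed in by hand:

```bash
#!/usr/bin/env bash
# Minimal sketch: compare the local node's last block with the explorer's.
# RPC_URL is an assumed endpoint; point it at your node's HTTP RPC.
RPC_URL="${RPC_URL:-http://localhost:18545}"

# Read a JSON-RPC response on stdin and print its hex "result", e.g. 0x1b4.
extract_result_hex() {
  sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Ask the local node for its latest block number (hex string).
local_block_hex() {
  curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$RPC_URL" | extract_result_hex
}

# Succeed if the local block (hex) is >= the explorer's block (decimal).
# Returns 2 on malformed input, e.g. when the RPC call failed.
is_synced() {
  local local_hex="$1" explorer_block="$2"
  [[ $local_hex =~ ^0x[0-9a-fA-F]+$ ]] || return 2
  [[ $explorer_block =~ ^[0-9]+$ ]] || return 2
  # 16# parses hex; 10# keeps leading zeros from being read as octal.
  local local_dec=$(( 16#${local_hex#0x} ))
  (( local_dec >= 10#$explorer_block ))
}

# Usage (12345678 stands for the explorer's last block):
#   is_synced "$(local_block_hex)" 12345678 && echo synced || echo "not synced yet"
```

The comparison is kept separate from the RPC call so it can be checked without a running node.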
Ensure that there's no parallel instance of the same validator:
- After the previous node has stopped, wait 60-90 minutes, and at least until the explorer shows that your validator has downtime >= 40 minutes.
- Launch a non-validator node. Search its logs for any events from your validator with ID=X:
grep "by=X " log.txt
Then wait 40 minutes and ensure no new events are emitted.
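The log check can be sketched as follows, assuming the running non-validator node keeps appending its output to log.txt and that its event lines contain "by=<validator ID> " as in the grep above. The function names and the 40-minute default are illustrative:

```bash
#!/usr/bin/env bash
# Minimal sketch: check whether a validator's events keep appearing in a log.
# Assumes event log lines contain "by=<validator ID> ", as in the grep above.

# Print the number of log lines mentioning events created by validator $2.
count_validator_events() {
  local logfile="$1" validator_id="$2"
  [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; return 1; }
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps exit 0.
  grep -c -- "by=${validator_id} " "$logfile" || true
}

# Succeed if the count stays the same over the wait (default 2400 s = 40 min).
no_new_events_after() {
  local logfile="$1" validator_id="$2" wait_seconds="${3:-2400}"
  local before after
  before=$(count_validator_events "$logfile" "$validator_id") || return 2
  sleep "$wait_seconds"
  after=$(count_validator_events "$logfile" "$validator_id") || return 2
  [ "$after" -eq "$before" ]
}

# Usage (hypothetical validator ID 42):
#   no_new_events_after log.txt 42 && echo "no new events" || echo "events still appearing"
```

Run it only after that node has synced up, so that old events it receives while syncing aren't mistaken for newly emitted ones.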
The node uses several heuristics to detect missing previous events or a parallel validator instance. These heuristics are probabilistic: it's impossible to detect these conditions with full accuracy in a decentralized network. Therefore, validators should follow the steps above manually and not rely on the heuristics.
DB data may be lost for the following reasons:
- The node was terminated using kill -9, the node crashed, or the server stopped. Afterwards, the node will either refuse to start (if the DB is corrupted) or start from the previously flushed state (data is flushed every epoch, every 10 minutes, and whenever the DB buffer is exceeded).
- Disk failure. A disk failure may corrupt the DB state, in which case the node will refuse to start.
Stop the node. Ensure the node has stopped and wasn't restarted!
killall opera
If needed, back up and erase the previous DB files (the node won't start if the new version isn't compatible with the previous DB):
rm -r /path-to-datadir/chaindata
Update and build the latest version:
git pull origin master
make build
The build output is placed in ./build/