MySQL driver: on connect try setting wsrep_sync_wait=7, swallow error 1193 #665
Single node test
https://github.com/Al2Klimov/twintowers with this patch:
--- docker-compose.yml
+++ docker-compose.yml
@@ -25,12 +25,12 @@ x-ido: &x-ido
MARIADB_DATABASE: ido
x-dbicinga: &x-dbicinga
- image: mariadb:10
+ image: mysql:5.7
environment:
- MARIADB_RANDOM_ROOT_PASSWORD: '1'
- MARIADB_USER: icingadb
- MARIADB_PASSWORD: icingadb
- MARIADB_DATABASE: icingadb
+ MYSQL_RANDOM_ROOT_PASSWORD: '1'
+ MYSQL_USER: icingadb
+ MYSQL_PASSWORD: icingadb
+ MYSQL_DATABASE: icingadb
x-icingadb-env: &x-icingadb-env
ICINGADB_REDIS_PORT: '6379'
@@ -186,7 +186,7 @@ services:
condition: service_started
dbicinga1:
condition: service_started
- image: icinga/icingadb
+ image: icinga/icingadb:test
environment:
<<: *x-icingadb-env
ICINGADB_REDIS_HOST: redis1
@@ -279,7 +279,7 @@ services:
condition: service_started
dbicinga2:
condition: service_started
- image: icinga/icingadb
+ image: icinga/icingadb:test
environment:
<<: *x-icingadb-env
ICINGADB_REDIS_HOST: redis2
The MySQL image throws 1193 on the command line with SET SESSION wsrep_sync_wait=4.
What effect does changing |
In contrast to the MariaDB Docker image it doesn't even ship cluster functionality and rejects the var. The perfect single node for testing IMAO.
Galera test
Cluster as per #577 (comment), verified with https://github.com/Al2Klimov/provoke-galera (that #577-ish errors can happen under laboratory conditions). In addition there's a Debian 12 load balancer:
With this LB in https://github.com/Al2Klimov/twintowers
--- docker-compose.yml
+++ docker-compose.yml
@@ -190,7 +190,7 @@ services:
environment:
<<: *x-icingadb-env
ICINGADB_REDIS_HOST: redis1
- ICINGADB_DATABASE_HOST: dbicinga1
+ ICINGADB_DATABASE_HOST: 10.27.2.59
web1:
depends_on:
@@ -209,7 +209,7 @@ services:
icingaweb.modules.icingadb.redis.redis1.host: redis1
icingaweb.modules.monitoring.commandtransports.icinga2.host: master1
icingaweb.resources.icingaweb_db.host: dbweb1
- icingaweb.resources.icingadb.host: dbicinga1
+ icingaweb.resources.icingadb.host: 10.27.2.59
icingaweb.resources.icinga_ido.host: ido1
volumes:
- ./volumes/icingaweb2/1:/data
and a bunch of history
--- icinga2.conf
+++ icinga2.conf
@@ -44,3 +44,17 @@ object IdoMysqlConnection "ido-mysql" {
database = "ido"
enable_ha = false
}
+
+object Host "ok" {
+ check_command = "dummy"
+}
+
+for (i in range(10000)) {
+object Service i {
+ host_name = "ok"
+ check_command = "random"
+ max_check_attempts = 1
+ check_interval = 1ms
+ retry_interval = 1ms
+}
+}
bad things happen:
But not with this PR! 👍
--- docker-compose.yml
+++ docker-compose.yml
@@ -186,7 +186,7 @@ services:
condition: service_started
dbicinga1:
condition: service_started
- image: icinga/icingadb
+ image: icinga/icingadb:test
environment:
<<: *x-icingadb-env
ICINGADB_REDIS_HOST: redis1
@@ -279,7 +279,7 @@ services:
condition: service_started
dbicinga2:
condition: service_started
- image: icinga/icingadb
+ image: icinga/icingadb:test
environment:
<<: *x-icingadb-env
ICINGADB_REDIS_HOST: redis2
pkg/driver/driver.go
- sql.Register(MySQL, &Driver{ctxDriver: &mysql.MySQLDriver{}, Logger: logger})
+ sql.Register(MySQL, &Driver{ctxDriver: &mySQLDriver{}, Logger: logger})
Wait, we are already wrapping the driver. Why do you wrap it once more instead of just initializing the connection there?
Sure, we have a common Driver, but my stuff is only for MySQL.
That could be a simple callback attribute that's just set for MySQL:
&Driver{ctxDriver: &mysql.MySQLDriver{}, initConn: setGaleraOpts, Logger: logger}
Call it if it's set and you no longer need half of the code you're adding here.
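The callback idea could look roughly like the following sketch. Only the field name initConn and the hook name setGaleraOpts come from this thread; the hook signature, the wrapper type and the nopConn stand-in are assumptions made up for illustration, not the actual Icinga DB code.

```go
package main

import (
	"context"
	"database/sql/driver"
	"errors"
	"fmt"
)

// initConnFunc is a hypothetical hook signature; the review only names the field initConn.
type initConnFunc func(ctx context.Context, conn driver.Conn) error

// wrapper stands in for the shared Driver type discussed above (assumption).
type wrapper struct {
	open     func(ctx context.Context) (driver.Conn, error)
	initConn initConnFunc // nil for databases that need no per-connection setup
}

// connect opens a connection and runs the hook only if one is set.
func (w wrapper) connect(ctx context.Context) (driver.Conn, error) {
	conn, err := w.open(ctx)
	if err != nil {
		return nil, err
	}
	if w.initConn != nil {
		if err := w.initConn(ctx, conn); err != nil {
			_ = conn.Close() // don't leak a half-initialized connection
			return nil, err
		}
	}
	return conn, nil
}

// nopConn is a do-nothing driver.Conn used only to make the sketch runnable.
type nopConn struct{}

func (nopConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("not supported") }
func (nopConn) Close() error                        { return nil }
func (nopConn) Begin() (driver.Tx, error)           { return nil, errors.New("not supported") }

// countInitCalls connects once and reports how often the hook ran.
func countInitCalls(withHook bool) (int, error) {
	calls := 0
	w := wrapper{open: func(context.Context) (driver.Conn, error) { return nopConn{}, nil }}
	if withHook {
		// In the PR this would be something like setGaleraOpts, set for MySQL only.
		w.initConn = func(context.Context, driver.Conn) error { calls++; return nil }
	}
	_, err := w.connect(context.Background())
	return calls, err
}

func main() {
	fmt.Println(countInitCalls(true))
	fmt.Println(countInitCalls(false))
}
```

With this shape the PostgreSQL registration simply leaves the field unset, so no second wrapper type is needed.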
Done.
In Galera clusters wsrep_sync_wait=4 ensures inserted rows to be synced over all nodes before reporting success to their inserter.
I don't think that's what this variable does, the documentation says:
Setting this variable ensures causality checks will take place before executing an operation of the type specified by the value, ensuring that the statement is executed on a fully synced node.
Which I read as this waits for everything to sync onto the local node before executing the query, not waiting for the result of the query to sync to all other nodes before reporting success.
The MySQL image throws 1193 on the command line with SET SESSION wsrep_sync_wait=4.
In contrast to the MariaDB Docker image it doesn't even ship cluster functionality and rejects the var. The perfect single node for testing IMAO.
So a non-Galera-cluster MariaDB understands the variable but treats it as a no-op?
FWIW: no testing from my side so far, but it looks like it should allow doing what we want to do: testing the effect of wsrep_sync_wait=4.
pkg/driver/mysql.go
// setGaleraOpts tries SET SESSION wsrep_sync_wait=4.
// Error 1193 "Unknown system variable" is ignored to support MySQL single nodes.
This comment would be more helpful if it also said why.
Yes, our tests confirm this.
This might not be the perfect place for these questions, but I failed to find them addressed anywhere else. If this was already discussed, please point me there and excuse this comment.
- With no intention to sound rude, but isn't setting wsrep_sync_wait more a hack than a solution? The core problem, a foreign key violation due to non-linear or non-synchronized INSERT, is not really mitigated by doing so.
  How about putting the state_history and history UPSERT commands into one transaction? Of course, this would require some refactoring of history.Sync.
- Why should mode 4 (INSERT and REPLACE) be used? What about UPDATE, which is, e.g., covered by mode 6 (UPDATE, DELETE, INSERT and REPLACE)? Furthermore, are READ operations now not ensured to be executed on a fully synced node?
  Can we be sure that there are no other races when using a "master-master" replication or a cluster instead of a single node?
- What about increasing wsrep_retry_autocommit, even if this also seems a bit desperate? Or might this only increase the chance of a deadlock?
Please excuse my rambling above. I hope it's still understandable.
Furthermore, I have added two more trivial code comments below.
err = errors.Wrap(err, "can't execute "+galeraOpts)
}

if err != nil && errors.Is(err, errUnknownSysVar) {
- if err != nil && errors.Is(err, errUnknownSysVar) {
+ if errors.Is(err, errUnknownSysVar) {
errors.Is performs the nil check for you.
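A small standalone check of that claim, using only the standard library. The sentinel below is a stand-in for the PR's errUnknownSysVar (its real definition is not shown in this thread), and fmt.Errorf with %w is used here instead of errors.Wrap to avoid a third-party import.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownSysVar is a stand-in sentinel (assumption); the PR's actual value is not shown.
var errUnknownSysVar = errors.New("Error 1193: Unknown system variable")

// isUnknownSysVar needs no explicit nil check: errors.Is(nil, target)
// is false for any non-nil target.
func isUnknownSysVar(err error) bool {
	return errors.Is(err, errUnknownSysVar)
}

func main() {
	wrapped := fmt.Errorf("can't execute SET SESSION wsrep_sync_wait=7: %w", errUnknownSysVar)

	fmt.Println(isUnknownSysVar(nil))                 // false
	fmt.Println(isUnknownSysVar(wrapped))             // true
	fmt.Println(isUnknownSysVar(errors.New("other"))) // false
}
```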
There's the history of the referenced issue starting at #577 (comment) where
Our code should synchronize them insofar that it will only issue the second insert after it has received the result for the insert the second row depends on. For a single server, this should be good enough, even if it happens on different connections. However, in Galera cluster, if both queries are issued on connections to different nodes, the result of the first query might not have propagated yet.
Shouldn't require a transaction, simply using the same connection should suffice. But maybe using a transaction would provide the nicer API on the Go side, but then we should also see if this has a performance impact (I think we observed some non-obvious performance penalties from explicit transactions over auto-commit in the past, maybe @lippserd remembers some details?). As you might deduce from #577, I'm not yet convinced that
… 1193
In Galera clusters, wsrep_sync_wait=7 lets statements catch up on all pending sync between nodes first. This way, new child rows wait for fresh parent rows from other nodes and do not run into foreign key errors. MySQL single nodes will reject this with error 1193 "Unknown system variable", which is OK.
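For readers comparing the values 4, 6 and 7 mentioned in this thread: wsrep_sync_wait is a bitmask. The sketch below decodes the three lowest bits as described in the MariaDB documentation linked in this PR; the constant and function names are made up for illustration, and higher bits are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Bits of wsrep_sync_wait relevant to this discussion (names are illustrative).
const (
	syncRead          = 1 << iota // 1: READ (SELECT, BEGIN/START TRANSACTION)
	syncUpdateDelete              // 2: UPDATE and DELETE
	syncInsertReplace             // 4: INSERT and REPLACE
)

// describeSyncWait lists the statement kinds that wait for the node to be
// fully synced before executing, for a given wsrep_sync_wait value.
func describeSyncWait(v int) []string {
	var kinds []string
	if v&syncRead != 0 {
		kinds = append(kinds, "READ")
	}
	if v&syncUpdateDelete != 0 {
		kinds = append(kinds, "UPDATE and DELETE")
	}
	if v&syncInsertReplace != 0 {
		kinds = append(kinds, "INSERT and REPLACE")
	}
	return kinds
}

func main() {
	for _, v := range []int{4, 6, 7} {
		fmt.Printf("wsrep_sync_wait=%d: %s\n", v, strings.Join(describeSyncWait(v), ", "))
	}
}
```

Read this way, moving from 4 to 7 adds the causality check for reads, updates and deletes on top of inserts.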
I have successfully tested the current state on top of my three-node MariaDB Galera testing cluster. For that, I have also altered the database.options.wsrep_sync_wait value.
// WsrepSyncWait defines which kinds of SQL statements catch up all pending sync between nodes first, see:
// https://mariadb.com/kb/en/galera-cluster-system-variables/#wsrep_sync_wait
WsrepSyncWait int `yaml:"wsrep_sync_wait" default:"7"`
This should also be documented in a place that users can easily find. The current example config does not contain an options block. This could perhaps be added, as for many people a look at the default config is a starting point.
Textual documentation would also be helpful. Perhaps also referring to the HA area?
The current example config does not contain an options block.
That's exactly my reason for not documenting it (yet).
The reasoning for the existing options was that those should not need to be changed by users. The default values should work fine everywhere; they were merely exposed to the config so that we might try to tweak them even in some production environment if problems arose (but so far, those defaults seem to have worked out fine).
if s, ok := config.Params["wsrep_sync_wait"]; ok {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		wsrepSyncWait = i
		delete(config.Params, "wsrep_sync_wait")
Storing wsrep_sync_wait inside the config.Params feels a bit like a cheat, especially as it will be deleted at this point. Maybe add at least a short comment explaining this?
if config, err := mysql.ParseDSN(name); err == nil {
	if s, ok := config.Params["wsrep_sync_wait"]; ok {
		if i, err := strconv.ParseInt(s, 10, 64); err == nil {
			// MySQL single nodes don't know wsrep_sync_wait and fail with error 1193 "Unknown system variable".
			// We have to SET it manually later and swallow error 1193 not to fail our connections.
			wsrepSyncWait = i
			delete(config.Params, "wsrep_sync_wait")
			name = config.FormatDSN()
		}
	}
}
It should actually be possible to completely avoid the DSN for connection to a database by chaining these functions:
- https://pkg.go.dev/github.com/go-sql-driver/mysql#NewConnector
- https://pkg.go.dev/database/sql#OpenDB
- https://pkg.go.dev/github.com/jmoiron/sqlx#NewDb
This should then allow simply passing the config option as a struct member instead.
(And maybe this could also allow us to avoid having to register the custom "icingadb-mysql" driver at all.)
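A rough, standard-library-only sketch of that connector idea. Here sql.OpenDB is given a custom driver.Connector that carries wsrep_sync_wait as a struct member and swallows error 1193 on connect. fakeConn, fakeDriver, serverError and galeraConnector are all hypothetical stand-ins; with the real MySQL driver the inner connector would come from its NewConnector, and the error type would be the driver's own.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// serverError is a stand-in for the real driver's error type (assumption).
type serverError struct{ Number uint16 }

func (e *serverError) Error() string { return fmt.Sprintf("Error %d", e.Number) }

const unknownSysVar = 1193 // "Unknown system variable"

// fakeConn imitates a database node: it records statements and, if
// rejectWsrep is set, answers them with error 1193 like a MySQL single node.
type fakeConn struct {
	rejectWsrep bool
	executed    []string
}

func (c *fakeConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("not supported") }
func (c *fakeConn) Close() error                        { return nil }
func (c *fakeConn) Begin() (driver.Tx, error)           { return nil, errors.New("not supported") }

func (c *fakeConn) ExecContext(_ context.Context, q string, _ []driver.NamedValue) (driver.Result, error) {
	c.executed = append(c.executed, q)
	if c.rejectWsrep {
		return nil, &serverError{Number: unknownSysVar}
	}
	return driver.ResultNoRows, nil
}

// fakeDriver only exists to satisfy driver.Connector; DSN-based Open is unused.
type fakeDriver struct{}

func (fakeDriver) Open(string) (driver.Conn, error) { return nil, errors.New("use the connector") }

// galeraConnector applies wsrep_sync_wait to every new connection; the value
// is a plain struct member instead of a DSN parameter.
type galeraConnector struct {
	conn          *fakeConn
	wsrepSyncWait int
}

func (g galeraConnector) Driver() driver.Driver { return fakeDriver{} }

func (g galeraConnector) Connect(ctx context.Context) (driver.Conn, error) {
	query := fmt.Sprintf("SET SESSION wsrep_sync_wait=%d", g.wsrepSyncWait)
	_, err := g.conn.ExecContext(ctx, query, nil)

	var se *serverError
	if errors.As(err, &se) && se.Number == unknownSysVar {
		err = nil // no Galera on this node, so there is nothing to wait for
	}
	if err != nil {
		return nil, err
	}
	return g.conn, nil
}

// openAndPing builds a *sql.DB from the connector and forces one connection.
func openAndPing(conn *fakeConn, syncWait int) error {
	db := sql.OpenDB(galeraConnector{conn: conn, wsrepSyncWait: syncWait})
	defer db.Close()
	return db.Ping()
}

func main() {
	conn := &fakeConn{rejectWsrep: true}
	fmt.Println(openAndPing(conn, 7), conn.executed)
}
```

Whether this is preferable to keeping the option in the DSN is the open question in the reply that follows; the sketch only shows that the value need not travel through config.Params.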
Why just MySQL? Our Pg driver has a NewConnector() as well.
And what's the problem in general? Describing a connection to a database is the DSN's purpose.
In Galera clusters, wsrep_sync_wait=7 lets statements catch up on all pending sync between nodes first. This way, new child rows wait for fresh parent rows from other nodes and do not run into foreign key errors. MySQL single nodes will reject this with error 1193 "Unknown system variable", which is OK.
fixes #577
To do