-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only use requires_new when needed #1108
base: master
Are you sure you want to change the base?
Conversation
SAVEPOINTs are costly: https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/ They were added in fc2ba56 to stop the events with the wrong version being committed but if no version is specified then they are not needed.
I must say that's intriguing — I didn't find a drawback in this solution last time I was briefly thinking about it. But I think such a change would require much more verification and test cases to prove we're not letting in any data corruption. For sure:
Did you have a change to measure performance impact on RES with sample payloads/queries in you application? We're only adding one savepoint here. RES also supports MySQL. Do you perhaps know if situation there is any different from Postgres? |
@paneq Any shower thoughts? 🙃 |
The included benchmark is totally not convincing for me right now. The conclusion in the article says:
But the author does not include 3rd test case where transactions are 50% longer (90 groups of statements instead of 60) but without savepoints (or with a different non-important statement) to prove that the savepoints are the problem and not the length itself. I think the author expected 50% more operation to take 50% more time. But the test is devised in a way that there is contention on read. I am not sure that scales linearly. So first, I would like to see it reproduced. It does not sound hard but I don't have the time.
Which is also not our case. We don't use it after every active record statement. We use it per API call. Clients usually call event_store.publish once or a few times in a transaction. Also, the original test uses prepared statements which Rails apps and event store don't. Every Active Record call from rails app makes a networking call to PG. I suspect that if there is any performance effect of using savepoints in PG, it's negligable compared to making one more SQL query. |
Thanks yes, that article benchmark isn't as useful as I first thought, apologies. @paneq how about here where they make the exact same critique of the benchmark as you and add a Our production was almost grinding to a halt, we found that the
but these are in a transaction already so that will happen, or am I missing something? The existing test shows this, if you remove the subtransaction (i.e. the The only reasons I can think to use subtransactions are:
here the user will always be saved. Are you able to explain why you are using the subtransaction in the first place?
Yes, this is something I was very unsure about. Looking at the code I could only see RES refusing to add an event when the version mismatched but I wasn't sure about the other cases. To be able to answer this I'd need to understand why you are using a subtransaction (as above). Thank you for taking the time to considering this. |
Just FYI, I won't have the time soon to commit to this discussion. |
I will try to elaborate on that later, but just for the sake of leaving some information: We did have encounter a situation in which lot of savepoints was serious performance issue (thousands of savepoints de facto brought our application down). That was on MySQL 5.7 (what is curious, we did not have that issue on MySQL 5.6). |
If you need more convincing about |
Just wanted to say I'm revisiting this PR now. I've re-read linked articles and the comments you've all left in this thread. Summary of all mentioned problems with SAVEPOINT:
Correct me if I'm wrong and understood the issues differently than you did. I also want to respond to one statement about RES use of SAVEPOINT before digging deeper how we can perhaps not use SAVEPOINT at all:
It is not possible with
|
I don't think there's a good reason to use subtransaction in RES as it is used currently. We definitely want to perform two INSERT statements in one transaction. So we wrap them in such. Given These three examples perform the same in terms of RES expectations: require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "activerecord"
gem "pg"
end
require "active_record"
ActiveRecord::Base.establish_connection(ENV.fetch("DATABASE_URL"))
ActiveRecord::Schema.define do
create_table :transactions, force: true do |t|
t.string :label
t.integer :amount
end
create_table :events, force: true do |t|
t.string :name
t.string :event_id
end
add_index :events, :event_id, unique: true
end
Event = Class.new(ActiveRecord::Base)
Transaction = Class.new(ActiveRecord::Base)
ActiveRecord::Base.logger =
Logger
.new(STDOUT)
.tap do |l|
l.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# Transaction Count (0.1ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.1ms) SELECT COUNT(*) FROM "events"
begin
ActiveRecord::Base.transaction do
Transaction.create(label: "kaka", amount: 100)
event_id = SecureRandom.uuid
Event.create(name: "deposited", event_id: event_id)
Event.create(name: "deposited", event_id: event_id)
end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.2ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
# Event Create (0.2ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.1ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.1ms) SELECT COUNT(*) FROM "events"
begin
ActiveRecord::Base.transaction do
Transaction.create(label: "kaka", amount: 100)
ActiveRecord::Base.transaction do
event_id = SecureRandom.uuid
Event.create(name: "deposited", event_id: event_id)
Event.create(name: "deposited", event_id: event_id)
end
end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.1ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.0ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.0ms) SELECT COUNT(*) FROM "events"
begin
ActiveRecord::Base.transaction do
Transaction.create(label: "kaka", amount: 100)
ActiveRecord::Base.transaction(requires_new: true) do
event_id = SecureRandom.uuid
Event.create(name: "deposited", event_id: event_id)
Event.create(name: "deposited", event_id: event_id)
end
end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
# TRANSACTION (0.0ms) BEGIN
# Transaction Create (0.1ms) INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id" [["label", "kaka"], ["amount", 100]]
# TRANSACTION (0.0ms) SAVEPOINT active_record_1
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
# Event Create (0.1ms) INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id" [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
# TRANSACTION (0.0ms) ROLLBACK TO SAVEPOINT active_record_1
# TRANSACTION (0.0ms) ROLLBACK
# Transaction Count (0.0ms) SELECT COUNT(*) FROM "transactions"
# Event Count (0.0ms) SELECT COUNT(*) FROM "events" The only reason why I'd personally prefer This This was pointed out previously by @nikolai-b too:
|
I'll rebase this PR next and review existing tests around transactions and expectations described earlier. In the meantime I've compared use-vs-no-use of SAVEPOINT on Postgres 15.5 and the difference is not going to be noticeable:
It's still worth not doing something not necessary. And there are cases where even a single SAVEPOINT is a problem, as described in the Gitlab article. |
Thank you so much for the time you've invested in this issue, it is very impressive. The issue we experienced with perform_100_appends_in_outer_transaction =
lambda do
ActiveRecord::Base.transaction do
id = RailsEventStoreActiveRecord::Event.limit(1).lock("FOR UPDATE").ids[0]
100.times { event_store.append(mk_event.call) }
RailsEventStoreActiveRecord::Event.where(id: id).update_all(id: id)
end
end but to cause the performance issue I think it needs to do the update inside the SAVEPOINT. I've also tried using multiple threads to simulate many clients but still no slow down. I don't think in RES the existing SAVEPOINT does much harm (as you release it quickly) but I'd argue that it isn't needed and for that reason should be removed. |
#1108 Co-authored-by: Szymon Fiedler <[email protected]>
#1108 Co-authored-by: Szymon Fiedler <[email protected]>
SAVEPOINTs are costly:
https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/
They were added in fc2ba56 to stop the events with the wrong version being committed
but if no version is specified then they are not needed.
I've not added any specs as I wanted to know your thoughts on this.