CpuBoundWork#CpuBoundWork(): don't spin on atomic int to acquire slot #9990
base: master
Conversation
Low load test
Just a few increments/decrements. 👍

High load test

If I literally DoS Icinga with https://github.com/Al2Klimov/i2all.tf/tree/master/i2dos, I get a few of these:

After I stop that program and fire one curl as in my low load test above, I get the same picture: still 12 free slots. 👍

Logs

--- lib/base/io-engine.cpp
+++ lib/base/io-engine.cpp
@@ -24,6 +24,7 @@ CpuBoundWork::CpuBoundWork(boost::asio::yield_context yc)
std::unique_lock<std::mutex> lock (sem.Mutex);
if (sem.FreeSlots) {
+ Log(LogInformation, "CpuBoundWork") << "Using one free slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots - 1u;
--sem.FreeSlots;
return;
}
@@ -32,7 +33,9 @@ CpuBoundWork::CpuBoundWork(boost::asio::yield_context yc)
sem.Waiting.emplace(&cv);
lock.unlock();
+ Log(LogInformation, "CpuBoundWork") << "Waiting...";
cv.Wait(yc);
+ Log(LogInformation, "CpuBoundWork") << "Waited!";
}
void CpuBoundWork::Done()
@@ -42,8 +45,10 @@ void CpuBoundWork::Done()
std::unique_lock<std::mutex> lock (sem.Mutex);
if (sem.Waiting.empty()) {
+ Log(LogInformation, "CpuBoundWork") << "Releasing one used slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots + 1u;
++sem.FreeSlots;
} else {
+ Log(LogInformation, "CpuBoundWork") << "Handing over one used slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots;
sem.Waiting.front()->Set();
sem.Waiting.pop();
	}
Force-pushed from c11989f to 8d24525 (Compare)
lib/base/io-engine.cpp
Outdated
try {
	cv->Wait(yc);
} catch (...) {
	Done();
Why is Done() called here? Wouldn't this release a slot that was never acquired?
One of the pillars of the whole logic:
A regular CpuBoundWork#Done() from CpuBoundWork#~CpuBoundWork() calls AsioConditionVariable#Set() and expects it to successfully finish AsioConditionVariable#Wait() and CpuBoundWork#CpuBoundWork(). The latter implies a later CpuBoundWork#~CpuBoundWork() which again calls CpuBoundWork#Done(). But if AsioConditionVariable#Wait() throws, I can call CpuBoundWork#Done() now or never.
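Put as code, the hand-over looks roughly like this (a sketch assembled from the snippets in this thread, not the literal PR code; sem is the semaphore state guarded by sem.Mutex, and the rethrow at the end is an assumption):

```cpp
// Done() -- wake side:
std::unique_lock<std::mutex> lock (sem.Mutex);

if (sem.Waiting.empty()) {
	++sem.FreeSlots;            // nobody is queued: the slot simply becomes free
} else {
	sem.Waiting.front()->Set(); // hand the still-used slot to the first waiter
	sem.Waiting.pop();          // (FreeSlots is deliberately left untouched)
}

// CpuBoundWork() -- wait side:
try {
	cv->Wait(yc);               // finishes once some Done() has called Set() above
} catch (...) {
	Done();                     // "now or never": a Done() may already have handed
	throw;                      // this coroutine the slot, and nobody else releases it
}
```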
But still, this means that more coroutines can simultaneously acquire CpuBoundMutex than what is permitted by ioEngine.m_CpuBoundSemaphore.
... technically speaking and in an edge case where probably all of them are purged anyway.
Is the way
Force-pushed from 8d24525 to bf74280 (Compare)
lib/base/io-engine.cpp
Outdated
IoEngine::YieldCurrentCoroutine(yc);
continue;
}
AsioConditionVariable cv (ioEngine.GetIoContext());
If you put it on the stack, you can't simply call Done(), which could possibly wake some other coroutine: if that happened, there would be dangling pointers left in the queue. If you retracted your own queue entry from the coroutine instead, that wouldn't be a problem, and as an extra benefit you wouldn't create extra slots out of thin air.
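For illustration only, the retraction could look roughly like this (a sketch, not a concrete patch for this file; pos stands for the coroutine's own queue entry and gotSlot for "a Done() already handed the slot over"):

```cpp
try {
	cv->Wait(yc);
} catch (...) {
	std::unique_lock<std::mutex> lock (sem.Mutex);

	if (gotSlot) {
		// The slot was already handed over to us, so it really is ours to release.
		lock.unlock();
		Done();
	} else {
		// Still queued: retract our own entry, so no other coroutine can wake us
		// via a dangling pointer and no slot is created out of thin air.
		sem.Waiting.erase(pos);
	}

	throw;
}
```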
Force-pushed from bf74280 to 9062934 (Compare)
bool gotSlot = false;
auto pos (sem.Waiting.insert(sem.Waiting.end(), IoEngine::CpuBoundQueueItem{&strand, cv, &gotSlot}));
I don't understand why you're using a boolean pointer here! Why not just use a simple bool type instead?
try {
	cv->Wait(yc);
} catch (...) {
What are you trying to catch here? AsioConditionVariable#Wait() asynchronously waits with the non-throwing form of Asio async_wait().
I mainly catch forced_unwind.
} catch (...) {
	std::unique_lock<std::mutex> lock (sem.Mutex);

	if (gotSlot) {
You can just use pos->GotSlot instead here and don't need to keep track of a bool type.
Items get moved out of sem.Waiting which invalidates pos. gotSlot tells me whether pos is still valid or not.
cppreference says this:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
I would simply use a pointer to CpuBoundQueueItem for the queue instead then.
IoEngine::CpuBoundQueueItem item{&strand, cv, false};
auto pos (sem.Waiting.emplace(sem.Waiting.end(), &item));
only when the corresponding element is deleted.
Exactly that is what gotSlot tells.
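To spell out the case being debated, a standalone illustration (not PR code): operations on other elements leave the iterator alone, but erasing the element it points to, which is exactly what the hand-over's pop_front() does, invalidates it.

```cpp
#include <list>

int main()
{
	std::list<int> waiting {1, 2};
	auto pos (waiting.insert(waiting.end(), 3)); // "our" queue entry

	waiting.pop_front(); // erases a different element: pos stays valid
	waiting.pop_front(); // same here: still a different element

	waiting.pop_front(); // this erases our element: pos is now invalid, and only a
	                     // separate flag (gotSlot) can record that this has happened
}
```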
lib/base/io-engine.cpp
Outdated
continue;
*next.GotSlot = true;
sem.Waiting.pop_front();
boost::asio::post(*next.Strand, SetAsioCV(std::move(next.CV)));
I would just use something like this instead and drop the intermediate class SetAsioCV entirely.
boost::asio::post(*next.Strand, [cv = std::move(next.CV)]() { cv->Set(); });
Force-pushed from 9062934 to a00262f (Compare)
IoBoundWorkSlot#~IoBoundWorkSlot() will wait for a free semaphore slot, which will be almost immediately released by CpuBoundWork#~CpuBoundWork(). Just releasing the already acquired slot via CpuBoundWork#Done() is more efficient.
This is inefficient and involves unfair scheduling. The latter implies possible bad surprises regarding waiting durations on busy nodes. Instead, use AsioConditionVariable#Wait() if there are no free slots. It's notified by others' CpuBoundWork#~CpuBoundWork() once finished.
Force-pushed from a00262f to 26ef66e (Compare)
In addition, v2.14.2 could theoretically misbehave once the free slot amount falls "temporarily" noticeably below zero. Like, three requestors achieve an https://github.com/Icinga/icinga2/blob/v2.14.2/lib/base/io-engine.cpp#L24-L31

So that spinlock blocks not only CPU time, but also slots from legit requestors. The father of all spinlocks, so to say. 🙈 #10117 (comment)
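The linked pattern boils down to something like the following standalone model (not the actual Icinga 2 code): the requestor decrements first and only checks afterwards whether it actually got a slot, so the counter can dip well below zero while several requestors race.

```cpp
#include <atomic>
#include <cstdio>

int main()
{
	// Model of the spin pattern: fetch_sub(1) first, check the old value, and on
	// failure fetch_add(1) back and retry after yielding the coroutine.
	std::atomic<int> freeSlots (1);

	// Three requestors have all just executed their fetch_sub(), none has done
	// its compensating fetch_add() yet:
	int seen[3];
	for (int& s : seen) {
		s = freeSlots.fetch_sub(1);
	}

	std::printf("old values seen: %d %d %d, counter now: %d\n",
	            seen[0], seen[1], seen[2], freeSlots.load()); // 1 0 -1, -2

	// Only the first requestor actually got the slot. Until the other two have
	// added their 1 back, every further (legit) requestor also sees a value < 1
	// and spins, i.e. the spinning blocks slots, not just CPU time.
}
```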
This is inefficient and involves unfair scheduling. The latter implies
possible bad surprises regarding waiting durations on busy nodes. Instead,
use AsioConditionVariable#Wait() if there are no free slots. It's notified
by others' CpuBoundWork#~CpuBoundWork() once finished.
fixes #9988
Also, the current implementation is a spin-lock. 🙈 #10117 (comment)
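A simplified std::thread analogue of the proposed scheme (not the PR code itself, which uses Asio strands, coroutines and AsioConditionVariable instead of std::condition_variable):

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Analogue of the CPU-bound semaphore: a slot counter plus a FIFO of waiters,
// so slots are handed over in arrival order instead of being raced for.
class CpuBoundSemaphore
{
public:
	explicit CpuBoundSemaphore(unsigned int slots) : m_FreeSlots(slots) { }

	void Acquire()
	{
		std::unique_lock<std::mutex> lock (m_Mutex);

		if (m_FreeSlots) {
			--m_FreeSlots;
			return;
		}

		// No free slot: enqueue ourselves and sleep until a releaser hands its
		// slot over (no spinning, no repeated counter fiddling).
		Waiter w;
		m_Waiting.push(&w);
		w.CV.wait(lock, [&w] { return w.GotSlot; });
	}

	void Release()
	{
		std::unique_lock<std::mutex> lock (m_Mutex);

		if (m_Waiting.empty()) {
			++m_FreeSlots;          // nobody waits: the slot becomes free again
		} else {
			auto* next (m_Waiting.front());
			m_Waiting.pop();
			next->GotSlot = true;   // hand the still-used slot over directly,
			next->CV.notify_one();  // so m_FreeSlots is not touched at all
		}
	}

private:
	struct Waiter {
		std::condition_variable CV;
		bool GotSlot = false;
	};

	std::mutex m_Mutex;
	unsigned int m_FreeSlots;
	std::queue<Waiter*> m_Waiting;
};

int main()
{
	CpuBoundSemaphore sem (2);
	std::vector<std::thread> workers;

	for (int i = 0; i < 5; ++i) {
		workers.emplace_back([&sem, i] {
			sem.Acquire();  // blocks (sleeping, not spinning) while all slots are busy
			std::printf("worker %d got a slot\n", i);
			sem.Release();  // frees the slot or hands it to the next waiter
		});
	}

	for (auto& t : workers) {
		t.join();
	}
}
```

Waiters are served in FIFO order and a releaser hands its slot over directly instead of bumping the counter, which is what removes both the spinning and the unfairness mentioned above.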