Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientContextImpl: Cap the number and age of beacons #191

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

JJL772
Copy link

@JJL772 JJL772 commented Oct 25, 2023

Each beacon has an associated mutex. If we don't cap the beacon count, IOCs running on resource limited platforms like RTEMS may eventually run out of resources and crash.

Closes #184

The configuration options are probably unnecessary, let me know if I should remove them.

Requires some changes to pvData: epics-base/pvDataCPP#94

Copy link
Member

@mdavidsaver mdavidsaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to see a PR addressing this issue. I think some further work is required though.

Comment on lines 4364 to 4365
/* Before creating a new beacon, cleanup any old ones */
for (AddressBeaconHandlerMap::iterator it = m_beaconHandlers.begin(); it != m_beaconHandlers.end();)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a linear iteration of the list of tracked servers each time a beacon is received. If the number of servers grows large, or one gets stuck in a restart loop, this could potentially take up a lot of CPU time.

Timer InternalClientContextImpl::m_timer is available in getBeaconHandler(). cf. class ChannelSearchManager for an example of using Timer.

@@ -3957,7 +3957,7 @@ class InternalClientContextImpl :
static size_t num_instances;

InternalClientContextImpl(const Configuration::shared_pointer& conf) :
m_addressList(""), m_autoAddressList(true), m_connectionTimeout(30.0f), m_beaconPeriod(15.0f),
m_addressList(""), m_autoAddressList(true), m_connectionTimeout(30.0f), m_beaconPeriod(15.0f), m_maxBeacons(10), m_maxBeaconLifetime(15),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A limit of 10 servers, and a 15 second lifetime seem quite small. Have you checked one a network with more than 10 PVA servers to see how often ChannelSearchManager::newServerDetected() is called?

@@ -4115,6 +4117,8 @@ class InternalClientContextImpl :
m_beaconPeriod = m_configuration->getPropertyAsFloat("EPICS_PVA_BEACON_PERIOD", m_beaconPeriod);
m_broadcastPort = m_configuration->getPropertyAsInteger("EPICS_PVA_BROADCAST_PORT", m_broadcastPort);
m_receiveBufferSize = m_configuration->getPropertyAsInteger("EPICS_PVA_MAX_ARRAY_BYTES", m_receiveBufferSize);
m_maxBeacons = m_configuration->getPropertyAsInteger("EPICS_PVA_BEACON_MAX", m_maxBeacons);
m_maxBeaconLifetime = m_configuration->getPropertyAsFloat("EPICS_PVA_BEACON_LIFETIME", m_maxBeaconLifetime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see making the number of servers configurable. However, I don't think the lifetime (aka. 2x the max beacon period of 180 seconds) needs to be separately configurable.

@mdavidsaver
Copy link
Member

Since it is not super obvious. The Beacon TX timing of pvAccessCPP differs from what experience with CA servers might lead you to expect.

_fastBeaconPeriod(std::max(context->getBeaconPeriod(), EPICS_PVA_MIN_BEACON_PERIOD)),
_slowBeaconPeriod(std::max(180.0, _fastBeaconPeriod)), // TODO configurable
_beaconCountLimit((int16)std::max(10.0f, EPICS_PVA_MIN_BEACON_COUNT_LIMIT)), // TODO configurable

A server will send out the first 10 beacons (not configurable) with a 15 second interval (by default), then switch to a 180 second period (not configurable). While $EPICS_PVAS_BEACON_PERIOD can override the first "fast" period, I don't think this is in practice useful.

So I think the beacon tracking lifetime must be >= 360 seconds.

(fyi. with PVXS I try to follow the same model and timings, with a non-configurable limit of 20k servers. Of course, there I only allocate ~64 bytes per server)

@mdavidsaver
Copy link
Member

The windows CI failures are due to epics-base/ci-scripts#84. When you update, please rebase to pick up ed7eae5.

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.70 completed (commit cdf3720715 by @JJL772)

@JJL772 JJL772 force-pushed the fix_semaphore branch 2 times, most recently from 2b51b97 to 777d68d Compare December 13, 2023 00:27
@JJL772
Copy link
Author

JJL772 commented Dec 13, 2023

@mdavidsaver Thanks for the feedback! I finally got around to applying the requested changes.
I ended up using pvxs as a reference and copied the max beacon lifetime (360s) and beacon limit (20000).

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.74 failed (commit 73c3932b45 by @JJL772)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.75 failed (commit 4e054f2e07 by @JJL772)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.79 completed (commit 2ae88b70f1 by @JJL772)

@JJL772
Copy link
Author

JJL772 commented Feb 28, 2024

@mdavidsaver Just wanted to follow up, are there any other changes required for this?

Comment on lines +4389 to +4398
if (m_beaconHandlers.size() >= maxTrackedBeacons)
{
char ipa[64];
sockAddrToDottedIP(&responseFrom->sa, ipa, sizeof(ipa));
LOG(logLevelDebug, "Tracked beacon limit reached (%d), ignoring %s\n", maxTrackedBeacons, ipa);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To minimize log spam it would be friendlier to only log when size()==max. So once each time the limit is reached, but not again until falling below the limit. eg. consider if some PVA server gets stuck in a reset loop.

// stores weak_ptr
handler.reset(new BeaconHandler(internal_from_this(), responseFrom));
m_timer->scheduleAfterDelay(BeaconCleanupCallback::shared_pointer(new BeaconCleanupCallback(*this, *responseFrom)), maxBeaconLifetime);
Copy link
Member

@mdavidsaver mdavidsaver Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class Timer can be a little strange to use. As written, this timer will never be cancel()ed. If fact, since the callback pointer isn't being stored, it can't be canceled. So each time a new server appears, this timer will drop it 360 seconds later. Then, since that server will probably still be active, it will be "new" again 180 after than.

I think you are on the right track though. What you would need to do is to store a BeaconCleanupCallback in (or as part of) BeaconHandler. Then the logic would be to start the timer on creation (first beacon). Then cancel and restart it on each subsequent beacon. This way the timer will only expire when the corresponding server stops sending beacons.

@JJL772
Copy link
Author

JJL772 commented Jun 10, 2024

This PR now depends on some changes made to pvData: epics-base/pvDataCPP#94

I'm going to mark this as a draft for now because I'm not exactly happy with these changes yet.

@JJL772 JJL772 marked this pull request as draft June 10, 2024 19:43
@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.108 failed (commit c4e4658381 by @JJL772)

Each beacon has an associated mutex. If we allocate
too many beacons on resource constrained systems, i.e.
RTEMS, we may run out of resources and crash.
@JJL772 JJL772 marked this pull request as ready for review June 12, 2024 20:44
@AppVeyorBot
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource leak in Beacon tracking
3 participants