clientContextImpl: Cap the number and age of beacons #191

JJL772 · 2023-10-25T01:38:07Z

Each beacon has an associated mutex. If we don't cap the beacon count, IOCs running on resource-limited platforms like RTEMS may eventually run out of resources and crash.

Closes #184

The configuration options are probably unnecessary; let me know if I should remove them.

Requires some changes to pvData: epics-base/pvDataCPP#94

mdavidsaver

I am happy to see a PR addressing this issue. I think some further work is required though.

mdavidsaver · 2023-10-25T02:14:12Z

src/remoteClient/clientContextImpl.cpp

+            /* Before creating a new beacon, cleanup any old ones */
+            for (AddressBeaconHandlerMap::iterator it = m_beaconHandlers.begin(); it != m_beaconHandlers.end();)


This is a linear iteration of the list of tracked servers each time a beacon is received. If the number of servers grows large, or one gets stuck in a restart loop, this could potentially take up a lot of CPU time.

Timer InternalClientContextImpl::m_timer is available in getBeaconHandler(). cf. class ChannelSearchManager for an example of using Timer.
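For illustration, here is a minimal sketch of the idea of moving the scan off the receive path. `BeaconTable`, `onBeacon()` and `sweep()` are hypothetical names and not part of pvAccessCPP; time is passed in as plain seconds, and locking is omitted. The assumption is that `sweep()` would be driven by a periodic timer callback rather than by each incoming beacon.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the client's table of tracked servers.
class BeaconTable {
    typedef std::map<std::string, double> Map;

public:
    explicit BeaconTable(double maxAgeSec) : maxAge_(maxAgeSec) {
        assert(maxAgeSec > 0.0);
    }

    // Receive path: one map lookup or insert, no scan of other entries.
    // Returns true if this server was not tracked before.
    bool onBeacon(const std::string& addr, double now) {
        Map::iterator it = lastSeen_.find(addr);
        if (it != lastSeen_.end()) {
            it->second = now;  // known server: just refresh its timestamp
            return false;
        }
        lastSeen_[addr] = now;
        return true;
    }

    // Timer path: the linear scan, intended to run once per timer period.
    // Returns the number of entries dropped.
    std::size_t sweep(double now) {
        std::size_t removed = 0;
        for (Map::iterator it = lastSeen_.begin(); it != lastSeen_.end();) {
            if (now - it->second > maxAge_) {
                lastSeen_.erase(it++);  // post-increment keeps the iterator valid
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return lastSeen_.size(); }

private:
    double maxAge_;
    Map lastSeen_;
};
```

With this split, a server stuck in a restart loop costs one map operation per beacon, and the cost of the full scan is bounded by how often the timer fires.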

mdavidsaver · 2023-10-25T02:21:40Z

src/remoteClient/clientContextImpl.cpp

@@ -3957,7 +3957,7 @@ class InternalClientContextImpl :
    static size_t num_instances;

    InternalClientContextImpl(const Configuration::shared_pointer& conf) :
-        m_addressList(""), m_autoAddressList(true), m_connectionTimeout(30.0f), m_beaconPeriod(15.0f),
+        m_addressList(""), m_autoAddressList(true), m_connectionTimeout(30.0f), m_beaconPeriod(15.0f), m_maxBeacons(10), m_maxBeaconLifetime(15),


A limit of 10 servers and a 15 second lifetime seem quite small. Have you checked on a network with more than 10 PVA servers to see how often ChannelSearchManager::newServerDetected() is called?

mdavidsaver · 2023-10-25T02:28:10Z

src/remoteClient/clientContextImpl.cpp

@@ -4115,6 +4117,8 @@ class InternalClientContextImpl :
        m_beaconPeriod = m_configuration->getPropertyAsFloat("EPICS_PVA_BEACON_PERIOD", m_beaconPeriod);
        m_broadcastPort = m_configuration->getPropertyAsInteger("EPICS_PVA_BROADCAST_PORT", m_broadcastPort);
        m_receiveBufferSize = m_configuration->getPropertyAsInteger("EPICS_PVA_MAX_ARRAY_BYTES", m_receiveBufferSize);
+        m_maxBeacons = m_configuration->getPropertyAsInteger("EPICS_PVA_BEACON_MAX", m_maxBeacons);
+        m_maxBeaconLifetime = m_configuration->getPropertyAsFloat("EPICS_PVA_BEACON_LIFETIME", m_maxBeaconLifetime);


I can see making the number of servers configurable. However, I don't think the lifetime (aka. 2x the max beacon period of 180 seconds) needs to be separately configurable.

mdavidsaver · 2023-10-25T02:42:00Z

Since it is not super obvious: the beacon TX timing of pvAccessCPP differs from what experience with CA servers might lead you to expect.

pvAccessCPP/src/server/beaconEmitter.cpp

Lines 31 to 33 in 581d100

    
    _fastBeaconPeriod(std::max(context->getBeaconPeriod(), EPICS_PVA_MIN_BEACON_PERIOD)),
    _slowBeaconPeriod(std::max(180.0, _fastBeaconPeriod)), // TODO configurable
    _beaconCountLimit((int16)std::max(10.0f, EPICS_PVA_MIN_BEACON_COUNT_LIMIT)), // TODO configurable

A server will send out the first 10 beacons (not configurable) with a 15 second interval (by default), then switch to a 180 second period (not configurable). While $EPICS_PVAS_BEACON_PERIOD can override the first "fast" period, I don't think this is useful in practice.

So I think the beacon tracking lifetime must be >= 360 seconds.

(fyi. with PVXS I try to follow the same model and timings, with a non-configurable limit of 20k servers. Of course, there I only allocate ~64 bytes per server)
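The timing described above can be worked through with a small sketch. `beaconTimes()` and `minTrackingLifetime()` are hypothetical helpers, not pvAccessCPP functions. The sketch assumes the first beacon goes out at t = 0 and that the gaps between the first 10 beacons use the fast period; the exact switch-over point in beaconEmitter.cpp may differ by one beacon.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Offsets in seconds (from the first beacon) of the first `count` beacons.
// Assumption: gaps between the first `fastCount` beacons use the fast period,
// every later gap uses the slow period.
std::vector<double> beaconTimes(std::size_t count,
                                double fastPeriod = 15.0,
                                double slowPeriod = 180.0,
                                std::size_t fastCount = 10) {
    assert(fastPeriod > 0.0);
    // Mirrors std::max(180.0, _fastBeaconPeriod): slow is never below fast.
    const double slow = std::max(slowPeriod, fastPeriod);
    std::vector<double> times;
    double t = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        times.push_back(t);
        t += (k + 1 < fastCount) ? fastPeriod : slow;
    }
    return times;
}

// Tracking lifetime of twice the slowest beacon period, so that a single
// lost beacon does not cause a live server to be dropped.
double minTrackingLifetime(double slowPeriod = 180.0) {
    return 2.0 * slowPeriod;
}
```

Under these assumptions the tenth beacon is sent 135 seconds after the first and the eleventh at 315 seconds, so any lifetime shorter than 180 seconds would expire a healthy server between two slow beacons.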

mdavidsaver · 2023-10-25T03:06:23Z

The windows CI failures are due to epics-base/ci-scripts#84. When you update, please rebase to pick up ed7eae5.

AppVeyorBot · 2023-10-25T05:26:03Z

✅ Build pvAccessCPP 1.0.70 completed (commit cdf3720715 by @JJL772)

JJL772 · 2023-12-13T00:30:43Z

@mdavidsaver Thanks for the feedback! I finally got around to applying the requested changes.
I ended up using pvxs as a reference and copied the max beacon lifetime (360s) and beacon limit (20000).

AppVeyorBot · 2023-12-13T01:11:05Z

❌ Build pvAccessCPP 1.0.74 failed (commit 73c3932b45 by @JJL772)

AppVeyorBot · 2023-12-13T01:54:52Z

❌ Build pvAccessCPP 1.0.75 failed (commit 4e054f2e07 by @JJL772)

AppVeyorBot · 2023-12-20T02:51:13Z

✅ Build pvAccessCPP 1.0.79 completed (commit 2ae88b70f1 by @JJL772)

JJL772 · 2024-02-28T20:21:03Z

@mdavidsaver Just wanted to follow up, are there any other changes required for this?

mdavidsaver · 2024-02-28T21:58:31Z

src/remoteClient/clientContextImpl.cpp

+            if (m_beaconHandlers.size() >= maxTrackedBeacons)
+            {
+                char ipa[64];
+                sockAddrToDottedIP(&responseFrom->sa, ipa, sizeof(ipa));
+                LOG(logLevelDebug, "Tracked beacon limit reached (%d), ignoring %s\n", maxTrackedBeacons, ipa);


To minimize log spam, it would be friendlier to log only when size()==max: once each time the limit is reached, and not again until the count falls below the limit. E.g. consider what happens if some PVA server gets stuck in a reset loop.
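One way to get this log-once behaviour is a small latch, sketched below. `LimitLogLatch` and `shouldLog()` are hypothetical names, and using a separate flag (rather than a bare size()==max comparison) is an assumption of this sketch, chosen so that the message is not repeated while the table stays full.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decides whether the "limit reached" message should be
// emitted for the current number of tracked servers.
class LimitLogLatch {
public:
    explicit LimitLogLatch(std::size_t limit) : limit_(limit), logged_(false) {
        assert(limit > 0);
    }

    // Returns true only on the first call at or above the limit since the
    // count was last seen below it.
    bool shouldLog(std::size_t tracked) {
        if (tracked < limit_) {
            logged_ = false;  // dropped below the limit: re-arm the latch
            return false;
        }
        if (logged_) {
            return false;     // already reported this episode
        }
        logged_ = true;
        return true;
    }

private:
    std::size_t limit_;
    bool logged_;
};
```

The caller would check `shouldLog(m_beaconHandlers.size())` before the LOG() call, so a server in a reset loop produces one message per episode instead of one per beacon.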

mdavidsaver · 2024-02-28T22:10:44Z

src/remoteClient/clientContextImpl.cpp

            // stores weak_ptr
            handler.reset(new BeaconHandler(internal_from_this(), responseFrom));
+            m_timer->scheduleAfterDelay(BeaconCleanupCallback::shared_pointer(new BeaconCleanupCallback(*this, *responseFrom)), maxBeaconLifetime);


class Timer can be a little strange to use. As written, this timer will never be cancel()ed. In fact, since the callback pointer isn't being stored, it can't be canceled. So each time a new server appears, this timer will drop it 360 seconds later. Then, since that server will probably still be active, it will be "new" again 180 seconds after that.

I think you are on the right track though. What you would need to do is to store a BeaconCleanupCallback in (or as part of) BeaconHandler. Then the logic would be to start the timer on creation (first beacon). Then cancel and restart it on each subsequent beacon. This way the timer will only expire when the corresponding server stops sending beacons.
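The cancel-and-restart logic described above can be sketched as follows. `SketchTimer` is a hypothetical stand-in for a timer queue and is not the pvData Timer API; `TrackedServers` is likewise illustrative. Time is passed in as plain seconds and locking is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical timer queue: entries can be scheduled and later cancelled
// through the handle returned by schedule().
class SketchTimer {
public:
    typedef unsigned long Handle;

    SketchTimer() : next_(0) {}

    Handle schedule(double when, const std::string& key) {
        const Handle h = ++next_;
        pending_[h] = std::make_pair(when, key);
        return h;
    }

    void cancel(Handle h) { pending_.erase(h); }

    // Removes one entry due at or before `now`; returns false if none is due.
    bool popDue(double now, std::string& key) {
        for (Pending::iterator it = pending_.begin(); it != pending_.end(); ++it) {
            if (it->second.first <= now) {
                key = it->second.second;
                pending_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    typedef std::map<Handle, std::pair<double, std::string> > Pending;
    Handle next_;
    Pending pending_;
};

// Keeps one timeout per server and restarts it on every beacon, so the
// timeout only fires once a server has gone quiet for `lifetime` seconds.
class TrackedServers {
public:
    TrackedServers(SketchTimer& timer, double lifetime)
        : timer_(timer), lifetime_(lifetime) {
        assert(lifetime > 0.0);
    }

    void onBeacon(const std::string& addr, double now) {
        Handles::iterator it = handles_.find(addr);
        if (it != handles_.end()) {
            timer_.cancel(it->second);  // drop the previous timeout
        }
        handles_[addr] = timer_.schedule(now + lifetime_, addr);
    }

    // Called when the timer runs: forget every server whose timeout fired.
    void runTimers(double now) {
        std::string addr;
        while (timer_.popDue(now, addr)) {
            handles_.erase(addr);
        }
    }

    bool isTracked(const std::string& addr) const {
        return handles_.find(addr) != handles_.end();
    }

private:
    typedef std::map<std::string, SketchTimer::Handle> Handles;
    SketchTimer& timer_;
    double lifetime_;
    Handles handles_;
};
```

Because the stored handle is cancelled on each beacon, a server that keeps beaconing every 180 seconds never reaches its 360 second timeout, which avoids the drop-and-rediscover cycle of the one-shot version.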

JJL772 · 2024-06-10T19:42:58Z

This PR now depends on some changes made to pvData: epics-base/pvDataCPP#94

I'm going to mark this as a draft for now because I'm not exactly happy with these changes yet.

AppVeyorBot · 2024-06-11T03:03:04Z

❌ Build pvAccessCPP 1.0.108 failed (commit c4e4658381 by @JJL772)

AppVeyorBot · 2024-06-13T01:03:06Z

✅ Build pvAccessCPP 1.0.109 completed (commit 9651462441 by @JJL772)

mdavidsaver requested changes Oct 25, 2023

JJL772 force-pushed the fix_semaphore branch 2 times, most recently from 2b51b97 to 777d68d Compare December 13, 2023 00:27

JJL772 force-pushed the fix_semaphore branch from 777d68d to afca135 Compare December 13, 2023 01:12

JJL772 force-pushed the fix_semaphore branch from afca135 to d44adc9 Compare December 19, 2023 22:52

mdavidsaver requested changes Feb 28, 2024

JJL772 force-pushed the fix_semaphore branch from d44adc9 to 4402841 Compare June 10, 2024 19:37

JJL772 marked this pull request as draft June 10, 2024 19:43

clientContextImpl: Cap the number and age of beacons

1bcc944

Each beacon has an associated mutex. If we allocate too many beacons on resource constrained systems, i.e. RTEMS, we may run out of resources and crash.

JJL772 force-pushed the fix_semaphore branch from 4402841 to 1bcc944 Compare June 12, 2024 20:43

JJL772 marked this pull request as ready for review June 12, 2024 20:44
