-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FC] process FC after apply view #3326
base: master
Are you sure you want to change the base?
[FC] process FC after apply view #3326
Conversation
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
da85acb
to
ffbd073
Compare
ffbd073
to
1536dfb
Compare
97e081f
to
95b1e63
Compare
6e1f7d5
to
8dca8ef
Compare
7f0a73e
to
e580645
Compare
Putting back delay for 60 sec as we found some cases where oper state update handling is delayed due to FC configuration after APPLY_VIEW |
m_bufferQueueConfigTable(db, CFG_BUFFER_QUEUE_TABLE_NAME), | ||
m_bufferPgConfigTable(db, CFG_BUFFER_PG_TABLE_NAME), | ||
m_deviceMetadataConfigTable(db, CFG_DEVICE_METADATA_TABLE_NAME) | ||
{ | ||
SWSS_LOG_ENTER(); | ||
m_delayTimer = new SelectableTimer(timespec{.tv_sec = FLEX_COUNTER_DELAY_SEC, .tv_nsec = 0}); | ||
if (WarmStart::isWarmStart()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm this will also handle fast-reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
SWSS_LOG_ENTER(); | ||
|
||
SWSS_LOG_NOTICE("Processing counters"); | ||
m_delayTimer->stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about following?
if (!m_delayTimerExpired)
{
m_delayTimer->stop();
m_delayTimerExpired = true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
orchagent/flexcounterorch.cpp
Outdated
@@ -254,6 +258,15 @@ void FlexCounterOrch::doTask(Consumer &consumer) | |||
} | |||
} | |||
|
|||
void FlexCounterOrch::doTask(SelectableTimer &timer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we delete the timer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
orchagent/flexcounterorch.cpp
Outdated
m_delayTimer = new SelectableTimer(timespec{.tv_sec = FLEX_COUNTER_DELAY_SEC, .tv_nsec = 0}); | ||
if (WarmStart::isWarmStart()) | ||
{ | ||
auto executor = new ExecutableTimer(m_delayTimer, this, "FLEX_COUNTER_DELAY"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced with this approach. I can use some to understand the approach.
Previously we had hardcoded delay in FC.
The hardcoded delay became a problem and we needed more deterministic delayed approach. So we added fast-reboot new infra which used warm-reboot kind of reconciliation and used fast-reboot done status to enable FC.
But now we again go back to hardcoded delay. I am afraid that to solve the new problem we (in a way) drift towards the day1 problem. Some scenarios that I can image that will now become problematic can be when 60s not being enough and overlaps w/ apply_view. Or similar race conditions that led to more deterministic delay.
Also, I think the concerns that Jingwen raised about new entries showing up in the table will still be present, right?
The config will still look different than what init config would want.
@wen587 , what do you think about this? Does this change really solve the concern that you have raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vaibhavhd This is the problem that is being solved sonic-net/sonic-buildimage#20302
The hardcoded delay is not a reported problem and this PR does not attempt to address that.
Not sure about race conditions, could you provide an example? The timer can't overlap with apply view as it is triggered in a different flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vaibhavhd , i think so. We want the config just stay as it is in init_cfg.json. We do not want any derived changes after service restart. Because all config db are static in Golden Config. We need Golden Config to be ground of truth.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Stepan Blyschak <[email protected]>
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Related to sonic-net/sonic-swss#3326. Why I did it Simplify approach to delaying counters on warm boot and fast boot. Removed FLEX_COUNTER_DELAY_STATUS_FIELD and instead postpone all FC processing to happen after apply view to not delay data plane configuration. The CONFIG_DB should not be updated in runtime anymore for counters to be delayed. To address sonic-net#20302. Work item tracking Microsoft ADO (number only): How I did it Removed FLEX_COUNTER_DELAY_STATUS_FIELD set in enable_counters.py. How to verify it Run warm-boot - make sure FC orch runs only after APPLY_VIEW.
Infra issue with tests: failure to install dotnet package:
Restarting |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@qiluo-msft @bingwang-ms @vaibhavhd Can be merged? |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
What I did
Simplify approach to delaying counters on warm boot and fast boot. Removed FLEX_COUNTER_DELAY_STATUS_FIELD and instead postpone all FC processing to happen after apply view to not delay data plane configuration.
The CONFIG_DB should not be updated in runtime anymore for counters to be delayed.
Why I did it
To address sonic-net/sonic-buildimage#20302.
How I verified it
Run warm-boot - make sure FC orch runs only after APPLY_VIEW.
Details if related