-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus iptables rules metric only counts programmed rules. #9374
base: master
Are you sure you want to change the base?
Prometheus iptables rules metric only counts programmed rules. #9374
Conversation
55e5cb8
to
0efef73
Compare
0efef73
to
582fb99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ref counting looks correct 🎉 some comments on the test.
Thinking about it, you could have tested this at UT level pretty effectively if there's a way to get the prom gauge. Then you'd be in control of the exact number of rules and wouldn't need approximates. Not bad to have a bit more coverage of the real gauges though so I'm OK with FV too.
@@ -200,6 +206,87 @@ var _ = infrastructure.DatastoreDescribe("pre-dnat with initialized Felix, 2 wor | |||
}) | |||
|
|||
Context("with pre-DNAT policy to open pinhole to 8055", func() { | |||
// Exec iptables-save, or iptables-save -t [table], if table is not "". | |||
dumpIPTables := func(table string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have utility methods for getting chains on the Felix object.
For example felix.IPTablesChains(table) ...
// The number of cali rules in the dumped tables should increase by the same delta as the metrics | ||
Eventually(dumpMangleTable).Should(HaveLen(len(baselineMangleTableIptablesSave) + mangleRulesMetricDelta)) | ||
Eventually(dumpFilterTable).Should(HaveLen(len(baselineFilterTableIptablesSave) + filterRulesMetricDelta)) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend doing some more steps after this:
- Change the type of the policy from pre-DNAT to normal. That's the most fiddly change to handle since it goes from referenced in one table to referenced in the other.
- Delete the policy, should go back to the count with the hep.
- Delete the hep, should get back to the base count.
// Returns nil when the value of f() hasn't changed for the duration settledPeriod. | ||
// Returns error if f() never settles up to the timeout. | ||
// Immediately returns error if timeout is lower than settledPeriod. | ||
waitToSettle := func(f func() int, settledPeriod, timeout time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on slack, would prefer not to rely on settling, if there's something explicit that we can wait for. It's not a bad approach, it's just a bit slow (and if it flakes, you have to bump the settled timeout so it gets slower...)
If we do go down this path, I suggest making it a general utility method. Kubernetes has a wait
package with similar utility funcs in it.
return errors.New("Timed out waiting for IPTables to settle.") | ||
default: | ||
last = cur | ||
time.Sleep(1 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically reach for 100ms for a sleep loop. We don't want to tight loop but probing 1x per second isn't normally a big deal.
settledTimer = nil | ||
} else { | ||
select { | ||
case <-settledTimer.C: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the way you're using the settledTimer, you're not actually using it as a timer at all! You only select on it in a non-blocking way. I think this code would be clearer if you just recorded the time since the last change and did
last := f()
lastChange := time.Now()
for {
cur := f()
if cur != last {
lastChange := time.Now()
}
last = cur
if time.Since(lastChange) > settledPeriod {
// we're done
}
...
}
} | ||
} | ||
select { | ||
case <-timeoutTimer.C: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a sleep and a timeout channel is slightly mixing paradigms. While you''re asleep you're not checking the timeout channel, which could be a bug (or not matter at all). In this case, it means that, if you just miss a timeout, it'll be >1s before you check the timeout again and you'll do an extra poll of f()
; not clear if that was intended (it probably doesn't matter for this use case).
Best to jump one way or the other to make your intent clear. Either:
- Use Sleep() and record the start time and do
if time.Since(startTime) > timeoutPeriod { ... timeout ...}
- Use channels for the sleep.
case <-time.After(delay): ...
so that the sleep can be interrupted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you also wanted the settled timer to interrupt the sleep (not clear that you do, since you probably want to run f() one more time before declaring it settled), then... A powerful technique is to mask channels in your main select block. To mask a channel, set it to nil; that prevents its case from firing. So you could do:
var settledC chan <-time.Time
for {
select {
case <- time.After(time.Second):
cur := f()
if cur == last {
if settledC == nil {
settledC = time.After(...)
}
} else {
settledC = nil
}
case <- settledC:
// done!
case <-timeoutC:
// failed
}
}
Description
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one
docs-*
label.docs-pr-required
: This change requires a change to the documentation that has not been completed yet.docs-completed
: This change has all necessary documentation completed.docs-not-required
: This change has no user-facing impact and requires no docs.Every PR needs one
release-note-*
label.release-note-required
: This PR has user-facing changes. Most PRs should have this label.release-note-not-required
: This PR has no user-facing changes.Other optional labels:
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only.needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.