Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs & comments for the behavior of the RecordSubscribe service, #148

Merged
merged 1 commit into from
Jan 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 3 additions & 4 deletions acctz/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ Accounting Records are available at a gNSI origin:
gnsi.acctz

Records may be streamed from a system at request of the remote collector,
via the AccountingPull() service/rpc.
via the RecordSubscribe() service/rpc.

Configuration of the Accounting service is made through standard
gNxI methods using the defined YANG model.
Expand All @@ -26,9 +26,8 @@ time at which the accounted event occured, local to the system which sends
the message. This could be different from the time received at the Collector
and the time the Record was emitted from the system.

The stream method requires that acknowledgements be sent periodically
in order to signal both which messages have been successfully processed
and that the remote collector is alive.
The stream continues as new records are received by the accounting subsystem,
ending when the gNSI session ends or an error occurs.

Devices should maintain a history of accounting records so that they can be
retrieved periodically by newly and already connected Collectors. The depth
Expand Down
11 changes: 7 additions & 4 deletions acctz/acctz.proto
Original file line number Diff line number Diff line change
Expand Up @@ -66,10 +66,13 @@ service Acctz {
// System should be sent to the Collector. A current time indicates that only
// new messages should be sent.
//
// The Collector must send a RecordRequest at least every 60 seconds, in
// order to signal its liveliness to the system. Failure to send a
// RecordRequest for more than 120 seconds will cause the connection to be
// reset.
// The stream continues ad infinitum, until the gNSI session is severed.
Copy link
Contributor

@nmahabaleshwar nmahabaleshwar Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update the comment at line 56 ?

If this results in no records to send, the server should return an OK error and zero records.

I don't think an OK error should be sent to the collector if no records need to be sent, the connection stays open and RecordResponses will be sent for future events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should it return?

If the subscription succeeds, but either the history is empty or a time is requested that both does not correspond with a record in the history and there are no records after that time in the history, then the current text seems reasonable. For example, the client might request now() or a future time (T+60s, which the server should probably be permitted to assume to be now()).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we return an OK error in those scenarios, the stream will be closed which we don't want right ?

In all those scenarios, we want the stream to be open and the collector should get RecordResponses when accounting events occur after the given timestamp in the original RecordRequest message.

If the server wants to explicitly signal the client that I don't have anything for you now, but will send records in the future, maybe we create a new message or add a new field in the RecordResponse which indicates that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I must misunderstand/misremember how the streaming works due to my limited use of it. I thought that the return was the result of the subscription; ie: RecordSubscribe() received, parsed, and started successfully. After which, records would be sent asynchronously, allowing the client to send and receive other messages as it wishes, until the session is closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, perhaps: "if there's an error on the RPC the connection is closed"

is the missing part here in the conversation?
Generally, Error on a streaming RPC means: "oops, bad things, please restart your process"
so we should say that no error is sent, but that also no records are sent (because there are none).

Perhaps REALLY in the text about the startup the reply is something like:

"device returns as many records as will satisfy the request, or none if there are no records which satisfy the request. The connection remains open such that future records may be streamed to the client as they are available."

Or something along those lines, at any rate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a fair point, we COULD have the server send back a status: ok (this is nirajan's text mostly I think)
but really .... we connect, we send a 'hope you have mail for me!, but I'm hanging out here anyway for later.'
and that's ok. Whether we get 0, 1, tons of records on our first connect is a little unimportant.

There IS a 'did you hear me???' problem, but that might be solve either in the metrics/alerting pipeline ("that device is oddly silent on acctz.. whats up?") or with a simple: "if nothing heard in N mins, disconnect and reconnect"

that model (disco/reco) is ALSO problematic, because it doesn't really solve anything and never surfaces the 'that device is oddly silent...' to the alerting pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops... and.. nirajan, if you think there's wording fixes / etc please don't hesitate to send along :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ja, thanks for the explanation, morrow. Please suggest the text nirajan, or suggestion for the text to consider is here: haussli@33b831a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #151 for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

// The gNSI TCP connection could timeout due to expiration of the TCP
// keepalive mechanism, or the server could disconnect the client if the
// output buffer to the client remains full for a long timeout period,
// implying a stuck client (see also getsockopt(SIOCOUTQ) and
// https://datatracker.ietf.org/doc/draft-ietf-idr-bgp-sendholdtimer/).
//
rpc RecordSubscribe(stream RecordRequest) returns (stream RecordResponse);
}

Expand Down
Loading