-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RSDK-8292] Create Single App Connection and Use for Net Appender and Restart Checker #4746
Conversation
) | ||
|
||
type AppConn struct { | ||
ReconfigurableClientConn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that an AppConn
instance attempts to establish a connection in a background Goroutine, all calls to its methods would require a nil check and the acquisition of a lock. This code/functionality is already written out in the ReconfigurableClientConn
, so I figured it'd be best to just reuse that.
grpc/app_conn.go
Outdated
continue | ||
} | ||
|
||
appConn.Err = err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I'm handling errors returned by the background dials by simply storing them in the AppConn
struct and stopping the repeating dial attempts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my other question around this, I'm not sure if we need this field (yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually we'd want to expose whether the connection is connected or not, but that can come later (the connection being not nil does not tell us whether it is disconnected or not)
grpc/app_conn.go
Outdated
for { | ||
appConn.connMu.Lock() | ||
|
||
ctxWithTimeOut, ctxWithTimeOutCancel := context.WithTimeout(context.Background(), 5*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use the context.Context passed to NewAppConn()
when creating this context.WithTimeout
because the one passed as a function parameter has its own timeout and is cancelled when this function returns (during which this Goroutine might still be running).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using the context passed in would be good because then a context cancellation stemming from killing the viam-server will propagate properly
web/server/entrypoint.go
Outdated
@@ -191,13 +192,17 @@ func RunServer(ctx context.Context, args []string, _ logging.Logger) (err error) | |||
// Start remote logging with config from disk. | |||
// This is to ensure we make our best effort to write logs for failures loading the remote config. | |||
if cfgFromDisk.Cloud != nil && (cfgFromDisk.Cloud.LogPath != "" || cfgFromDisk.Cloud.AppAddress != "") { | |||
ctxWithTimeout, ctxWithTimeoutCancel := config.GetTimeoutCtx(ctx, true, cfgFromDisk.Cloud.ID) | |||
appConn, err := grpc.NewAppConn(ctxWithTimeout, cfgFromDisk.Cloud, logger) // TODO(RSDK-8292): [q] what logger should I pass here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[qu]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a sublogger, possibly networking.app_connection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
grpc/app_conn.go
Outdated
} | ||
}) | ||
|
||
// if initial dial attempt fails due to time out, return nil error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment here
logging/net_appender.go
Outdated
@@ -406,6 +406,7 @@ func (w *remoteLogWriterGRPC) write(ctx context.Context, logs []*commonpb.LogEnt | |||
return nil | |||
} | |||
|
|||
// TODO(RSDK-8292): [qu] do we need this anymore? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment since this isn't a question anymore
@cheukt re: our convo offline. I tested with a config that doesn't have a "cloud" component, and it seems to work fine. The RDK continue to run as expected. While I was at it, I also attempted including cloud information but leaving out fields and having invalid values, also seems to be good. No cloud config: RDK runs as normal |
This PR accomplishes two main things:
rpc.ClientConn
interface that we'll use as our one global connection to app in unifying the many that already exist.NetAppender
andRestartChecker
.The global connection works as follows. It's first instantiated - and thus made available to the entire RDK server - when the NetAppender is created in web/server/entrypoint.go. This could change in coming PR's as more connections are roped into this global one. A more important focal point is of this PR is how the connection is created [and managed]. When the
grpc.NewAppConn()
is called, a dial attempt to App is made. This initial attempt blocks for some time. If not successful, a Goroutine will spin in off in the background that'll repeatedly attempt to dial App. These attempts are serialized, and each times out after 5 seconds. Once a successful connection is created, the Goroutine is done.Testing
NetAppender
Offline startup |
NetAppender
fails logging to network because it's "not connected."Offline startup -> connection |
NetAppender
fails logging to network at first but is then able to, signifying background Goroutine continually attempting to dial App works.Offline startup -> connection -> disconnection -> reconnection |
NetAppender
fails logging to network at first but is then able to. After disconnecting from the internet,NetAppender
logs time out. Logs succeed when an internet connection returns.Online startup -> disconnection -> connection |
NetAppender
is able to log at first. Once WiFi is cut out, logs time out but succeed after it returns.RestartChecker
Same as above. Signal to restart was received after multiple disconnections/reconnections.