-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
After few days connection stops working on Android #405
Comments
Thanks for taking the time to open this issue. We don't currently support Android with this sdk, but it is something that we have planned. That being said it would still be useful if you could list what Android devices that you see this on and which versions of Android. Also a minimal code sample that reproduces this would be helpful. |
It's speculation, but the behavior you describe is possible under certain scenarios. I'm copy-pasting a response I gave to an internal query regarding similar behavior. In their case they were publishing a packet that exceeded IoT Core payload size limits, which results in an immediate disconnection. You may not be breaking that limit, but on the chance that you are breaking any packet-based limit that results in a disconnect, this may apply: The MQTT spec requires a client to retry QoS1+ publishes that were sent but not acknowledged upon resuming a session. (MQTT311) Messages that exceed IoT Core limits cause an immediate connection termination. When you combine these two facts together, you get a "death loop." An "illegal" publish gets tried over and over and the connection gets terminated over and over. There are number of ways to avoid this, some better than others.
|
Thanks for the suggestions, this is good to keep in mind since I think parts of our code were not designed for this, I would have expected that (MqttClientConnection.publish(msg)) would throw an exception immediately but it does not so we have to be careful about the payload limit. I did test this edge case that you mentioned and it can definitely lead to re-connection loop but without the 90 second delays that we are seeing. connection loop would start happening. Here Cloudwatch logs give us a clear reason why the error is happening so this is definitely not what we encountered with the multiple days test: For the multiple days test we have a single: |
Just to follow up (stepping in for @IgorCordasProGlove), we are still seeing this error. We have been able to build a "minimal" version to reproduce this behavior that is attached here, as well as the logs for one of the app runs. android_minimal_integration_awsiotcore-main.zip Conclusion with the test application from above was that there might be some client-side issue within the SDK which leads to this re-connection loop being triggered, we assume this because logs with this minimal app from Cloudwatch don't mention any errors. If you want to ran the sample app we provided you will need to go into the AwsService.kt and add some certificate data instead of those TODOs in AwsCredentials object. Whan started app will do the fleet provisioning with your environment and then just repeatedly ping the : |
hi @IgorCordasProGlove , I am Hiep. I am using this aws lib to connect to aws mqtt. I believe it is because I turn on shrinkResources and minifyEnabled when building release. Does your app got issue when you build release with shrinkResources and minifyEnabled ? Thank you, |
@ngbahiep You can use the debug version of the app, that is how we got it reproduced. |
Describe the bug
We have an app which runs as a foreground service on Android and should keep persistent connection to IotCore for multiple days/weeks.
Connection is provisioned using fleet provisioning.
We are not sending much data over this connection but it is expected that a standard use case would send 100-1000 messages each day per device.
Above is mostly working but on some devices after a few days (usually 3) we start receiving strange errors and device does not connect properly. This is part of the log that we have to detect this issue:
[2023.03.30 19:20:24.082] I: Publishing to topic <custom_topic>
[2023.03.30 19:21:54.519] I: Cloud connection interrupted: AWS_ERROR_MQTT_UNEXPECTED_HANGUP - The connection was closed unexpectedly.
[2023.03.30 19:21:54.520] I: New cloud connection state: REGISTERED_NOT_CONNECTED
[2023.03.30 19:21:55.887] I: Cloud connection reestablished. Resumed session: true
[2023.03.30 19:21:55.887] I: New cloud connection state: REGISTERED_CONNECTED
The strange thing here is not that connection broke but that from this point in time we start receiving : AWS_ERROR_MQTT_UNEXPECTED_HANGUP in a loop repeating each 90 seconds until we restart the app process.
When connection is in this state we can publish some of the messages but our subscriptions stop working.
We investigated Cloudwatch logs for IOT Core and there is a single error before this connection error loop starts happening:
DUPLICATE_CLIENTID
as per documentation : "The client is using a client ID that is already in use. In this case, the client that is already connected will be disconnected with this disconnect reason."
But the issue here is that we are not using duplicate id anywhere, all the communication for that client id is from a single IP as the logs state and since the process is running constantly it is using a single connection.
After that error we end up in that 90 second re-connection loop on the client.
Above does not happen every time, some devices run for 5+ days without this issue but is reproducible quite often.
Doze mode or battery optimizations seem to have no effect on this, some of the users exclude our app from battery optimization and the issue still starts happening.
Expected Behavior
Connection should be able to run continually (with reconnects) and no 90 second disconnect/re-connect loops should be observed.
Current Behavior
Connection gets into half-broken state and disconnects/re-connects loop indefinitely
Reproduction Steps
We are not able to provide this at current point due to a fact that issue reproduces only on some of the devices
Possible Solution
No response
Additional Information/Context
No response
SDK version used
software.amazon.awssdk.iotdevicesdk:aws-iot-device-sdk-android:1.11.3, software.amazon.awssdk.crt:aws-crt-android:0.16.6
Environment details (OS name and version, etc.)
Android (multiple versions experience the issue)
The text was updated successfully, but these errors were encountered: