Error handling for broken connections #129

Linuus · 2024-04-11T12:36:13Z

Hi!

We're trying out your gem to send VoIP notifications from Sidekiq, but we're having some issues with broken connections.

At first we were raising an error in the connection.on(:error) {} callback, like this:

      Apnotic::ConnectionPool.new(connection_config, size: 5) do |connection|
        connection.on(:error) do |exception|
          raise(PushNotification::Error, "Production APNs connection error: #{exception}")
        end
      end

That was a really bad idea, since it crashed the whole Sidekiq process and forced it to restart. We fixed this, and now we just report the error to our error-tracking service instead:

      Apnotic::ConnectionPool.new(connection_config, size: 5) do |connection|
        connection.on(:error) do |exception|
          Sentry.capture_exception(exception)
        end
      end

Now, occasionally we get this error reported:

    Errno::ECONNRESET: Connection reset by peer
      from openssl (3.2.0) lib/openssl/buffering.rb:211:in `sysread_nonblock'
      from openssl (3.2.0) lib/openssl/buffering.rb:211:in `read_nonblock'
      from net-http2 (0.18.5) lib/net-http2/client.rb:145:in `block in socket_loop'
      from net-http2 (0.18.5) lib/net-http2/client.rb:142:in `loop'
      from net-http2 (0.18.5) lib/net-http2/client.rb:142:in `socket_loop'
      from net-http2 (0.18.5) lib/net-http2/client.rb:114:in `block (2 levels) in ensure_open'

It's reported through the error callback, and then 60 seconds later the push times out here:

    connection_pool(ios_voip_push_token).with do |connection|
      response = connection.push(apnotic_notification(notification, ios_voip_push_token))
      raise(TimeoutError) if response.nil? # push returns nil when no response arrives in time
      # [...]
    end

I guess we could pass a shorter timeout to the `push` method, since 60 seconds seems fairly high.
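A sketch of what the shorter wait could look like on the caller side. It assumes Apnotic's `push` accepts a `timeout:` option (in seconds) and returns `nil` when no response arrives in time, which the snippet above already relies on; check both against the gem version you use. `DeadConnection`, `PushTimeout`, and `push_with_timeout` are hypothetical names, and the stand-in connection only sleeps so the example runs without the gem.

```ruby
# Hypothetical error raised when APNs does not answer in time.
class PushTimeout < StandardError; end

# Hypothetical stand-in for a connection whose socket has died:
# like the Apnotic behaviour assumed above, it returns nil after the timeout.
class DeadConnection
  def push(_notification, timeout:)
    sleep(timeout)
    nil
  end
end

# Sends a push, waiting at most `timeout` seconds for a response,
# and turns a nil (timed-out) response into an exception.
def push_with_timeout(connection, notification, timeout: 5)
  response = connection.push(notification, timeout: timeout)
  raise PushTimeout, "no APNs response within #{timeout}s" if response.nil?

  response
end

begin
  push_with_timeout(DeadConnection.new, :voip_notification, timeout: 0.2)
rescue PushTimeout => e
  puts e.message # prints "no APNs response within 0.2s"
end
```

A shorter timeout only bounds how long a job waits on a dead connection; it does not replace that connection, so the next checkout from the pool can still hit the same broken socket.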

Anyway, once this started, it kept happening: almost all of our pushes got this connection reset error. Our push jobs are not retried, but I don't think retrying would help either, since the connections don't seem to be "healed".

Could there be an issue where connections are stuck in a broken state? Or are we supposed to handle these errors differently?
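One way to keep a connection that has hit `ECONNRESET` from being handed out again is to mark it broken from the `on(:error)` callback and replace it at checkout time. The sketch below is a small hand-rolled pool, not Apnotic's or the `connection_pool` gem's API; `HealingPool`, `FakeConnection`, `mark_broken!`, `broken?`, and `closed?` are hypothetical, and wiring `mark_broken!` to a real connection's error callback is an assumption about how you would adapt it.

```ruby
# Hypothetical stand-in for an Apnotic connection that remembers
# whether its socket thread has reported an error.
class FakeConnection
  attr_reader :id

  def initialize(id)
    @id = id
    @error = nil
    @closed = false
    @mutex = Mutex.new # the error callback runs on another thread
  end

  # What a connection.on(:error) callback could call (assumption).
  def mark_broken!(error)
    @mutex.synchronize { @error = error }
  end

  def broken?
    @mutex.synchronize { !@error.nil? }
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Minimal thread-safe pool that swaps out broken connections on checkout.
class HealingPool
  def initialize(size, &factory)
    @factory = factory
    @available = Queue.new
    size.times { @available << factory.call }
  end

  def with
    conn = @available.pop
    if conn.broken?
      fresh = @factory.call # open the replacement first...
      conn.close            # ...then drop the dead connection
      conn = fresh
    end
    yield conn
  ensure
    @available << conn if conn
  end
end

next_id = 0
pool = HealingPool.new(1) { FakeConnection.new(next_id += 1) }

first = nil
pool.with { |c| first = c }
first.mark_broken!(Errno::ECONNRESET.new) # as if on(:error) had fired

pool.with { |c| puts c.id } # prints 2: the broken connection was replaced
puts first.closed?          # prints true
```

Opening the replacement before closing the old connection means that if reconnecting fails, the broken connection goes back into the pool still flagged, and the next checkout simply tries again.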

avanrielly · 2024-11-05T22:08:29Z

This happens to us a lot too. We use push notifications to send print messages to iPads. The connection is periodically reset, and we then have to wait for the push to time out before we can try again, which makes for a poor experience for our clients.

Is there a way to guarantee the connection is still open before trying to send a new push notification?
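Short of a reliable liveness check, one way to cap the wait is to use a short timeout and, if it expires, close the connection and retry once on a freshly opened one. This is a sketch under stated assumptions: `push` returns `nil` on timeout, and `DeadConnection`, `LiveConnection`, and `push_with_one_retry` are hypothetical stand-ins, not part of Apnotic.

```ruby
# Hypothetical stand-in: a dead connection never answers.
class DeadConnection
  def push(_notification, timeout:)
    sleep(timeout)
    nil # nil means no response arrived in time
  end

  def close; end
end

# Hypothetical stand-in: a fresh connection answers immediately.
class LiveConnection
  def push(notification, timeout:)
    { status: "200", notification: notification }
  end

  def close; end
end

# Tries `connection` first; on timeout, closes it and retries once on a
# connection built by `reconnect`. Returns [response, connection_to_keep],
# so the worst-case wait is about 2 * timeout instead of one long timeout.
def push_with_one_retry(connection, notification, timeout:, reconnect:)
  response = connection.push(notification, timeout: timeout)
  return [response, connection] unless response.nil?

  connection.close
  fresh = reconnect.call
  [fresh.push(notification, timeout: timeout), fresh]
end

response, kept = push_with_one_retry(
  DeadConnection.new, :print_job,
  timeout: 0.1, reconnect: -> { LiveConnection.new }
)
puts response[:status]          # prints 200
puts kept.is_a?(LiveConnection) # prints true
```

Retrying after a timeout may deliver a notification twice if APNs did receive the first attempt and only the response was lost, so this fits best where a duplicate is harmless.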
