Fetching a new cluster composition can sometimes result in NoMethodError #220

danielmconrad · 2022-11-28T15:37:29Z

After a period of moderate inactivity, we began seeing this error on all subsequent requests. Restarting our instances temporarily fixed the issue. From the look of things, the key used in address_to_pool for Neo4j::Driver::Internal::Async::Pool::ConnectionPoolImpl is somehow replaced with a Set instead of an address.

Docker image: ruby:3.1.2

Gems:

gem 'activegraph', '11.1.0.alpha.4'
gem 'neo4j-ruby-driver', '4.4.0'

Error:

NoMethodError:
undefined method `attributes' for #<Set: {#<Neo4j::Driver::Internal::BoltServerAddress:0x00007ff5f7337940 @host="XXXXXXXX.databases.neo4j.io", @connection_host="XX.XXX.XXX.XXX", @port=7687>}>

        attributes == other&.attributes
                           ^^^^^^^^^^^^

File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/lib/neo4j/driver/internal/bolt_server_address.rb:74 in ==
File /usr/local/bundle/gems/set-1.0.3/lib/set.rb:404 in include?
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/pool/connection_pool_impl.rb:34 in block (2 levels) in retain_all
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/pool/connection_pool_impl.rb:33 in each
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/pool/connection_pool_impl.rb:33 in block in retain_all
File /usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:147 in with_write_lock
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/pool/connection_pool_impl.rb:32 in retain_all
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb:69 in fresh_cluster_composition_fetched
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb:35 in block in ensure_routing_table
File /usr/local/bundle/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:147 in with_write_lock
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb:30 in ensure_routing_table
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/routing_table_registry_impl.rb:19 in block in ensure_routing_table
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/routing_table_registry_impl.rb:19 in ensure_routing_table
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/cluster/loadbalancing/load_balancer.rb:24 in acquire_connection
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/network_session.rb:95 in acquire_connection
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/async/network_session.rb:28 in begin_transaction_async
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/internal_session.rb:66 in private_begin_transaction
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/internal_session.rb:56 in block in transaction
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/retry/exponential_backoff_retry_logic.rb:23 in retry
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/internal_session.rb:55 in transaction
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/lib/neo4j/driver/synchronizable.rb:16 in block (4 levels) in with_sync_wrapper
File /usr/local/bundle/gems/async-2.2.1/lib/kernel/sync.rb:20 in Sync
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/lib/neo4j/driver/synchronizable.rb:16 in block (3 levels) in with_sync_wrapper
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/ruby/neo4j/driver/internal/internal_session.rb:41 in read_transaction
File /usr/local/bundle/gems/activegraph-11.1.0.alpha.4/lib/active_graph/transactions.rb:43 in run_transaction_work
File /usr/local/bundle/gems/activegraph-11.1.0.alpha.4/lib/active_graph/transactions.rb:37 in block in send_transaction
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/lib/neo4j/driver/auto_closable.rb:19 in block (3 levels) in with_block_definer
File /usr/local/bundle/gems/neo4j-ruby-driver-4.4.0/lib/neo4j/driver/synchronizable.rb:16 in block (4 levels) in with_sync_wrapper
File /usr/local/bundle/gems/async-2.2.1/lib/async/task.rb:107 in block in run
File /usr/local/bundle/gems/async-2.2.1/lib/async/task.rb:243 in block in schedule

The text was updated successfully, but these errors were encountered:

wmene · 2023-09-06T16:14:30Z

We are still seeing this issue with neo4j-ruby-driver 4.4.4

Lianowar · 2024-02-12T08:26:59Z

Is there is any news about that issue?
Still see it at 4.4.5 and sometimes there is another one error undefined method map for nil at driver/internal/cluster/routing_table_handler_impl.rb

L33tH4x0r · 2024-06-06T18:38:28Z

We're seeing this issue as well. It pops up intermittently, but would like to have it run clean. Is there any update on this?

L33tH4x0r · 2024-06-06T20:40:28Z

It looks like the issue might be here:

neo4j-ruby-driver/lib/neo4j/driver/internal/bolt_server_address.rb

Line 89 in 610e611

Set[self]

The BoltServerAddress is putting itself into a set. When it gets called later it no longer has the attributes method since it's not a BoltServerAddress anymore

wmene · 2024-06-07T00:48:53Z

@L33tH4x0r , but if you follow the stack trace above, the Sets enclosing the BoltServerAddress get reduced into one Set here
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb#L63

addresses_to_retain = @routing_table_registry.all_servers.map(&:unicast_stream).reduce(&:+)

When things are in order, the include? method works because it's operating on BoltServerAddress. Somehow, this BoltServerAddress is getting double wrapped by Set somewhere

wmene · 2024-06-07T01:39:54Z

Actually, I think it gets double wrapped by Set here:
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/rediscovery_impl.rb#L94

def lookup_on_initial_router(routing_table, connection_pool, seen_servers, bookmark, impersonated_user, base_error)
          resolved_routers = resolve
          (resolved_routers - seen_servers.to_a).lazy.filter_map do |address|
            lookup_on_router(address, false, routing_table, connection_pool, nil, bookmark,
                             impersonated_user, base_error)
          end.first&.then { |composition| ClusterCompositionLookupResult.new(composition, Set.new(resolved_routers)) }
end

resolved_routers is a an Array of Set wrapped BoltServerAddresses, so this method returns one Set of individually wrapped BoltServerAddresses

When the routing table gets refreshed in RoutingTableHandlerImpl#fresh_cluster_composition_fetched, it combines the addresses_retain with this Set of individually wrapped BoltServerAddresses here
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb#L66

         composition_lookup_result.resolved_initial_routers&.then do |addresses|
            addresses_to_retain << addresses
          end

          @connection_pool.retain_all(addresses_to_retain)

The call to retain_all then tries to walk through addresses_to_retain, where it encounters the Set wrapped BoltServerAddresses

@klobuczek , can we get rid of Set enclosing resolved_routers on this line?

neo4j-ruby-driver/ruby/neo4j/driver/internal/cluster/rediscovery_impl.rb

Line 94 in 610e611

    
           end.first&.then { |composition| ClusterCompositionLookupResult.new(composition, Set.new(resolved_routers)) }

L33tH4x0r · 2024-06-07T15:28:32Z

It looks like someone added a flatten to the addresses_to_retain method that is causing the issue here in this commit in a fork:

ascent-technologies@6f80bfa

wmene · 2024-06-07T15:34:07Z

It looks like someone added a flatten to the addresses_to_retain method that is causing the issue here in this commit in a fork:

ascent-technologies@6f80bfa

That would work too

danielmconrad · 2024-06-07T17:23:05Z

That change to the ascent-technologies fork is from us. Frankly, I forgot we made it. We've been running that change in production now for quite some time without seeing the routing table error.

L33tH4x0r · 2024-06-07T18:15:40Z

Thanks for the fork! This looks like it'll fix things.

Our team is just getting started using Neo4J and am curious as to the general problem here. Are you running some sort of configuration that caused more stale objects to get rehydrated that caused this issue? I get why this would fix it, but more wondering about underlying causes here

danielmconrad · 2024-06-10T16:23:12Z

I don't think there's anything special about our configuration. We're using ActiveGraph as the ORM to back an API powered by GraphQL and using AuraDB as the storage mechanism. The access of the API is pretty bursty. From what I can remember, there wasn't any particular pattern behind when/why we would see this issue.

L33tH4x0r · 2024-06-18T15:51:08Z

I tried to write a PR for this issue that uses ascent-technologies@6f80bfa but I was unable to push my branch to the remote. I was running into authentication issues so maybe the repo isn't open to PRs?

Either way, it would be appreciated if @klobuczek could take a look at the fork and push a change that fixes this. It would be greatly appreciated!

petergebala · 2024-08-29T06:44:59Z

@klobuczek is there any plan to include this fix into main repo?

petergebala mentioned this issue Oct 10, 2024

Fix issues with nested Sets #233

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fetching a new cluster composition can sometimes result in NoMethodError #220

Fetching a new cluster composition can sometimes result in NoMethodError #220

danielmconrad commented Nov 28, 2022

wmene commented Sep 6, 2023

Lianowar commented Feb 12, 2024 •

edited

Loading

L33tH4x0r commented Jun 6, 2024

L33tH4x0r commented Jun 6, 2024

wmene commented Jun 7, 2024

wmene commented Jun 7, 2024 •

edited

Loading

L33tH4x0r commented Jun 7, 2024

wmene commented Jun 7, 2024

danielmconrad commented Jun 7, 2024

L33tH4x0r commented Jun 7, 2024

danielmconrad commented Jun 10, 2024

L33tH4x0r commented Jun 18, 2024

petergebala commented Aug 29, 2024

Fetching a new cluster composition can sometimes result in NoMethodError #220

Fetching a new cluster composition can sometimes result in NoMethodError #220

Comments

danielmconrad commented Nov 28, 2022

wmene commented Sep 6, 2023

Lianowar commented Feb 12, 2024 • edited Loading

L33tH4x0r commented Jun 6, 2024

L33tH4x0r commented Jun 6, 2024

wmene commented Jun 7, 2024

wmene commented Jun 7, 2024 • edited Loading

L33tH4x0r commented Jun 7, 2024

wmene commented Jun 7, 2024

danielmconrad commented Jun 7, 2024

L33tH4x0r commented Jun 7, 2024

danielmconrad commented Jun 10, 2024

L33tH4x0r commented Jun 18, 2024

petergebala commented Aug 29, 2024

Lianowar commented Feb 12, 2024 •

edited

Loading

wmene commented Jun 7, 2024 •

edited

Loading