Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in MongoDB access possible due to recent upgrade in mongoose #236

Open
fgalan opened this issue Mar 4, 2021 · 18 comments
Open

Errors in MongoDB access possible due to recent upgrade in mongoose #236

fgalan opened this issue Mar 4, 2021 · 18 comments

Comments

@fgalan
Copy link
Member

fgalan commented Mar 4, 2021

Comes from issue #234

The following error appears in latest:

iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager time=2021-03-03T16:44:11.094Z | lvl=DEBUG | corr=af12e7d2-2b1f-49ba-8943-51ee5459d40f | trans=af12e7d2-2b1f-49ba-8943-51ee5459d40f | op=IoTAgentNGSI.GenericMiddlewares | from=n/a | srv=n/a | subsrv=n/a | msg=Request for path [/iot/protocols/5ea0a2815e27f20010e5fe6b] from [localhost:8082] | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager time=2021-03-03T16:44:41.105Z | lvl=DEBUG | corr=af12e7d2-2b1f-49ba-8943-51ee5459d40f | trans=af12e7d2-2b1f-49ba-8943-51ee5459d40f | op=IoTAManager.ProtocolAPI | from=n/a | srv=n/a | subsrv=n/a | msg=Internal MongoDB Error getting device: MongooseServerSelectionError: Server selection timed out after 30000 ms
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Function.Model.$wrapCallback (/opt/iotaManager/node_modules/mongoose/lib/model.js:4878:32)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at /opt/iotaManager/node_modules/mongoose/lib/query.js:4372:21
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at promiseOrCallback (/opt/iotaManager/node_modules/mongoose/lib/helpers/promiseOrCallback.js:9:12)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at model.Query.exec (/opt/iotaManager/node_modules/mongoose/lib/query.js:4371:10)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at model.Query.Query.deleteOne (/opt/iotaManager/node_modules/mongoose/lib/query.js:2714:13)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Function.deleteOne (/opt/iotaManager/node_modules/mongoose/lib/model.js:1946:13)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Function.removeProtocol (/opt/iotaManager/lib/services/protocolData.js:165:20)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Object.remove (/opt/iotaManager/node_modules/iotagent-node-lib/lib/services/common/alarmManagement.js:116:25)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at deleteProtocol (/opt/iotaManager/lib/services/protocols.js:83:15)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Layer.handle [as handle_request] (/opt/iotaManager/node_modules/express/lib/router/layer.js:95:5)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at next (/opt/iotaManager/node_modules/express/lib/router/route.js:137:13)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Route.dispatch (/opt/iotaManager/node_modules/express/lib/router/route.js:112:3)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at Layer.handle [as handle_request] (/opt/iotaManager/node_modules/express/lib/router/layer.js:95:5)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at /opt/iotaManager/node_modules/express/lib/router/index.js:281:22
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at param (/opt/iotaManager/node_modules/express/lib/router/index.js:354:14)
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager     at param (/opt/iotaManager/node_modules/express/lib/router/index.js:365:14) {
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager   reason: [TopologyDescription]
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager } | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager time=2021-03-03T16:44:41.112Z | lvl=ERROR | corr=af12e7d2-2b1f-49ba-8943-51ee5459d40f | trans=af12e7d2-2b1f-49ba-8943-51ee5459d40f | op=IoTAgentNGSI.Alarms | from=n/a | srv=n/a | subsrv=n/a | msg=Raising [MONGO-ALARM]: {"name":"INTERNAL_DB_ERROR","message":"An internal DB Error happened: MongooseServerSelectionError: Server selection timed out after 30000 ms","code":500} | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-549547b78-2r9jr iot-agent-manager (node:1) [DEP0097] DeprecationWarning: Using a domain property in MakeCallback is deprecated. Use the async_context variant of MakeCallback or the AsyncResource class instead.

According to reports (tell me @modularTaco if I'm wrong, please) in version 1.13.0 it goes ok.

Comparing 1.13.0 with current master we see this:

- Upgrade mongodb dev dependency from 3.5.9 to 3.6.4
- Upgrade mongoose dependency from 5.7.7 to 5.11.18

(introduced by #231)

I think we can discard mongodb as cause of the problem (it is only used for tests), but maybe the upgrade in mongoose is causing it. Maybe we are using mongoose in our code in a way which is not fully compatible in 5.11 and some fix is needed.

@modularTaco
Copy link

From my quick recherche yesterday i think it could be related to

options.useUnifiedTopology = true
(#231 )

As of

setting useUnifiedTopology to true caused my issues

https://stackoverflow.com/a/61072072 (Automattic/mongoose#8180)

And yes, you're right. In 1.13.0 the same mongo replicaset worked fine, after a downgrade everything was good again

@fgalan
Copy link
Member Author

fgalan commented Mar 9, 2021

PR #239 supposedly solves this issue.

@modularTaco could you test it (once the new :latest container gets built, I recommend to wait some time yet as the merging of the branch into master is very recent) and tell us how it goes, please?

@modularTaco
Copy link

Tested the latest build (built three hours ago, DIGEST:sha256:cc4b573f8e888d9b778b9483af913ad6ee951bb8de4e8d0c6e2971f90f3ee37e)

Startup seems fine:

iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:38:22.477Z | lvl=INFO | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.NorthboundServer | from=n/a | srv=n/a | subsrv=n/a | msg=Starting IoT Agent Manager listening on port [8082] | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:38:22.661Z | lvl=INFO | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.DbConn | from=n/a | srv=n/a | subsrv=n/a | msg=First connection attempt | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:38:22.662Z | lvl=INFO | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.DbConn | from=n/a | srv=n/a | subsrv=n/a | msg=Attempting to connect to MongoDB instance. Attempt 1 | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager the server/replset/mongos/db options are deprecated, all their options are supported at the top level of the options object [poolSize,ssl,sslValidate,sslCA,sslCert,sslKey,sslPass,sslCRL,autoReconnect,noDelay,keepAlive,keepAliveInitialDelay,connectTimeoutMS,family,socketTimeoutMS,reconnectTries,reconnectInterval,ha,haInterval,replicaSet,secondaryAcceptableLatencyMS,acceptableLatencyMS,connectWithNoPrimary,authSource,w,wtimeout,j,writeConcern,forceServerObjectId,serializeFunctions,ignoreUndefined,raw,bufferMaxEntries,readPreference,pkFactory,promiseLibrary,readConcern,maxStalenessSeconds,loggerLevel,logger,promoteValues,promoteBuffers,promoteLongs,domainsEnabled,checkServerIdentity,validateOptions,appname,auth,user,password,authMechanism,compression,fsync,readPreferenceTags,numberOfRetries,auto_reconnect,minSize,monitorCommands,retryWrites,retryReads,useNewUrlParser,useUnifiedTopology,serverSelectionTimeoutMS,useRecoveryToken,autoEncryption,driverInfo,tls,tlsInsecure,tlsinsecure,tlsAllowInvalidCertificates,tlsAllowInvalidHostnames,tlsCAFile,tlsCertificateFile,tlsCertificateKeyFile,tlsCertificateKeyFilePassword,minHeartbeatFrequencyMS,heartbeatFrequencyMS,directConnection,appName,maxPoolSize,minPoolSize,maxIdleTimeMS,waitQueueTimeoutMS]
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:38:22.867Z | lvl=INFO | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.DbConn | from=n/a | srv=n/a | subsrv=n/a | msg=Successfully connected to MongoDB. | comp=IoTAgentManager
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:38:22.880Z | lvl=INFO | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.Executable | from=n/a | srv=n/a | subsrv=n/a | msg=IoT Agent Manager started | comp=IoTAgentManager

Querying looks also fine:

➜ curl localhost:8082/iot/protocols
{"count":1,"protocols":[{"_id":"5f6b345a5fb3a6000f5d13de","description":"LORAWAN IoT Agent, supports TTN and Loraserver.io","protocol":"LORAWAN","resource":"/iot/lorawan","iotagent":"http://lorawan-iot-agent:4041/config/iot","__v":0}]}%   

But sometimes operations do not work:

➜ curl localhost:8082/iot/protocols
{"name":"MongoError","message":"not master and slaveOk=false"}
iot-agent-manager-iot-agent-manager-5959b687b5-p2mx9 iot-agent-manager time=2021-03-09T13:39:23.532Z | lvl=ERROR | corr=890eef4d-a452-477c-ba89-8fe353941fc8 | trans=890eef4d-a452-477c-ba89-8fe353941fc8 | op=IoTAManager.ProtocolAPI | from=n/a | srv=n/a | subsrv=n/a | msg=error: {"topologyVersion":{"processId":"603e1b6376496ef551eb5857","counter":20},"operationTime":"6937648471226712065","ok":0,"code":13435,"codeName":"NotPrimaryNoSecondaryOk","$clusterTime":{"clusterTime":"6937648471226712065","signature":{"hash":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","keyId":0}},"name":"MongoError"} | comp=IoTAgentManager

With 1.13.0 this hasn't happened, but with latest it happened in 1 of 10 tries (recreated the kubernetes deployment 10 times). So maybe this happens also in 1.13.0, but as i have no idea of how to trigger this behaviour, i cannot proof it.
The mongodb configuration is always the same (pointing to a headless kubernetes service with three pods)

@fgalan
Copy link
Member Author

fgalan commented Mar 9, 2021

When you are in a fail situation, i.e. you get:

{"name":"MongoError","message":"not master and slaveOk=false"}

Can you check the status of the replica set? Maybe due to some network error or whatever, the replica set is changing the primary from time to time so causing some unstability time until a new primary is elected.

@fgalan
Copy link
Member Author

fgalan commented Mar 9, 2021

In fact, using some "probe" to test if the replica set is stable along the time, independently of the IOTA operation, would be a good idea to check that everything is working fine in the DB layer.

@modularTaco
Copy link

Can you check the status of the replica set? Maybe due to some network error or whatever, the replica set is changing the primary from time to time so causing some unstability time until a new primary is elected.

I checked it in the monitoring, the last leader election was 10 days ago

@fgalan
Copy link
Member Author

fgalan commented Mar 10, 2021

We have recently introduce a log improvements (PR #240) that could bring more information into this issue.

Could you update :latest and tell us which trace like this one (i.e. with Attempting to connect to MongoDB as part of the msg field) do you have in your case, please?

time=2021-03-10T11:15:48.494Z | lvl=INFO | corr=8e288dbf-32dc-4d72-b02a-c9d5accb0651 | trans=8e288dbf-32dc-4d72-b02a-c9d5accb0651 | op=IoTAManager.DbConn | from=n/a | srv=n/a | subsrv=n/a | msg=Attempting to connect to MongoDB instance with url "mongodb://iot-mongo:27017/iotagent-manager" and options {}. Attempt 1 | comp=IoTAgentManager

@modularTaco
Copy link

modularTaco commented Mar 12, 2021

iot-agent-manager-iot-agent-manager-84cc468d4c-l2qtg iot-agent-manager time=2021-03-12T08:52:36.869Z | lvl=INFO | corr=dc8c53bf-ec92-4b2a-a0cd-a836523c3a5a | trans=dc8c53bf-ec92-4b2a-a0cd-a836523c3a5a | op=IoTAManager.DbConn | from=n/a | srv=n/a | subsrv=n/a | msg=Attempting to connect to MongoDB instance with url "mongodb://mongodb-headless.platform.svc.cluster.local:27017/iot-agent-manager" and options {"replset":{"rs_name":"rs"}}. Attempt 1 | comp=IoTAgentManager
bash-5.1# dig srv mongodb-headless.platform.svc.cluster.local +short
0 33 27017 mongodb-0.mongodb-headless.platform.svc.cluster.local.
0 33 27017 mongodb-2.mongodb-headless.platform.svc.cluster.local.
0 33 27017 mongodb-1.mongodb-headless.platform.svc.cluster.local.

@AlvaroVega
Copy link
Member

What about connect to url mongodb://mongodb-0.mongodb-headless.platform.svc.cluster.local:27017,mongodb-1.mongodb-headless.platform.svc.cluster.local:27017,mongodb-2.mongodb-headless.platform.svc.cluster.local:27017/iot-agent-manager ?

@modularTaco
Copy link

What about connect to url mongodb://mongodb-0.mongodb-headless.platform.svc.cluster.local:27017,mongodb-1.mongodb-headless.platform.svc.cluster.local:27017,mongodb-2.mongodb-headless.platform.svc.cluster.local:27017/iot-agent-manager ?

I can try it. But as it works most of the time with the headless-service url and i do not know when and why it doesn't work, i cannot really test if it works better when specifying each server separately in the url

@fgalan
Copy link
Member Author

fgalan commented Mar 15, 2021

Using just one host in the connection string to replica set is not recommended (see for instance this explanation: https://stackoverflow.com/questions/23958759/mongodb-connection-string-to-replica-set). It may be causing the connection unsuitability you are experiencing here.

Thus, please follow the @AlvaroVega advice to include the three hosts in the connection string, then try again and tell us how it goes.

Thanks!

@modularTaco
Copy link

Using just one host in the connection string to replica set is not recommended

I thought that this doesn't apply to our env, as we're using mongodb://mongodb-headless.platform.svc.cluster.local:27017/iot-agent-manager where mongodb-headless.platform.svc.cluster.local is not a single host.
It contains one srv record for each mongodb node.

But your StackOverflow link gave me a hint i was not aware of: mongodb is able to use a dns seed list, but not with mongodb:// as connection string prefix. Instead it should be mongodb+srv://. So in our case: mongodb+srv://mongodb-headless.platform.svc.cluster.local:27017/iot-agent-manager. (according to https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist)

So when i just use the mongodb:// prefix, it will use the normal A record and not the SRV ones.
In case of a headless kubernetes service with multiple endpoints, this might end up in a dns round robin:

bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.0.109
172.25.5.158
172.25.4.135
bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.5.158
172.25.4.135
172.25.0.109
bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.0.109
172.25.5.158
172.25.4.135
bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.4.135
172.25.0.109
172.25.5.158
bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.5.158
172.25.4.135
172.25.0.109
bash-5.1# dig mongodb-headless.platform.svc.cluster.local +short
172.25.5.158
172.25.4.135
172.25.0.109

This could be also the reason why i was not really able to reproduce the behaviour and it just worked, when i recreated the pod (without changing the configuration).

I'll try the mongodb+srv format as well as specifying each server in the connection uri.

@fgalan
Copy link
Member Author

fgalan commented Mar 15, 2021

I'll try the mongodb+srv format as well as specifying each server in the connection uri.

I didn't realize you are using SRV records for this :) Thus, probably "mongodb+srv://..." would help.

@modularTaco
Copy link

I'll try the mongodb+srv format as well as specifying each server in the connection uri.

I didn't realize you are using SRV records for this :) Thus, probably "mongodb+srv://..." would help.

Actually i can't use mongodb+srv://.. with IOTAM or IOTAs (or i misread something)

IOTAM and IOTAs are building the connection string on theirselves with mongodb:// hard coded:

IOTAM:

url = 'mongodb://' + credentials + hosts + '/' + db;

IOTA:
https://github.com/telefonicaid/iotagent-node-lib/blob/master/lib/model/dbConn.js#L77

Should i create an extra feature request for this?

@fgalan
Copy link
Member Author

fgalan commented Mar 23, 2021

Should i create an extra feature request for this?

Before that, it would be good to be sure if mongodb+srv://... solves the problem. To that end, I'd suggest you modify your local instance of IOTA (a temporal hack) to hardwire the connection string you need and tell us if worked or not, please.

@modularTaco
Copy link

Should i create an extra feature request for this?

Before that, it would be good to be sure if mongodb+srv://... solves the problem. To that end, I'd suggest you modify your local instance of IOTA (a temporal hack) to hardwire the connection string you need and tell us if worked or not, please.

i'll try. I need create my own docker image for the IOTAM with my connection uri hardcoded to use it in our Kubernetes Infra and test it. I'm not sure wether i can do it this week

@fgalan
Copy link
Member Author

fgalan commented Mar 23, 2021

i'll try. I need create my own docker image for the IOTAM with my connection uri hardcoded to use it in our Kubernetes Infra and test it. I'm not sure wether i can do it this week

An alternative would be to access inside the container (docker exec bash or something like that), modify the involved .js file, then restart the container to get the IOTA restarted with the modifications. But now sure if this is better or not... :)

@modularTaco
Copy link

i'll try. I need create my own docker image for the IOTAM with my connection uri hardcoded to use it in our Kubernetes Infra and test it. I'm not sure wether i can do it this week

An alternative would be to access inside the container (docker exec bash or something like that), modify the involved .js file, then restart the container to get the IOTA restarted with the modifications. But now sure if this is better or not... :)

In plain Docker I would agree, just change something in the container and restart the container.

But as far as I know this is not possible in Kubernetes. Restarting a Kubernetes Pod means: The current one gets destroyed and a new one gets build -> all manual changes in the old container (except volumes) are lost.

@AlvaroVega AlvaroVega unpinned this issue Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants