OutOfMemoryError leads to NullPointerException disabling redis connections #3087

Closed
stevenschlansker opened this issue Dec 17, 2024 · 2 comments · Fixed by #3115

@stevenschlansker

Bug Report

A sudden workload spike increased our use of direct memory buffers. The process hit the direct memory limit, and the JVM threw an OutOfMemoryError [1]. Soon afterward, Lettuce hit a NullPointerException in the CommandHandler [2] while trying to check the ref count on a null buffer.

Current Behavior

Lettuce hits the NullPointerException, and from then on connection checkouts fail with the same cause, leading to an application hang.

Stack trace
[1]
java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 804026406, limit: 805306368)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:363)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:693)
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:213)
	at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:195)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:137)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1073)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1081)
	at io.lettuce.core.codec.ByteArrayCodec.encodeKey(ByteArrayCodec.java:43)
	at io.lettuce.core.codec.ByteArrayCodec.encodeKey(ByteArrayCodec.java:33)
	at io.lettuce.core.protocol.CommandArgs.encode(CommandArgs.java:744)
	at io.lettuce.core.protocol.CommandArgs$KeyArgument.encode(CommandArgs.java:688)
	at io.lettuce.core.protocol.CommandArgs.encode(CommandArgs.java:367)
	at io.lettuce.core.protocol.Command.encode(Command.java:130)
	at io.lettuce.core.protocol.AsyncCommand.encode(AsyncCommand.java:189)
	at io.lettuce.core.protocol.CommandEncoder.encode(CommandEncoder.java:78)
	at io.lettuce.core.protocol.CommandEncoder.encode(CommandEncoder.java:63)
	at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:107)
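
The numbers in the OutOfMemoryError message above can be checked with simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and the JDK's internal accounting may differ in detail. It shows that the limit is 768 MiB, that about 1.2 MiB of headroom remained, and that the 4 MiB request could not fit.

```java
// Illustrative arithmetic only; names are hypothetical and this is not JDK code.
final class DirectMemoryHeadroom {

    // Bytes still available before the direct memory limit is reached.
    static long headroom(long limitBytes, long allocatedBytes) {
        return limitBytes - allocatedBytes;
    }

    // True when a request of the given size fits into the remaining headroom.
    static boolean fits(long requestedBytes, long limitBytes, long allocatedBytes) {
        return requestedBytes <= headroom(limitBytes, allocatedBytes);
    }
}
```

With the values from the trace, `headroom(805306368L, 804026406L)` is 1279962 bytes, so `fits(4194304L, 805306368L, 804026406L)` is false.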

[2]
io.lettuce.core.RedisConnectionException: Unable to connect to wscache/<unresolved>:6379
	at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:63)
	at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:41)
	at io.lettuce.core.RedisClient.lambda$transformAsyncConnectionException$22(RedisClient.java:783)
	at io.lettuce.core.DefaultConnectionFuture.lambda$thenCompose$1(DefaultConnectionFuture.java:238)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:885)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:554)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2238)
	at io.lettuce.core.AbstractRedisClient.lambda$null$5(AbstractRedisClient.java:482)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:885)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:554)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2238)
	at io.lettuce.core.protocol.RedisHandshakeHandler.lambda$fail$4(RedisHandshakeHandler.java:124)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
	at io.lettuce.core.protocol.RedisHandshakeHandler.fail(RedisHandshakeHandler.java:123)
	at io.lettuce.core.protocol.RedisHandshakeHandler.lambda$channelActive$3(RedisHandshakeHandler.java:92)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:885)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:554)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2238)
	at io.lettuce.core.RedisHandshake.lambda$tryHandshakeResp3$2(RedisHandshake.java:134)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:885)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:554)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2238)
	at io.lettuce.core.protocol.AsyncCommand.doCompleteExceptionally(AsyncCommand.java:143)
	at io.lettuce.core.protocol.AsyncCommand.completeExceptionally(AsyncCommand.java:136)
	at io.lettuce.core.protocol.CommandHandler.exceptionCaught(CommandHandler.java:284)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:447)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1575)
Caused by: java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "io.netty.buffer.ByteBuf.refCnt()" because "this.buffer" is null
	at java.base/java.util.concurrent.CompletableFuture.wrapInCompletionException(CompletableFuture.java:323)
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:391)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1185)
	... 27 common frames omitted
Caused by: java.lang.NullPointerException: Cannot invoke "io.netty.buffer.ByteBuf.refCnt()" because "this.buffer" is null
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:597)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	... 15 common frames omitted
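
The NPE message says `refCnt()` was called on a buffer field that was null. A minimal sketch of a null-safe check follows. It assumes a hypothetical `RefCounted` interface standing in for Netty's `ByteBuf` and a hypothetical `BufferGuard` helper; it is not the change made in #3115 and not Lettuce's actual code.

```java
// Sketch only: BufferGuard and RefCounted are hypothetical stand-ins, not Lettuce or Netty types.
final class BufferGuard {

    /** Stand-in for a reference-counted buffer such as Netty's ByteBuf. */
    interface RefCounted {
        int refCnt();
    }

    // A buffer is usable only if it exists and still has at least one reference.
    // Checking for null first avoids the NullPointerException seen in trace [2].
    static boolean isUsable(RefCounted buffer) {
        return buffer != null && buffer.refCnt() > 0;
    }
}
```

A handler could use such a check to drop or fail the current read instead of throwing when the buffer was never allocated.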

Expected behavior

Being unable to allocate memory is not necessarily a recoverable situation, but this NPE seems like it is failing harder than it needs to. Ideally some operations would fail, reducing memory pressure, and allowing future forward progress.
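
One way to picture "some operations fail, later ones proceed" is to fail only the operation whose allocation was refused. The sketch below assumes a hypothetical `FailFastEncoder` with an injectable allocator; it is not how Lettuce encodes commands. Catching `OutOfMemoryError` is generally discouraged, and it is done here only for the direct-buffer allocation case.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

// Sketch only: FailFastEncoder is a hypothetical class, not part of Lettuce.
final class FailFastEncoder {

    // Injectable so a test can simulate an allocation failure.
    private final IntFunction<ByteBuffer> allocator;

    FailFastEncoder(IntFunction<ByteBuffer> allocator) {
        this.allocator = allocator;
    }

    // Copies the payload into a newly allocated buffer.
    // If the allocation fails, only this operation's future fails;
    // the encoder itself stays usable for later calls.
    CompletableFuture<ByteBuffer> encode(byte[] payload) {
        CompletableFuture<ByteBuffer> result = new CompletableFuture<>();
        try {
            ByteBuffer buf = allocator.apply(payload.length);
            buf.put(payload);
            buf.flip();
            result.complete(buf);
        } catch (OutOfMemoryError e) {
            result.completeExceptionally(e);
        }
        return result;
    }
}
```

With `new FailFastEncoder(ByteBuffer::allocateDirect)`, a refused allocation surfaces as a failed future for that one call, and later calls can succeed once memory pressure drops.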

Environment

  • Lettuce 6.5.1.RELEASE
  • Netty 4.1.115.Final
@tishun
Collaborator

tishun commented Dec 31, 2024

Hey @stevenschlansker,

We can easily fix this specific instance of the NPE, but I am not sure how many more exist in this situation.
Such a change is also very hard to verify as a fix, so we would be shooting in the dark.

I generally agree we can improve the resilience of the driver in such conditions, so I will try to go over the code in the CommandHandler, but overall OOM can cause unpredictable and unforeseeable states of the driver.

Do let me know if you experience other such cases and we will try to address them too.

@stevenschlansker
Author

Thank you very much. I agree, this type of problem is difficult to chase down, and hard to prove if it is solved or even helped. It might be possible to reveal more information by stress-testing lettuce in intentionally low direct memory environments, but for our purposes, we raised the limit and hopefully won't see it again. Thanks again for the fix.
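
A low direct memory environment can be approximated by starting the JVM with the `-XX:MaxDirectMemorySize` option (for example `-XX:MaxDirectMemorySize=16m`) and holding direct buffers until the limit is reached. The sketch below assumes a hypothetical `DirectMemoryProbe` helper; it only creates the memory pressure and does not drive Lettuce itself.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch only: DirectMemoryProbe is a hypothetical helper for creating direct memory pressure.
final class DirectMemoryProbe {

    // Allocates up to maxChunks direct buffers of chunkBytes each and keeps them
    // reachable so they are not freed. Stops early if the JVM refuses an allocation.
    static List<ByteBuffer> fill(int chunkBytes, int maxChunks) {
        List<ByteBuffer> held = new ArrayList<>();
        try {
            for (int i = 0; i < maxChunks; i++) {
                held.add(ByteBuffer.allocateDirect(chunkBytes));
            }
        } catch (OutOfMemoryError e) {
            // Direct memory limit reached; return what was allocated so far.
        }
        return held;
    }
}
```

Holding the returned list while running client traffic keeps direct memory scarce; dropping the reference lets the buffers be reclaimed.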
