Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky-test: ExtensibleLoadManagerImplBaseTest.cleanup hangs in TableViewLoadDataStoreImpl.closeTableView due to a deadlock #23472

Open
1 of 2 tasks
lhotari opened this issue Oct 17, 2024 · 4 comments · May be fixed by #23487

Comments

@lhotari
Copy link
Member

lhotari commented Oct 17, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/actions/runs/11368523712/job/31653386805?pr=23468#step:10:616

thread dump: https://gist.github.com/lhotari/17557838cea2e4d4f4f1556fd4caec98
jstack.review analysis: https://jstack.review/?https://gist.github.com/lhotari/17557838cea2e4d4f4f1556fd4caec98#tda_1_dump

image

Exception stacktrace

"main" #1 prio=5 os_prio=0 cpu=67851.86ms elapsed=5366.03s tid=0x00007f52bc02f210 nid=0x33a0 waiting on condition  [0x00007f52c21fb000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00001000319052f8> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
        at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3465)
        at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3436)
        at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
        at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2072)
        at org.apache.pulsar.client.impl.TableViewImpl.close(TableViewImpl.java:200)
        at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.closeTableView(TableViewLoadDataStoreImpl.java:148)
        - locked <0x000010002025ce08> (a org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl)
        at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.close(TableViewLoadDataStoreImpl.java:204)
        - locked <0x000010002025ce08> (a org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl)
        at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.shutdown(TableViewLoadDataStoreImpl.java:209)
        - locked <0x000010002025ce08> (a org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl)
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.stopLoadDataReportTasks(ExtensibleLoadManagerImpl.java:816)
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.disableBroker(ExtensibleLoadManagerImpl.java:1036)
        at java.lang.invoke.LambdaForm$DMH/0x00007f5038410c00.invokeVirtual([email protected]/LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038540c00.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448400.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448800.invokeExact_MT([email protected]/LambdaForm$MH)
        at java.lang.invoke.MethodHandle.invokeWithArguments([email protected]/MethodHandle.java:732)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor$Dispatcher$ByteBuddy$mHxiQzAK.invokeWithArguments(Unknown Source)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor.invoke(InstrumentationMemberAccessor.java:251)
        at org.mockito.internal.util.reflection.ModuleMemberAccessor.invoke(ModuleMemberAccessor.java:55)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.tryInvoke(MockMethodAdvice.java:314)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice$RealMethodCall.invoke(MockMethodAdvice.java:234)
        at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:142)
        at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:45)
        at org.mockito.Answers.answer(Answers.java:90)
        at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:111)
        at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
        at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.handle(MockMethodAdvice.java:134)
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.disableBroker(ExtensibleLoadManagerImpl.java:1033)
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.disableBroker(ExtensibleLoadManagerWrapper.java:88)
        at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:975)
        at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:962)
        at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:525)
        at java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual([email protected]/DirectMethodHandle$Holder)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448000.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448400.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448800.invokeExact_MT([email protected]/LambdaForm$MH)
        at java.lang.invoke.MethodHandle.invokeWithArguments([email protected]/MethodHandle.java:732)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor$Dispatcher$ByteBuddy$mHxiQzAK.invokeWithArguments(Unknown Source)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor.invoke(InstrumentationMemberAccessor.java:251)
        at org.mockito.internal.util.reflection.ModuleMemberAccessor.invoke(ModuleMemberAccessor.java:55)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.tryInvoke(MockMethodAdvice.java:314)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice$RealMethodCall.invoke(MockMethodAdvice.java:234)
        at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:142)
        at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:45)
        at org.mockito.Answers.answer(Answers.java:90)
        at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:111)
        at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
        at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.handle(MockMethodAdvice.java:134)
        at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:509)
        at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:484)
        at java.lang.invoke.LambdaForm$DMH/0x00007f5038410c00.invokeVirtual([email protected]/LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038540c00.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448400.invoke([email protected]/LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/0x00007f5038448800.invokeExact_MT([email protected]/LambdaForm$MH)
        at java.lang.invoke.MethodHandle.invokeWithArguments([email protected]/MethodHandle.java:732)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor$Dispatcher$ByteBuddy$mHxiQzAK.invokeWithArguments(Unknown Source)
        at org.mockito.internal.util.reflection.InstrumentationMemberAccessor.invoke(InstrumentationMemberAccessor.java:251)
        at org.mockito.internal.util.reflection.ModuleMemberAccessor.invoke(ModuleMemberAccessor.java:55)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.tryInvoke(MockMethodAdvice.java:314)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice$RealMethodCall.invoke(MockMethodAdvice.java:234)
        at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:142)
        at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:45)
        at org.mockito.Answers.answer(Answers.java:90)
        at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:111)
        at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
        at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
        at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.handle(MockMethodAdvice.java:134)
        at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:484)
        at org.apache.pulsar.broker.testcontext.PulsarTestContext$StartableCustomBuilder.lambda$initializePulsarServices$0(PulsarTestContext.java:778)
        at org.apache.pulsar.broker.testcontext.PulsarTestContext$StartableCustomBuilder$$Lambda$568/0x00007f503853d598.close(Unknown Source)
        at org.apache.pulsar.broker.testcontext.PulsarTestContext.callCloseables(PulsarTestContext.java:218)
        at org.apache.pulsar.broker.testcontext.PulsarTestContext.close(PulsarTestContext.java:212)
        at org.apache.pulsar.broker.auth.MockedPulsarServiceBaseTest.internalCleanup(MockedPulsarServiceBaseTest.java:287)
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImplBaseTest.cleanup(ExtensibleLoadManagerImplBaseTest.java:171)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari changed the title Flaky-test: ExtensibleLoadManagerImplBaseTest.cleanup hangs in TableViewLoadDataStoreImpl.closeTableView Flaky-test: ExtensibleLoadManagerImplBaseTest.cleanup hangs in TableViewLoadDataStoreImpl.closeTableView due to a deadlock Oct 17, 2024
@lhotari
Copy link
Member Author

lhotari commented Oct 17, 2024

2 other threads related to the dead lock, in TableViewLoadDataStoreImpl.removeAsync method:

pulsar-load-manager-6391-1waiting to acquire [ 0x000010002025ce08 ] , holding [ 0x000010001e5511f0 0x00001000202630c0 ]
at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.removeAsync(TableViewLoadDataStoreImpl.java)
at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl.doCleanup(ServiceUnitStateChannelImpl.java:1602)
at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl.lambda$scheduleCleanup$45(ServiceUnitStateChannelImpl.java:1357)
at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl$$Lambda$3727/0x00007f50392595c0.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run([email protected]/CompletableFuture.java:1804)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:840)

broker-client-shared-internal-executor-6394-1waiting to acquire [ 0x000010002025ce08 ] , holding [ 0x0000100034a98bc8 0x00001005062490e8 ]
at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.removeAsync(TableViewLoadDataStoreImpl.java)
at org.apache.pulsar.broker.loadbalance.extensions.reporter.TopBundleLoadDataReporter.tombstone(TopBundleLoadDataReporter.java:109)
at org.apache.pulsar.broker.loadbalance.extensions.reporter.TopBundleLoadDataReporter.handleEvent(TopBundleLoadDataReporter.java:138)
at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners.lambda$notify$3(StateChangeListeners.java:74)
at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners$$Lambda$3849/0x00007f5038922d18.accept(Unknown Source)
at java.util.concurrent.CopyOnWriteArrayList.forEach([email protected]/CopyOnWriteArrayList.java:807)
at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners.notify(StateChangeListeners.java:72)
at java.util.concurrent.CompletableFuture$UniAccept.tryFire([email protected]/CompletableFuture.java:718)
at java.util.concurrent.CompletableFuture.postComplete([email protected]/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete([email protected]/CompletableFuture.java:2147)
at org.apache.pulsar.client.impl.ConsumerBase.lambda$completePendingReceive$0(ConsumerBase.java:333)
at org.apache.pulsar.client.impl.ConsumerBase$$Lambda$1932/0x00007f5038bd6358.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:840)

@lhotari
Copy link
Member Author

lhotari commented Oct 17, 2024

TableViewLoadDataStoreImpl.removeAsync was made synchronized in #21777

@heesung-sn heesung-sn linked a pull request Oct 20, 2024 that will close this issue
15 tasks
@heesung-sn
Copy link
Contributor

"main" #1 prio=5 os_prio=0 cpu=67851.86ms elapsed=5366.03s tid=0x00007f52bc02f210 nid=0x33a0 waiting on condition [0x00007f52c21fb000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
- parking to wait for <0x00001000319052f8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3465)
at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3436)
at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2072)
at org.apache.pulsar.client.impl.TableViewImpl.close(TableViewImpl.java:200)
at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.closeTableView(TableViewLoadDataStoreImpl.java:148)

It's odd that the test main thread cannot complete the table view close operation, which blocks other TableViewLoadDataStoreImpl operations.

@heesung-sn
Copy link
Contributor

Raised a PR to unsynchrnoize TableViewLoadDataStoreImpl operations. #23487

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants