Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting deadlock error while executing a task #5934

Open
akshaylad26 opened this issue Sep 11, 2024 · 16 comments
Open

Getting deadlock error while executing a task #5934

akshaylad26 opened this issue Sep 11, 2024 · 16 comments
Labels
for/team-attention For team attention

Comments

@akshaylad26
Copy link

Description:
We are getting intermittent deadlock issue while executing tasks. We observed that this usually happens when our execution cleanup job is running in parallel. Below is the stacktrace. We are unable to reproduce this issue in lower environments, but it is frequently happening in client environments. Can you please check? If any details required, please let me know, I will extract it.

Release versions:
We are using 2.11.4 release version.

org.springframework.dao.CannotAcquireLockException: PreparedStatementCallback; SQL [INSERT INTO BOOT3_BATCH_JOB_EXECUTION(JOB_EXECUTION_ID, JOB_INSTANCE_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, VERSION, CREATE_TIME, LAST_UPDATED) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ]; Deadlock found when trying to get lock; try restarting transaction at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:78) at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1572) at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:667) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:960) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1015) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1020) at org.springframework.batch.core.repository.dao.JdbcJobExecutionDao.saveJobExecution(JdbcJobExecutionDao.java:243) at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:178) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:391) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) at org.springframework.batch.core.repository.support.AbstractJobRepositoryFactoryBean.lambda$getObject$0(AbstractJobRepositoryFactoryBean.java:204) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:244) at jdk.proxy2/jdk.proxy2.$Proxy94.createJobExecution(Unknown Source) at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:145) at org.springframework.batch.core.launch.support.TaskExecutorJobLauncher.run(TaskExecutorJobLauncher.java:59) at org.springframework.cloud.task.batch.handler.TaskJobLauncherApplicationRunner.execute(TaskJobLauncherApplicationRunner.java:147) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.executeLocalJobs(JobLauncherApplicationRunner.java:195) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.launchJobFromProperties(JobLauncherApplicationRunner.java:175) at org.springframework.cloud.task.batch.handler.TaskJobLauncherApplicationRunner.run(TaskJobLauncherApplicationRunner.java:112) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.run(JobLauncherApplicationRunner.java:165) at org.springframework.boot.SpringApplication.lambda$callRunner$4(SpringApplication.java:771) at org.springframework.util.function.ThrowingConsumer$1.acceptWithException(ThrowingConsumer.java:83) at org.springframework.util.function.ThrowingConsumer.accept(ThrowingConsumer.java:60) at org.springframework.util.function.ThrowingConsumer$1.accept(ThrowingConsumer.java:88) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:783) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771) at org.springframework.boot.SpringApplication.lambda$callRunners$3(SpringApplication.java:759) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:357) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:759) at org.springframework.boot.SpringApplication.run(SpringApplication.java:332) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1327) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1316) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) at org.springframework.boot.loader.Launcher.launch(Launcher.java:95) at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65) Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:124) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1061) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1009) at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1320) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:994) at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java) at org.springframework.jdbc.core.JdbcTemplate.lambda$update$2(JdbcTemplate.java:965) at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:648) ... 55 common frames omitted

@github-actions github-actions bot added the status/need-triage Team needs to triage and take a first look label Sep 11, 2024
@corneil
Copy link
Contributor

corneil commented Sep 11, 2024

@akshaylad26
Is this happening in a Batch Task application that is launched by Spring Cloud Data Flow or using a Schedule?

@corneil corneil added the status/need-feedback Calling participant to provide feedback label Sep 11, 2024
@corneil
Copy link
Contributor

corneil commented Sep 11, 2024

@akshaylad26 Can you provide more information on your cleanup job?

@cppwfs cppwfs removed the status/need-triage Team needs to triage and take a first look label Sep 11, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 11, 2024

@akshaylad26 Please provide the version of MySql and the platform you are launching the applications (Kubernetes, local, Cloud Foundry). Thanks!

@akshaylad26
Copy link
Author

We are launching using Kubernetes & MySQL version is 8.0.33
Both cleanup job & batch task application are scheduled operations. Cleanup job executes every 6 hours whereas task is scheduled every hour.
In Cleanup job we get executions using /tasks/executions endpoint in a paginated way sorted by task execution id in ascending order. Then we use /task/executions/{list of executionIds}?action=REMOVE_DATA,CLEANUP&schemaTarget=boot3 endpoint for cleaning up older executions.
We follow this way because normal cleanup API provided by SCDF doesn't clean executions without any start/end date.

Let me know if this helps.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 11, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 11, 2024

What transaction isolation level are you using for your batch application(s)?

@cppwfs cppwfs added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Sep 11, 2024
@akshaylad26
Copy link
Author

We use default isolation level. We use hikari datasource & dont override it.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 11, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 11, 2024

You may wish to consider setting the transaction isolation level of your apps to something like ISOLATION_REPEATABLE_READ as discussed in Spring Batch's Javadoc.

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Sep 11, 2024
@akshaylad26
Copy link
Author

sure will give it a try. Thanks!

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 11, 2024
@cppwfs cppwfs added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Sep 11, 2024
@akshaylad26
Copy link
Author

Creating new transaction with name [org.springframework.cloud.dataflow.server.service.LauncherInitializationService.initialize]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT

I see this line in SCDF server DEBUG logs as well. Does this also need to be updated to ISOLATION_REPEATABLE_READ?

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 12, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 12, 2024

Thank you for your update! For now SCDF needs to keep its current transaction isolation level.

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Sep 12, 2024
@akshaylad26
Copy link
Author

akshaylad26 commented Sep 12, 2024

I just checked in our Database, its default isolation level itself is REPEATABLE_READ.

image

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 12, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 13, 2024

Understood. Not sure that necessarily affects the Batch apps in that it sets its own default.

@github-actions github-actions bot removed the for/team-attention For team attention label Sep 13, 2024
@cppwfs cppwfs added the status/need-feedback Calling participant to provide feedback label Sep 13, 2024
@akshaylad26
Copy link
Author

Hi @cppwfs ,

We tried adding transaction isolation level = ISOLATION_REPEATABLE_READ in our batch application. But today again got the same deadlock issue.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Sep 26, 2024
@cppwfs
Copy link
Contributor

cppwfs commented Sep 30, 2024

Can you share a small sample application that reproduces the issue? Thanks

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Sep 30, 2024
@akshaylad26
Copy link
Author

Hi @cppwfs ,

We were not able to reproduce issue. However, attaching innodb status file from mysql which shows details about deadlock.
deadlock.txt

Hope it helps to investigate further.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Oct 15, 2024
@akshaylad26
Copy link
Author

DELETE FROM BOOT3_BATCH_JOB_INSTANCE WHERE NOT EXISTS ( SELECT JOB_INSTANCE_ID FROM BOOT3_BATCH_JOB_EXECUTION WHERE BOOT3_BATCH_JOB_INSTANCE.JOB_INSTANCE_ID = JOB_INSTANCE_ID)

I believe this query holds the lock of entire table rows which causes the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for/team-attention For team attention
Projects
None yet
Development

No branches or pull requests

3 participants