Skip to content

Retries and Redriver

Shyam Kumar Akirala edited this page Dec 24, 2016 · 5 revisions

Retries

Number of retries for a task can be configured using @Task annotation retries parameter. The default value is 0. Exponential backoff strategy is followed for retries i.e, the wait time between each retry is exponential in seconds 2, 4, 8, 16, 32.......

All the retries are performed in the same JVM unless there is a JVM or node crash. In case of the crash the remaining retries are done on a different node using Redriver.

Internally retries are performed using Akka scheduler. Refer com.flipkart.flux.impl.task.AkkaTask class for the implementation.

Error types

Timeout

All timed out tasks are retried if the configured retries for that task > 0.

Retriable exception

If you wish the task to be retried in case of exception, you can throw FluxRetriableException or wrap the occurred exception with FluxRetriableException and throw it.

@Task(version = 1, retries = 2, timeout = 1000)
public someEvent task() {
     try {
         // do some stuff
     } catch(UserException ex) {
         throw new FluxRetriableException("message", ex); //wrap the exception with FluxRetriableException to make Flux runtime to retry this task
     }
}
other exceptions

If task execution results in an exception, the task WON'T be retried even if the configured retries are > 0.

Redriver

Redriver executes stalled/zombie Task instances including those that missed an execution schedule because of node failure in the Flux cluster.

Redriver functionality is achieved by storing Task id and scheduledTime in DB on task execution request. Once the task execution is done the entry would be deleted from DB. If the task is not getting executed for a specified time (redriver delay time), Redriver asks one of the available cluster nodes to execute the task.

Redriver runs on the oldest member of the cluster.

Clone this wiki locally