Distributed Lock for Scheduled Tasks

Scheduling jobs is a core requirement when developing large-scale applications. In a Spring Boot application, the scheduler works well on a single instance. Once we scale the application out to multiple nodes, the scheduled parts need changes to fit that architecture: Spring Boot does not synchronize schedulers across nodes, so every node runs every job at the same time. Shedlock is a good fit for this scenario. In this article, we'll look at how to use Shedlock so that a scheduled task in a Spring Boot application runs on only one instance at a time.

Scheduled In A Distributed Environment

Spring Boot applications use the @Scheduled annotation for everything from simple reporting jobs to large tasks such as data cleanup. On a single instance this causes no problem, because each execution happens only once. But when the application runs in a distributed environment with several instances in parallel, every instance runs the scheduled job, which can produce redundant or inconsistent data.

How Shedlock Helps To Run A Job Only Once

We therefore want each scheduled task to execute only once. Spring Boot does not provide a way to do this in a distributed environment, so we use Shedlock. Another scheduling mechanism, Quartz, is also available; a later section looks at why it doesn't fit this requirement.

Shedlock ensures that a task executes at most once at a time in any environment, whether a single system or a distributed one. Since our problem is scheduling in a distributed environment, we will focus on that. Say we have three instances: the first instance acquires the lock for a scheduled task, and the other two skip the execution automatically. At the next scheduled time, all instances again try to get the lock, and again only one succeeds. This is how Shedlock helps to run a job only once. Let's look at how it does this.

Shedlock stores information about the scheduled tasks in persistent storage that every instance connects to. The storage is accessed through a LockProvider, and Shedlock supports numerous stores, including MongoDB, PostgreSQL, Redis, and MySQL. We will use PostgreSQL as our example.

The table that Shedlock uses inside the database to manage the locks is straightforward. It has only four columns:

  1. name: A unique name provided by the user for the scheduled job
  2. lock_until: The time until which the current execution of the job is locked
  3. locked_at: The timestamp when the instance acquired the lock
  4. locked_by: The identifier of the instance that acquired the lock

Scheduling Tasks using Shedlock

Shedlock creates an entry for a scheduled task only when the job runs for the first time. On later executions it updates the same database row rather than deleting or re-creating the entry.

A scheduled task is locked by setting the lock_until column to a time in the future. When the scheduled task fires, all running instances try to update the database row for the task, but the update succeeds for only one of them, and only when lock_until <= now(). The instance that updates lock_until, locked_at, and locked_by holds the lock for this execution and sets lock_until to now() + lockAtMostFor. For example, if the current time is 12:30 and lockAtMostFor is 30 minutes, the task is locked until 13:00 and no other instance can run it before then.
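The conditional update described above can be sketched in plain Java. This is only an in-memory model under stated assumptions, not Shedlock's implementation: the class LockTableSketch, the method tryLock, and the node names are hypothetical, and a synchronized map stands in for the database row update.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the lock table (hypothetical, for illustration only). */
class LockTableSketch {

    /** One row of the table, with the four columns described earlier. */
    static final class LockRow {
        final String name;
        final Instant lockUntil;
        final Instant lockedAt;
        final String lockedBy;

        LockRow(String name, Instant lockUntil, Instant lockedAt, String lockedBy) {
            this.name = name;
            this.lockUntil = lockUntil;
            this.lockedAt = lockedAt;
            this.lockedBy = lockedBy;
        }
    }

    private final Map<String, LockRow> rows = new HashMap<>();

    /**
     * Models the conditional update: the caller gets the lock only if the row
     * does not exist yet (first run) or the existing row has lock_until <= now.
     */
    synchronized boolean tryLock(String name, String instance, Instant now, Duration lockAtMostFor) {
        LockRow current = rows.get(name);
        if (current != null && current.lockUntil.isAfter(now)) {
            return false; // another instance still holds the lock, so skip this run
        }
        // Create the row on the first run, or overwrite the same row on later runs.
        rows.put(name, new LockRow(name, now.plus(lockAtMostFor), now, instance));
        return true;
    }

    synchronized LockRow row(String name) {
        return rows.get(name);
    }

    public static void main(String[] args) {
        LockTableSketch table = new LockTableSketch();
        Instant halfPastTwelve = Instant.parse("2024-01-01T12:30:00Z");
        Duration lockAtMostFor = Duration.ofMinutes(30);

        // Three instances fire the same scheduled task at 12:30.
        for (String instance : new String[] {"node-1", "node-2", "node-3"}) {
            boolean acquired = table.tryLock("cronTask", instance, halfPastTwelve, lockAtMostFor);
            System.out.println(instance + " acquired lock: " + acquired);
        }
        System.out.println("locked until: " + table.row("cronTask").lockUntil);
    }
}
```

In this model only node-1 acquires the lock, and the row stays locked until 13:00, which matches the 12:30 plus 30 minutes example above. A real deployment relies on the database to make the update atomic across processes; the synchronized keyword here only plays that role within a single JVM.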

If the task doesn't finish because of a node crash or a delay, it can be executed again only once lock_until <= now(). The next section discusses lockAtMostFor and lockAtLeastFor and how they help prevent deadlocks and repeated runs of short jobs.

Locking Short Running Tasks

A scheduled lock can guard both short-running and long-running jobs. The lockAtMostFor parameter bounds how long the lock is held, for example when the executing node dies before releasing it. But consider a task that completes quickly, say in 10 seconds, on instances whose clocks are not synchronized: the first instance releases the lock, and another instance whose clock is slightly behind can then acquire it and run the same task again. The lockAtLeastFor parameter prevents this by keeping the lock held for a minimum time. Here is an example:

@Scheduled(cron = "0 0/1 * * * *")
@SchedulerLock(name = "cronTask", lockAtMostFor = "PT40S", lockAtLeastFor = "PT20S")
public void cronTask() {
    // code for the task to be performed
}
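To make the timing of the two parameters concrete, the sketch below works through the arithmetic with java.time. It is an illustration under assumptions, not Shedlock code: the class LockWindowSketch and its two helper methods are hypothetical, and the release rule it models is that lock_until is never set earlier than locked_at + lockAtLeastFor.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helpers that model how lock_until moves for one execution. */
class LockWindowSketch {

    /** Value written when the lock is acquired: locked_at + lockAtMostFor. */
    static Instant lockUntilOnAcquire(Instant lockedAt, Duration lockAtMostFor) {
        return lockedAt.plus(lockAtMostFor);
    }

    /** Value written when the task finishes: the later of finishedAt and locked_at + lockAtLeastFor. */
    static Instant lockUntilOnRelease(Instant lockedAt, Instant finishedAt, Duration lockAtLeastFor) {
        Instant earliestRelease = lockedAt.plus(lockAtLeastFor);
        return finishedAt.isAfter(earliestRelease) ? finishedAt : earliestRelease;
    }

    public static void main(String[] args) {
        Instant lockedAt = Instant.parse("2024-01-01T12:00:00Z");
        Duration atMost = Duration.parse("PT40S");   // same values as the annotation above
        Duration atLeast = Duration.parse("PT20S");

        // Upper bound while the task is running (also applies if the node crashes).
        System.out.println(lockUntilOnAcquire(lockedAt, atMost));
        // Task finishes after 10 seconds: the lock is still kept until 12:00:20.
        System.out.println(lockUntilOnRelease(lockedAt, lockedAt.plusSeconds(10), atLeast));
        // Task finishes after 30 seconds: the lock is released at that moment.
        System.out.println(lockUntilOnRelease(lockedAt, lockedAt.plusSeconds(30), atLeast));
    }
}
```

With these values, a 10-second task keeps the lock for the full 20 seconds, so an instance with a slightly slow clock cannot start a second run within that minimum window.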

In the above example, we set lockAtLeastFor as part of the job definition to block the next execution for at least the specified period. Shedlock then keeps lock_until at no earlier than locked_at + lockAtLeastFor before unlocking for the next run. With a task that takes 10 seconds and a lockAtLeastFor of 20 seconds, a run that starts at 12:00:00 stays locked until 12:00:20, and no other instance can run the job before then.

How To Set Up Shedlock for Spring Boot

To set up Shedlock with a Spring Boot application, we add the following dependencies using either Maven or Gradle:

Maven Dependencies

<dependency>
    <groupId>net.javacrumbs.shedlock</groupId>
    <artifactId>shedlock-spring</artifactId>
    <version>${shedlock.version}</version>
</dependency>
<dependency>
    <groupId>net.javacrumbs.shedlock</groupId>
    <artifactId>shedlock-provider-jdbc-template</artifactId>
    <version>${shedlock.version}</version>
</dependency>

${shedlock.version} is a property that we can define in the properties section of pom.xml.

Gradle Dependencies

implementation group: 'net.javacrumbs.shedlock', name: 'shedlock-spring', version: '4.25.0'
implementation group: 'net.javacrumbs.shedlock', name: 'shedlock-provider-jdbc-template', version: '4.25.0'

Apart from that, we need to create a table inside the shared database for storing the lock information.

CREATE TABLE lockProvider (
    name VARCHAR(64),
    lock_until TIMESTAMP(3) NULL,
    locked_at TIMESTAMP(3) NULL,
    locked_by VARCHAR(255),
    PRIMARY KEY (name)
);

After setting up the database and dependencies, we need to configure the Spring Boot application for Shedlock. As a first step, we enable scheduling and scheduler locking in a configuration class and provide a Spring bean of type LockProvider as part of our ApplicationContext. Because we are using a relational database, we use JdbcTemplateLockProvider and configure it with the auto-configured DataSource, as shown below:

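import javax.sql.DataSource;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT20M")
public class LockConfig {

    @Bean
    public LockProvider scheduledJobLockProvider(DataSource dataSource) {
        return new JdbcTemplateLockProvider(
            JdbcTemplateLockProvider.Configuration.builder()
                .withJdbcTemplate(new JdbcTemplate(dataSource))
                // The provider defaults to a table named "shedlock";
                // point it at the lockProvider table created above.
                .withTableName("lockProvider")
                // Use the database clock instead of each node's local clock.
                .usingDbTime()
                .build()
        );
    }
}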

We specify defaultLockAtMostFor on the @EnableSchedulerLock annotation as a fallback, in case we forget to set lockAtMostFor on an individual task.
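Values such as PT20M are ISO-8601 durations, which java.time.Duration can parse. The sketch below illustrates the fallback idea only: the class LockDurationSketch and the method effectiveLockAtMostFor are hypothetical names, and the code does not show how Shedlock resolves the value internally.

```java
import java.time.Duration;

/** Hypothetical helper showing how a default lockAtMostFor can act as a fallback. */
class LockDurationSketch {

    /** Returns the per-task value if one is given, otherwise the application-wide default. */
    static Duration effectiveLockAtMostFor(String taskValue, String defaultValue) {
        String chosen = (taskValue == null || taskValue.isEmpty()) ? defaultValue : taskValue;
        return Duration.parse(chosen); // ISO-8601, e.g. "PT20M" is 20 minutes
    }

    public static void main(String[] args) {
        // No lockAtMostFor on the task: the default of 20 minutes applies.
        System.out.println(effectiveLockAtMostFor("", "PT20M").toMinutes());     // prints 20
        // The task sets its own value: it takes precedence over the default.
        System.out.println(effectiveLockAtMostFor("PT5M", "PT20M").toMinutes()); // prints 5
    }
}
```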

After this, the only thing left is to add the @SchedulerLock annotation to every scheduled method. The annotation takes the name of the scheduled task, which is used as the primary key in the Shedlock table, so it must be unique across our application. We can also provide lockAtMostFor and lockAtLeastFor as required; setting both explicitly, as in the following example, is recommended so that the lock behaves predictably in all scenarios.

@Scheduled(cron = "0 */10 * * * *")
@SchedulerLock(name = "lockedTask", lockAtMostFor = "PT5M", lockAtLeastFor = "PT2M")
public void lockedTask() {
    // code for the task to be performed
}

Why we choose Shedlock over Quartz for the scheduling mechanism

Quartz is an open-source job scheduling library used to create simple or complex schedules for executing large jobs, whereas Shedlock is a distributed lock for scheduled tasks: it provides locking in a distributed environment rather than scheduling.

Different usage scenarios

Quartz is used in scenarios such as system maintenance or providing reminder services within an application, whereas Shedlock is used with scheduled tasks ranging from simple reporting jobs to large tasks such as data cleanup.

Difference between their working

Quartz notifies Java objects that implement the JobListener and TriggerListener interfaces when a trigger fires, and notifies them again after the job has executed. When a job completes, it returns a JobCompletionCode that tells the scheduler whether it succeeded or failed. Shedlock, in contrast, keeps information about the current locks on scheduled jobs in persistent storage shared by all nodes, in a table or document in the database.