This blog post shows how Quartz.NET version 1.0.3 can be extended so that jobs that fail are retried later, a configurable number of times, with a pause between attempts.
Quartz already has a feature to retry jobs immediately. The only exception that a job is allowed to throw is JobExecutionException. If the refireImmediately flag is set on the exception, the job is executed again by the scheduler with the same JobExecutionContext. My first idea was to take advantage of this and implement an abstract base class, e.g. RetryableJobBase, that handles the retry logic by throwing a JobExecutionException with the refireImmediately flag set to true.
This would work, but it would not easily allow for a wait time between retries other than a call to Thread.Sleep. That call would block the current thread, which is a thread pool thread, from executing other jobs. I also didn't like the idea of using a base class for this functionality, since a .NET class can derive from only one class. If another feature were also implemented with a base class, for example workflow handling as the Quartz FAQ suggests, it would not be possible to combine the two behaviours.
So I had to rethink my design a bit. Quartz.NET is extensible: it uses interfaces for most classes. It is possible to plug in a JobListener, TriggerListener and SchedulerListener. A listener is registered with the scheduler, and its methods are then called by the scheduler at the appropriate times. IJobListener contains, among others, the following members:
void JobToBeExecuted(JobExecutionContext context);
void JobWasExecuted(JobExecutionContext context, JobExecutionException jobException);
As the method names suggest, JobToBeExecuted is called before a job is executed and JobWasExecuted after a job has been executed. If the job fails (throws an exception), the jobException argument will be set.
This can be used for the retry logic. JobToBeExecuted must increase a try number counter that is saved per job. JobWasExecuted must reschedule the job if it failed and the counter has not reached the maximum number of retries. Scheduling the retry with the scheduler instead of using Thread.Sleep prevents the current thread from blocking; the thread can then perform other jobs until it is time to execute the retry. Only the next try is scheduled, because it is impossible to know beforehand which try will succeed or fail.
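The decision that JobWasExecuted has to make can be sketched without any Quartz types. This is only a minimal illustration under my own assumptions: RetryPolicy, ShouldRetry and the maxTries parameter are hypothetical names, not part of Quartz.NET.

```csharp
using System;

// Hypothetical helper (not part of Quartz.NET): decides whether a failed
// job should be scheduled for one more try.
public static class RetryPolicy
{
    // failed      - true when the listener received a non-null jobException
    // numberTries - how many times the job has been executed so far
    // maxTries    - upper limit on executions, chosen by the job author
    public static bool ShouldRetry(bool failed, int numberTries, int maxTries)
    {
        if (maxTries < 1)
            throw new ArgumentOutOfRangeException("maxTries", "At least one try is required.");

        // A successful run never needs a retry; a failed run is retried
        // only while the try counter is still below the limit.
        return failed && numberTries < maxTries;
    }
}
```

With maxTries set to 3, a job that fails on its first and second executions would be rescheduled each time, while a failure on the third execution would be final.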
The hard part is where to store the try number counter. My first thought was to store the counters in a dictionary, but that would require some kind of unique key per job. That solution would work, but it would require some extra work.
Quartz.NET is loosely coupled: a job is started by providing a JobDetail and a Trigger. When the Trigger fires, a new instance of the job is created by an IJobFactory (for more details about IJobFactory see this blog post). The JobDetail contains a JobDataMap that can be used to provide data to jobs. Bingo! It sounds like the JobDataMap is the place to store the try number counter. But as the documentation states, any changes made to the contents of the job data map of a non-stateful job (a job based on IJob) during execution of the job will be lost. So this would only work with implementations of IStatefulJob. That would be quite pointless, because stateful jobs are not allowed to run concurrently.
But if the job retry is scheduled with the same JobDetail as the original job, the JobDataMap will be intact regardless of whether the job is stateful or not! It doesn't even matter that the default IJobFactory creates a new instance of the job when the retry trigger fires, because the JobDataMap is stored on the JobDetail in the JobExecutionContext and not on the job itself.
So as long as the job retry is scheduled on the same JobDetail as the original job, the JobDataMap is a good place to store the try number counter.
The retry is scheduled by creating a new trigger whose timing comes from an IRetryableJob interface. It is up to the implementer of IRetryableJob to decide how many retries are allowed and when the next retry will be performed. The rescheduling itself happens in the JobWasExecuted method:
var oldTrigger = context.Trigger;
// Unschedule old trigger
_scheduler.UnscheduleJob(oldTrigger.Name, oldTrigger.Group);
// Create and schedule new trigger
var retryTrigger = new SimpleTrigger(oldTrigger.Name, oldTrigger.Group, retryableJob.StartTimeRetryUtc, retryableJob.EndTimeRetryUtc, 0, TimeSpan.Zero);
_scheduler.ScheduleJob(context.JobDetail, retryTrigger);
Notice above that the job is scheduled with the current JobDetail.
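For illustration, an interface like IRetryableJob could look roughly as follows. Only StartTimeRetryUtc and EndTimeRetryUtc appear in the snippet above; MaxNumberTries, the FixedDelayRetry class and its constructor parameters are assumptions made for this sketch, and EndTimeRetryUtc is typed as DateTime? on the assumption that the trigger's end time is optional. The real interface in the repository may differ.

```csharp
using System;

// Sketch of a retry contract. Only StartTimeRetryUtc and EndTimeRetryUtc are
// taken from the listener snippet; MaxNumberTries is an assumed member.
public interface IRetryableJob
{
    int MaxNumberTries { get; }          // assumed: total executions allowed
    DateTime StartTimeRetryUtc { get; }  // when the retry trigger should fire
    DateTime? EndTimeRetryUtc { get; }   // optional end time; null means none
}

// Hypothetical implementation: always wait a fixed delay before the next try.
public class FixedDelayRetry : IRetryableJob
{
    private readonly int _maxNumberTries;
    private readonly TimeSpan _delay;

    public FixedDelayRetry(int maxNumberTries, TimeSpan delay)
    {
        _maxNumberTries = maxNumberTries;
        _delay = delay;
    }

    public int MaxNumberTries { get { return _maxNumberTries; } }

    // Evaluated when the listener builds the retry trigger, so the delay
    // is measured from the moment the failure is handled.
    public DateTime StartTimeRetryUtc { get { return DateTime.UtcNow.Add(_delay); } }

    public DateTime? EndTimeRetryUtc { get { return null; } }
}
```

A real job would implement this contract alongside IJob, so the listener can cast the job instance and read the retry settings from it.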
The try number counter is stored in the JobDataMap of the JobDetail:
// Initialise the counter the first time the job is seen
if (!context.JobDetail.JobDataMap.Contains(NumberTriesJobDataMapKey))
    context.JobDetail.JobDataMap[NumberTriesJobDataMapKey] = 0;
// Read the counter, increment it and store it back on the JobDetail
int numberTries = context.JobDetail.JobDataMap.GetIntValue(NumberTriesJobDataMapKey);
context.JobDetail.JobDataMap[NumberTriesJobDataMapKey] = ++numberTries;
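The same counter logic can be tried outside the scheduler. The sketch below is written against System.Collections.IDictionary so that it runs with a plain Hashtable; I believe a JobDataMap in Quartz.NET 1.x can be passed the same way, but treat that as an assumption. TryCounter, Increment and the key value "NumberTries" are hypothetical names.

```csharp
using System;
using System.Collections;

// Hypothetical helper mirroring the listener's counter handling.
public static class TryCounter
{
    // Assumed key value; the post only shows the constant's identifier.
    public const string NumberTriesJobDataMapKey = "NumberTries";

    // Increments the stored try count (starting from 0 when the key is
    // missing), writes it back and returns the new value.
    public static int Increment(IDictionary map)
    {
        if (map == null) throw new ArgumentNullException("map");

        int numberTries = map.Contains(NumberTriesJobDataMapKey)
            ? Convert.ToInt32(map[NumberTriesJobDataMapKey])
            : 0;

        map[NumberTriesJobDataMapKey] = ++numberTries;
        return numberTries;
    }
}
```

Calling Increment twice on the same, initially empty map yields 1 and then 2, which corresponds to a retry that reuses the original JobDetail and therefore sees the count left by the first execution.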
All the code is available on my GitHub account.