Disable scheduler jobs during deployment

Like most active sites our applications have a healthy pipeline of change requests and bug fixes, and we manage this pipeline by maintaining a steady pace of small releases.

Each release is built, tested and deployed within a 3-4 week timeframe. Probably once or twice a month, on a Thursday evening, one or more deployments will be run, and each deployment is fully scripted with as few steps as possible. My standard deployment script has evolved over time to handle a number of cases where failures have happened in the past; failed deployments are rare now.

One issue we encountered some time ago was when a deployment script happened to be run at the same time as a database scheduler job; the job started halfway during the deployment when some objects were in the process of being modified. This led to some temporary compilation failures that caused the job to fail. Ultimately the deployment was successful, and the next time the job ran it was able to recover; but we couldn’t be sure that another failure of this sort wouldn’t cause issues in future. So I added a step to each deployment to temporarily stop all the jobs and re-start them after the deployment completes, with a script like this:

prompt disable_all_jobs.sql

begin
  for r in (
    select job_name
    from   user_scheduler_jobs
    where  schedule_type = 'CALENDAR'
    and    enabled = 'TRUE'
    order by 1
  ) loop
    dbms_scheduler.disable
      (name  => r.job_name
      ,force => true);
  end loop;
end;
/

This script simply marks all the jobs as “disabled” so they don’t start during the deployment. A very similar script is run at the end of the deployment to re-enable all the scheduler jobs. This works fine, except for the odd occasion when a job just happens to start running, just before the script starts, and the job is still running concurrently with the deployment. The line force => true in the script means that my script allows those jobs to continue running.

To solve this problem, I’ve added the following:

prompt Waiting for any running jobs to finish...

whenever sqlerror exit sql.sqlcode;

declare
  max_wait_seconds constant number := 60;
  start_time       date := sysdate;
  job_running      varchar2(100);
begin
  loop

    begin
      select job_name
      into   job_running
      from   user_scheduler_jobs
      where  state = 'RUNNING'
      and    rownum = 1;
    exception
      when no_data_found then
        job_running := null;
    end;

    exit when job_running is null;

    if sysdate - start_time > max_wait_seconds/24/60/60 then

      raise_application_error(-20000,
           'WARNING: waited for '
        || max_wait_seconds
        || ' seconds but job is still running ('
        || job_running
        || ').');

    else
      dbms_lock.sleep(2);
    end if;

  end loop;
end;
/

When the DBA runs the above script, it pauses to allow any running jobs to finish. Our jobs almost always finish in less than 30 seconds, usually sooner. The loop checks for any running jobs; if there are no jobs running it exits straight away – otherwise, it waits for a few seconds then checks again. If a job is still running after a minute, the script fails (stopping the deployment) and the DBA can investigate further to see what’s going on; once the job has finished, they can re-start the deployment.