Disable scheduler jobs during deployment
Like most active sites our applications have a healthy pipeline of change requests and bug fixes, and we manage this pipeline by maintaining a steady pace of small releases.
Each release is built, tested and deployed within a 3-4 week timeframe. Probably once or twice a month, on a Thursday evening, one or more deployments will be run, and each deployment is fully scripted with as few steps as possible. My standard deployment script has evolved over time to handle a number of cases where failures have happened in the past; failed deployments are rare now.
One issue we encountered some time ago was when a deployment script happened to be run at the same time as a database scheduler job; the job started halfway during the deployment when some objects were in the process of being modified. This led to some temporary compilation failures that caused the job to fail. Ultimately the deployment was successful, and the next time the job ran it was able to recover; but we couldn’t be sure that another failure of this sort wouldn’t cause issues in future. So I added a step to each deployment to temporarily stop all the jobs and re-start them after the deployment completes, with a script like this:
prompt disable_all_jobs.sql
begin
for r in (
select job_name
from user_scheduler_jobs
where schedule_type = 'CALENDAR'
and enabled = 'TRUE'
order by 1
) loop
dbms_scheduler.disable
(name => r.job_name
,force => true);
end loop;
end;
/
This script simply marks all the jobs as “disabled” so they don’t start during the deployment. A very similar script is run at the end of the deployment to re-enable all the scheduler jobs. This works fine, except for the odd occasion when a job just happens to start running, just before the script starts, and the job is still running concurrently with the deployment. The line force => true
in the script means that my script allows those jobs to continue running.
To solve this problem, I’ve added the following:
prompt Waiting for any running jobs to finish...
whenever sqlerror exit sql.sqlcode;
declare
max_wait_seconds constant number := 60;
start_time date := sysdate;
job_running varchar2(100);
begin
loop
begin
select job_name
into job_running
from user_scheduler_jobs
where state = 'RUNNING'
and rownum = 1;
exception
when no_data_found then
job_running := null;
end;
exit when job_running is null;
if sysdate - start_time > max_wait_seconds/24/60/60 then
raise_application_error(-20000,
'WARNING: waited for '
|| max_wait_seconds
|| ' seconds but job is still running ('
|| job_running
|| ').');
else
dbms_lock.sleep(2);
end if;
end loop;
end;
/
When the DBA runs the above script, it pauses to allow any running jobs to finish. Our jobs almost always finish in less than 30 seconds, usually sooner. The loop checks for any running jobs; if there are no jobs running it exits straight away – otherwise, it waits for a few seconds then checks again. If a job is still running after a minute, the script fails (stopping the deployment) and the DBA can investigate further to see what’s going on; once the job has finished, they can re-start the deployment.