Like most active sites our applications have a healthy pipeline of change requests and bug fixes, and we manage this pipeline by maintaining a steady pace of small releases.
Each release is built, tested and deployed within a 3-4 week timeframe. Probably once or twice a month, on a Thursday evening, one or more deployments will be run, and each deployment is fully scripted with as few steps as possible. My standard deployment script has evolved over time to handle a number of cases where failures have happened in the past; failed deployments are rare now.
One issue we encountered some time ago was when a deployment script happened to be run at the same time as a database scheduler job; the job started halfway during the deployment when some objects were in the process of being modified. This led to some temporary compilation failures that caused the job to fail. Ultimately the deployment was successful, and the next time the job ran it was able to recover; but we couldn’t be sure that another failure of this sort wouldn’t cause issues in future. So I added a step to each deployment to temporarily stop all the jobs and re-start them after the deployment completes, with a script like this:
prompt disable_all_jobs.sql begin for r in ( select job_name from user_scheduler_jobs where schedule_type = 'CALENDAR' and enabled = 'TRUE' order by 1 ) loop dbms_scheduler.disable (name => r.job_name ,force => true); end loop; end; /
This script simply marks all the jobs as “disabled” so they don’t start during the deployment. A very similar script is run at the end of the deployment to re-enable all the scheduler jobs. This works fine, except for the odd occasion when a job just happens to start running, just before the script starts, and the job is still running concurrently with the deployment. The line
force => true in the script means that my script allows those jobs to continue running.
To solve this problem, I’ve added the following:
prompt Waiting for any running jobs to finish... whenever sqlerror exit sql.sqlcode; declare max_wait_seconds constant number := 60; start_time date := sysdate; job_running varchar2(100); begin loop begin select job_name into job_running from user_scheduler_jobs where state = 'RUNNING' and rownum = 1; exception when no_data_found then job_running := null; end; exit when job_running is null; if sysdate - start_time > max_wait_seconds/24/60/60 then raise_application_error(-20000, 'WARNING: waited for ' || max_wait_seconds || ' seconds but job is still running (' || job_running || ').'); else dbms_lock.sleep(2); end if; end loop; end; /
When the DBA runs the above script, it pauses to allow any running jobs to finish. Our jobs almost always finish in less than 30 seconds, usually sooner. The loop checks for any running jobs; if there are no jobs running it exits straight away – otherwise, it waits for a few seconds then checks again. If a job is still running after a minute, the script fails (stopping the deployment) and the DBA can investigate further to see what’s going on; once the job has finished, they can re-start the deployment.