DBMS_JOB — Watching for Failures
If you rely on jobs submitted via DBMS_JOB to fail, then read on.
Join the DZone community and get the full member experience.
Join For FreeI had a friend point this one out to me recently. They use DBMS_JOB to allow some “fire and forget” style functionality for the user, and in their case, the jobs are “best efforts” in that if they fail, it is not a big deal.
While this may sound counter-intuitive, if you rely on jobs submitted via DBMS_JOB to fail, then please read on.
By default, if a job fails 16 times in a row, it will be marked as broken by the database. Here’s an example where the anonymous block will obviously fail because we are dividing by zero each time. I’ll set the job to run every 5 seconds so that within a couple of minutes, we’ll have 16 failures. First, I’ll run this on 11.2.0.4
SQL> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4 - 64bit Production
SQL> declare
2 j int;
3 begin
4 dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
[wait for a bit]
SQL> select job, what, failures from user_jobs;
JOB WHAT FAILURES
---------- -------------------------------------------------- ----------
23 declare x int; begin x := 1/0; end; 8
[wait for a bit]
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 9 N
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 16 Y
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 16 Y
You can see the “breaking” of the job in action here. We got to 16 failures, and the database decided “enough is enough” and the job will no longer run until some sort of intervention is performed by an administrator.
Now I’ll run that demo again in 12.2
SQL> declare
2 j int;
3 begin
4 dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
[wait for a bit]
SQL> select job, what, failures from user_jobs;
JOB WHAT FAILURES
---------- -------------------------------------------------- ----------
23 declare x int; begin x := 1/0; end; 8
[wait for a bit]
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 13 N
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 19 N
SQL> /
JOB WHAT FAILURES B
---------- -------------------------------------------------- ---------- -
3 declare x int; begin x := 1/0; end; 25 N
You can see that in 12.2 (and I’ve tested in 18 and 19) that failures can continue past 16 unabated. If you’re being hit by this, patches are available, so please get in touch with Support. A quick workaround until you can apply patches is to use another job to monitor the others. Here’s a small anonymous block you could run in each container as SYSDBA that you could schedule (say) every few minutes
SQL> begin
2 for i in ( select job
3 from dba_jobs
4 where failures > 16
5 and job not in ( sys_context('userenv','fg_job_id'),sys_context('userenv','bg_job_id'))
6 and job not in ( select job from dba_jobs_running )
7 )
8 loop
9 dbms_ijob.broken(i.job,true);
10 end loop;
11 commit;
12 end;
13 /
PL/SQL procedure successfully completed.
Ideally, you probably want to be moving to DBMS_SCHEDULER where possible, which has greater flexibility and control over error handling amongst many other things.
Published at DZone with permission of Connor McDonald, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments