
Identifying and Removing Redundant Scheduled Jobs

A systematic approach to reviewing batch job schedules, diagnosing unused jobs, and safely cleaning up the job queue

Why Housekeeping Matters

Xi-Batch systems accumulate jobs over time. Schedules that were once critical fall out of use, one-off jobs remain on the queue in a Done state long after they served their purpose, and jobs targeting decommissioned remote hosts sit idle indefinitely. Each job consumes a slot in shared memory and adds visual clutter to the queue, making it harder for administrators to focus on what is actively running.

Regular housekeeping keeps the scheduler lean, reduces the risk of hitting shared memory limits, and ensures that conditions and assignments referencing orphaned variables do not cause unexpected behaviour.

This article walks through the process of exploring the job queue, identifying candidates for removal, and safely carrying out the cleanup.

Exploring the Job Queue

The btjlist command is the primary tool for reviewing jobs from the command line. By default it shows only local jobs in a terse format. Add -H for column headings and -R to include jobs on remote hosts:

bash

btjlist -H
btjlist -HR

The default display shows the job number, user, title, command interpreter, priority, load level, next scheduled time, conditions, and progress state. This is a good starting point, but for housekeeping you will want additional fields.

Useful Format Codes for Housekeeping

The -F option lets you specify exactly which fields to display. The format string uses % codes, each representing a job attribute. A selection of codes particularly useful for housekeeping:

Code    Meaning
%N      Job number (includes host prefix for remote jobs)
%U      User
%H      Job name in full (includes queue prefix)
%h      Title without queue name
%q      Queue name
%P      Progress state (Run, Done, Err, Abrt, Canc, or blank)
%T      Date and time in full
%t      Time or date (short form)
%o      Time submitted
%W      Last or next run time
%r      Repeat specification
%d      Delete time (hours)
%c      Conditions (abbreviated)
%s      Assignments (abbreviated)
%x      Exit code returned by last run
%e      Export scope (local, export, or remote runnable)
%O      Originating host

A format string tailored for housekeeping review might look like this:

bash

btjlist -HR -F "%N %U %h %P %T %r %d"

This shows the job number, owner, title, progress state, full date and time, repeat specification, and auto-delete time for every job on all connected hosts.

To filter by queue name, user, or group:

bash

btjlist -H -q "nightly*" -F "%N %h %P %W %r"
btjlist -H -u olduser -F "%N %h %P %T"
btjlist -H -g finance -F "%N %h %P %W"

Understanding Progress States

The progress state is the single most important indicator when deciding whether a job is still needed. The possible states are:

blank (no state) : The job is waiting to run, either for its scheduled time or for conditions to be satisfied. This is the normal state for a healthy repeating job that is not currently executing.

Run : The job is currently executing. Do not remove jobs in this state.

Done : The job completed successfully and has been retained on the queue. If there is no repeat specification, it will sit in Done state indefinitely unless it has an auto-delete time set.

Err : The job terminated with an exit code in the error range. This could indicate a problem that was never resolved, or a job whose underlying process no longer exists.

Abrt : The job was terminated by a signal, either killed by an operator or due to a program fault. Persistent Abrt states often indicate abandoned jobs.

Canc : The job was cancelled before it ran. Jobs left in Canc state were typically set up but never activated, or were cancelled and forgotten about.
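
As a quick way to see how the queue divides across these states, the listing can be summarised with a small filter. This is a minimal sketch, not part of Xi-Batch: the function name count_by_state is hypothetical, and it assumes that btjlist -F "%N %P" prints one job per line with the state as the second whitespace-separated field (a waiting job then has only one field).

```shell
# Summarise a job listing by progress state.
# Reads lines of "<job number> <state>" on standard input.
# Assumption: a waiting job has a blank state, so its line has one field.
count_by_state() {
    awk '
        NF == 0 { next }                      # ignore empty lines
        {
            state = (NF >= 2) ? $2 : "(waiting)"
            count[state]++
        }
        END {
            for (s in count) printf "%-10s %d\n", s, count[s]
        }
    ' | sort
}

# Usage (assumes btjlist is on PATH):
#   btjlist -R -F "%N %P" | count_by_state
```

A large number of Done, Err, Abrt, or Canc entries relative to waiting jobs suggests the queue is due for a review.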

Identifying Stale Jobs

When reviewing the queue, look for the following indicators that a job may be a candidate for removal.

Jobs in Done, Err, Abrt, or Canc state with old dates : Use the full date and time field (%T) or the last/next time field (%W) to see when the job last ran or was last scheduled. If the date is months or years ago and the job has no repeat specification, it is very likely orphaned.

bash

btjlist -HR -F "%N %h %P %W %r" | grep -E "(Done|Err|Abrt|Canc)"

Jobs with a repeat specification that are stuck : A repeating job should have a future date in the time field or be currently running. If a repeating job shows a date far in the past, it may have encountered an error and stopped advancing. Check whether the "advance time on error" flag is set by reviewing the job's process parameters in btq.

Jobs with conditions referencing non-existent variables : If a job's conditions reference a variable that has been deleted, the job will never run. In btq, press C on the job to view its conditions and check that the referenced variables still exist. From the command line:

bash

btjlist -F "%N %h %C" | grep -i "variable_name"

Jobs belonging to users who have left : Filter by user and review whether any of their jobs are still required:

bash

btjlist -HR -u departed_user -F "%N %h %P %W %r"

Jobs with no time and no conditions : A job with no scheduled time and no conditions that is not in Run state will never execute. These are typically test jobs or jobs that were submitted in cancelled state and never activated.

Jobs targeting remote hosts that are no longer connected : Jobs with the export scope set to "remote runnable" may reference hosts that have been decommissioned. Check the Xi-Batch hosts file (typically /etc/xibatch-hosts) and verify connectivity:

bash

btconn hostname

If the host does not respond, jobs configured to run on that host are candidates for removal.

Checking When a Job Last Ran

Xi-Batch can maintain an audit trail of job activity if the LOGJOBS variable is configured. This variable specifies a file path where the scheduler writes a line for every job event - creation, completion, error, cancellation, and so on.

Each log line contains pipe-separated fields: date, time, job number, job title, status code, user, group, priority, and load level. The status codes include Completed, Abort, Cancel, Error, and others.

To find the last time a specific job completed successfully:

bash

grep "jobname" /usr/spool/batch/joblog | grep "Completed" | tail -5

If the LOGJOBS variable is not configured, you can still use the last/next time field in btjlist and the progress state to infer activity. A job in Done state with an old date has not run recently.
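
To review every job at once rather than one title at a time, the log can be reduced to the most recent successful completion per job title. The sketch below assumes the field order described above (date, time, job number, title, status, and so on), that the status text is exactly "Completed", and that the log is in chronological order. The function name last_completed is hypothetical.

```shell
# Print the most recent "Completed" entry for each job title in a job log.
# Assumed field order: date|time|job number|title|status|user|group|...
# Later lines overwrite earlier ones, so the log must be chronological.
last_completed() {
    awk -F'|' '
        $5 == "Completed" { last[$4] = $1 " " $2 }
        END { for (t in last) printf "%s\t%s\n", t, last[t] }
    ' "$@"
}

# Usage (the log path is whatever LOGJOBS points to):
#   last_completed /usr/spool/batch/joblog | sort
```

Titles that do not appear in the output have no successful completion recorded in the log, which makes them worth a closer look.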

Reviewing Variables for Orphaned References

Variables and jobs are tightly linked through conditions and assignments. Before removing jobs, check whether they set or depend on variables that other jobs also reference. Use btvlist to list all variables:

bash

btvlist -H

To see which jobs reference a particular variable in their conditions or assignments:

bash

btjlist -HR -F "%N %h %C %S" | grep "VARIABLE_NAME"

If a variable is only referenced by jobs you plan to remove, the variable itself can also be removed afterwards. However, attempting to delete a variable that is still referenced by any job will produce an error.
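
A plain grep matches substrings, so searching for STATUS would also match STATUS2. The following sketch matches whole names only and prints just the job numbers. It is an illustration under assumptions: the function name jobs_referencing is hypothetical, the job number is taken to be the first field of each line, and variable names are assumed to consist of letters, digits, and underscores.

```shell
# List the job numbers whose conditions or assignments mention a variable.
# $1    = variable name to look for (matched as a whole word)
# stdin = lines of "<job number> <conditions> <assignments>"
jobs_referencing() {
    awk -v var="$1" '{
        job = $1
        $1 = ""                               # keep only the reference fields
        # Split on anything that cannot be part of a name.
        n = split($0, words, /[^A-Za-z0-9_]+/)
        for (i = 1; i <= n; i++) {
            if (words[i] == var) { print job; break }
        }
    }'
}

# Usage (assumes btjlist is on PATH):
#   btjlist -R -F "%N %C %S" | jobs_referencing VARIABLE_NAME
```

If the output contains only jobs you intend to remove, the variable is a candidate for removal once those jobs are gone.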

Backing Up Before Cleanup

Before removing anything, create a backup of the current state. Xi-Batch provides utilities that export jobs, variables, command interpreters, and user profiles as shell scripts.

To back up jobs:

bash

# Evaluate the date once so both commands use the same directory
BACKUP_DIR=/usr/batchsave/$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR/Scripts"
cd "$BACKUP_DIR"
gbch-cjlist -D /usr/spool/batch btsched_jfile Jcmd Scripts

This creates Jcmd, a shell script that would resubmit all jobs, with the job scripts saved in the Scripts directory.

To back up variables:

bash

gbch-cvlist -D /usr/spool/batch btsched_vfile Vcmd

To back up command interpreters:

bash

gbch-ciconv -D /usr/spool/batch cifile Cicmd

To back up user permissions:

bash

gbch-uconv -D /usr/spool/batch btufile6 Ucmd

When restoring, the recommended order is: user permissions, command interpreters, variables, then jobs. This avoids errors from jobs referencing items that do not yet exist.

These backup scripts can be edited before restoration if you only need to recover specific items.
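
The four backup commands above can be wrapped in a single function so that a backup is never half taken. This is a sketch under assumptions: backup_xibatch and DRY_RUN are illustrative names rather than part of Xi-Batch, and it assumes each utility returns a non-zero exit status on failure. Setting DRY_RUN=1 prints the commands instead of running them, which is a convenient way to check paths first.

```shell
# Run all four backup utilities into one dated directory.
# $1 = spool directory (default /usr/spool/batch)
# $2 = destination     (default /usr/batchsave/YYYYMMDD)
# The body runs in a subshell so the caller's directory is unchanged.
backup_xibatch() (
    spool=${1:-/usr/spool/batch}
    dest=${2:-/usr/batchsave/$(date +%Y%m%d)}
    run=${DRY_RUN:+echo}                      # "echo" when DRY_RUN is set

    $run mkdir -p "$dest/Scripts" || exit 1
    $run cd "$dest" || exit 1
    $run gbch-cjlist -D "$spool" btsched_jfile Jcmd Scripts &&
    $run gbch-cvlist -D "$spool" btsched_vfile Vcmd &&
    $run gbch-ciconv -D "$spool" cifile Cicmd &&
    $run gbch-uconv -D "$spool" btufile6 Ucmd
)

# Usage:
#   DRY_RUN=1 backup_xibatch      # show what would be run
#   backup_xibatch                # take the backup
```

The function stops at the first failing step, so a zero exit status indicates that all four files were written.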

Safely Removing Jobs

Once you have confirmed a job is no longer needed, it can be removed in several ways.

Using btq interactively : Navigate to the job in the job list and press D to delete it. If the job is running, you must first kill it with K (which offers Int, Quit, Term, or Kill signals) and wait for it to stop before deleting. Confirmation may be requested depending on your settings.

Using btjdel from the command line : Delete a job by its job number:

bash

btjdel 1420

The job must not be running. For remote jobs, include the host prefix:

bash

btjdel avon:24918

Cancelling before deleting : If you want to stop a job from running but are not yet ready to remove it, use btjchange to set it to cancelled state:

bash

btjchange -C 1420

This prevents the job from executing whilst keeping it on the queue for review. The job can be deleted later when you are satisfied it is no longer needed.

Unqueueing for archival : If you want to preserve a copy of the job before removing it, use the unqueue function. In btq, press U on the job. This saves the job script and a command file that could resubmit it, then optionally removes it from the queue. This is the safest approach when you are unsure whether a job might be needed again.

Bulk identification : To list just the job numbers of all jobs in Done state for a specific user:

bash

btjlist -u olduser -F "%N %P" | awk '$2 == "Done" {print $1}'

This output can be used to script bulk removal, though care should be taken to review each job before deleting.
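
One cautious way to script the removal is to generate the btjdel commands into a file, review that file, and only then run it. The sketch below prints commands and deletes nothing itself. The function name done_jobs_to_btjdel and the file name delete-olduser.sh are hypothetical, and the input is assumed to be the "%N %P" listing described above.

```shell
# Turn a "<job number> <state>" listing into btjdel commands for review.
# Only jobs whose state is exactly Done are included; running and waiting
# jobs are skipped. Nothing is deleted: the commands are only printed.
done_jobs_to_btjdel() {
    awk '$2 == "Done" { print "btjdel " $1 }'
}

# Usage (assumes btjlist is on PATH):
#   btjlist -u olduser -F "%N %P" | done_jobs_to_btjdel > delete-olduser.sh
#   Review delete-olduser.sh, remove any lines for jobs to keep, then:
#   sh delete-olduser.sh
```

Keeping the reviewed file alongside the backup also gives a record of exactly which jobs were removed.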

Cleaning Up Variables After Job Removal

After removing jobs, check whether any variables are now orphaned. A variable is orphaned if no remaining job references it in a condition or assignment.

To check:

bash

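# List the references once, without headings, rather than once per variable
JOBREFS=$(btjlist -R -F "%C %S")
btvlist -F "%N" | while read -r VARNAME; do
    # -w matches whole words only, so LOG does not match LOGJOBS
    if ! printf '%s\n' "$JOBREFS" | grep -qwF -- "$VARNAME"; then
        echo "Orphaned: $VARNAME"
    fi
done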

Orphaned variables can be removed using btvar:

bash

btvar -d VARIABLE_NAME

Or from btq, switch to the variable list with V and delete with D.

Be cautious with variables that have a system-wide purpose, such as LOGJOBS, LOGVARS, STARTLIM, and STARTWAIT. These control scheduler behaviour and should not be removed.
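
Because these system variables are not normally referenced by any job, an orphan check will report them as orphaned. A small filter can take them out of a candidate list before anything is deleted. This is a sketch: the function name exclude_system_vars is hypothetical, the four names are the ones listed above, and the optional host prefix pattern is an assumption about how remote variables are displayed.

```shell
# Drop the scheduler's control variables from a list of variable names.
# Reads one name per line on standard input and prints the rest.
# An optional "host:" prefix is allowed for (an assumption).
exclude_system_vars() {
    grep -vxE '(.*:)?(LOGJOBS|LOGVARS|STARTLIM|STARTWAIT)'
}

# Usage (assumes btvlist is on PATH):
#   btvlist -F "%N" | exclude_system_vars
```

Note that grep returns a non-zero status when every name is filtered out, which matters only if the calling script uses set -e.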

Housekeeping Checklist

A periodic review - quarterly or before major upgrades - should cover the following.

Review all jobs by progress state : List jobs in Done, Err, Abrt, and Canc states. Determine whether each is still needed or can be removed.

Check repeat specifications and scheduled times : Identify repeating jobs with dates in the past that are no longer advancing. Investigate whether conditions or errors are preventing them from running.

Review job ownership : Identify jobs belonging to users who have left the organisation or changed roles.

Check remote host connectivity : Verify that all hosts in the Xi-Batch hosts file are still reachable. Use btconn to test connectivity.

Inspect conditions and assignments : Ensure that all variables referenced by conditions and assignments still exist. Look for circular dependencies or conditions that can never be satisfied.

Review the job log : If LOGJOBS is configured, check for jobs that have not completed successfully in a long time.

Back up before removing : Always run the backup utilities before deleting jobs or variables.

Document changes : Keep a record of what was removed and why. The backup scripts serve as a partial record, but a separate log of the rationale is helpful for audit purposes.

Best Practices

Schedule housekeeping during quiet periods when the batch schedule has no critical jobs running. Do not try to remove jobs whilst they are executing: a running job has to be killed before it can be deleted, which interrupts whatever work it was doing.

When decommissioning a remote host, disconnect it cleanly with btdisconn before removing its entry from the hosts file. This avoids the scheduler attempting to reconnect during shutdown.

If shared memory is approaching capacity, removing old retained jobs frees slots immediately. Use btstart with appropriate sizing arguments when restarting the scheduler after significant cleanup to right-size the allocated shared memory.

For systems with complex job schedules, consider enabling LOGJOBS if not already configured. This provides an ongoing audit trail that makes future housekeeping reviews much simpler. Set it to a file path; the log is written in a pipe-separated format that is easy to process with standard Unix tools.

When removing jobs that are part of a queue (jobs sharing a queue name prefix), review the entire queue as a group rather than individual jobs. Queues often represent workflows where jobs have interdependencies through conditions and assignments. Removing one job from a queue without considering the others may leave the remaining jobs unable to run.
