
Managing Job Start Rate with STARTLIM and STARTWAIT

Preventing resource exhaustion by controlling how many jobs start simultaneously

Understanding Job Start Rate Control

When many jobs become ready at the same moment (because they are scheduled for the same time, or because their conditions are satisfied together), starting them all at once can overwhelm system resources. The STARTLIM and STARTWAIT variables control how quickly Xi-Batch starts such jobs.

The Problem

Resource swamping occurs when:

  • Hundreds of jobs are scheduled for the same time (e.g., midnight)
  • A single variable change releases many waiting jobs at once (cascade effect)
  • Network-intensive jobs start together and saturate bandwidth
  • Disk I/O from many starting jobs overwhelms the storage subsystem
  • Too many simultaneous spawns exhaust the process table

Symptoms:

  • System becomes unresponsive at job start times
  • Network timeouts during peak job starts
  • Jobs fail with "resource temporarily unavailable" errors
  • High load average spikes
  • Disk I/O wait times increase dramatically

How STARTLIM Works

STARTLIM : Maximum number of jobs Xi-Batch will start in a single batch

Default value: 5

When jobs are ready to start, Xi-Batch processes them in batches:

  1. The scheduler identifies all ready jobs
  2. It starts the first STARTLIM jobs (highest priority first)
  3. It waits STARTWAIT seconds
  4. It starts the next batch of up to STARTLIM jobs
  5. This repeats until all ready jobs have started
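The batching loop above can be modelled in a few lines of shell to see when each job would be released. This is a minimal sketch of the described behaviour under stated assumptions, not Xi-Batch's own scheduler code: every job is ready at t=0, batches are released exactly STARTWAIT seconds apart, and LOADLEVEL and other limits are ignored. `simulate_starts` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper (not part of Xi-Batch): model how STARTLIM and
# STARTWAIT release jobs, assuming every job is ready at t=0 and that
# nothing else (e.g. LOADLEVEL) holds jobs back.
# Usage: simulate_starts READY_JOBS STARTLIM STARTWAIT
simulate_starts() {
    local ready=$1 limit=$2 pause=$3
    local started=0 batch=0 t=0 count

    if [ "$limit" -lt 1 ]; then
        echo "STARTLIM must be at least 1" >&2
        return 1
    fi

    while [ "$started" -lt "$ready" ]; do
        # A batch starts at most STARTLIM jobs (fewer if fewer remain)
        count=$(( ready - started < limit ? ready - started : limit ))
        batch=$(( batch + 1 ))
        printf 't=%ds batch %d: jobs %d-%d\n' \
            "$t" "$batch" "$(( started + 1 ))" "$(( started + count ))"
        started=$(( started + count ))
        # The next batch is released STARTWAIT seconds later
        t=$(( t + pause ))
    done
}

simulate_starts 12 5 30
# t=0s batch 1: jobs 1-5
# t=30s batch 2: jobs 6-10
# t=60s batch 3: jobs 11-12
```

With the default STARTLIM of 5, twelve ready jobs need three batches, and the last one is smaller than STARTLIM.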

How STARTWAIT Works

STARTWAIT : Waiting time in seconds between job start batches

Default value: 30 seconds

This delay reduces competition for system resources by giving the jobs already started time to:

  • Complete initialization
  • Establish network connections
  • Allocate resources

Checking Current Settings

bash

# View current values
btvar -v STARTLIM STARTWAIT

# Or use btq
btq -V  # Switch to variables screen
# Look for STARTLIM and STARTWAIT

Example output:

STARTLIM    5      # Number of jobs to start at once
STARTWAIT   30     # Wait time in seconds for job start
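If a script needs one of these values, it can pick it out of the listing. The function below is a sketch that assumes the `NAME VALUE # comment` layout shown in the example output; the actual `btvar` output may differ between versions, so check it on your system first. `var_value` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: print the value of NAME from lines shaped like
#   STARTLIM    5      # Number of jobs to start at once
# Reads the listing on standard input; prints nothing if NAME is absent.
var_value() {
    local name=$1
    awk -v n="$name" '$1 == n { print $2; exit }'
}

# Example using the sample output above
sample='STARTLIM    5      # Number of jobs to start at once
STARTWAIT   30     # Wait time in seconds for job start'

printf '%s\n' "$sample" | var_value STARTWAIT   # prints 30
```

On a live system the input would come from `btvar -v STARTLIM STARTWAIT | var_value STARTLIM`, provided the output matches the sample layout.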

Adjusting STARTLIM

Increase STARTLIM when:

  • High-performance hardware can handle more concurrent starts
  • Jobs are lightweight and start quickly
  • Network and I/O subsystems are fast
  • No resource contention observed

bash

# Increase to 10 jobs per batch
btvar -s STARTLIM 10

Decrease STARTLIM when:

  • System becomes unresponsive during job starts
  • Network saturates during peak times
  • Disk I/O bottlenecks occur
  • Process table fills up
  • Resource allocation failures observed

bash

# Reduce to 3 jobs per batch
btvar -s STARTLIM 3

Adjusting STARTWAIT

Increase STARTWAIT when:

  • Jobs need more initialization time
  • Network connections take time to establish
  • Resource contention observed between batches
  • Slower hardware or storage

bash

# Increase wait to 60 seconds
btvar -s STARTWAIT 60

Decrease STARTWAIT when:

  • Jobs start quickly and cleanly
  • No resource contention is observed
  • The system runs on high-performance hardware
  • Faster job throughput is needed

bash

# Reduce wait to 15 seconds
btvar -s STARTWAIT 15

Finding Optimal Settings

Test different values to find optimal settings for your environment:

Step 1: Establish baseline

Monitor the system during a typical job start period:

bash

# Watch the load average and the number of running jobs
watch -n 5 'uptime; btjlist | grep " Run " | wc -l'

Step 2: Test incremental changes

Make small adjustments:

bash

# Start conservative
btvar -s STARTLIM 3
btvar -s STARTWAIT 45

# Monitor for several days
# Gradually increase STARTLIM if system handles load well

Step 3: Monitor key metrics

  • System load average
  • Network utilization
  • Disk I/O wait percentage
  • Job failure rates
  • Time to complete job batches
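For the load average, the monitoring script later in this article extracts the 1-minute value from `uptime`. A slightly more tolerant parser is sketched below: it accepts both the `load average:` wording used on Linux and the `load averages:` wording used on BSD and macOS, with or without commas between the values. The function name is hypothetical, and the sample lines are illustrative rather than captured output.

```bash
#!/bin/bash
# Hypothetical helper: print the 1-minute load average from `uptime` output
# read on standard input. Handles "load average:" (Linux) and
# "load averages:" (BSD/macOS), with or without commas between values.
load1_from_uptime() {
    # Keep only the text after the label, then print its first number
    # with any trailing comma removed.
    sed -n 's/.*load averages*: *//p' | awk '{ sub(/,$/, "", $1); print $1; exit }'
}

echo ' 10:01:02 up 5 days,  3:04,  2 users,  load average: 0.52, 0.58, 0.59' | load1_from_uptime
echo '10:01  up 5 days,  3:04, 2 users, load averages: 1.23 1.45 1.67' | load1_from_uptime
```

On a live system this would be used as `LOAD=$(uptime | load1_from_uptime)`. On Linux the same figure is also the first field of `/proc/loadavg`.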

Step 4: Iterate

Adjust based on your observations until you reach an acceptable balance between start-up speed and system load.
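One way to make this iteration concrete is a simple step rule: lower STARTLIM by one when the peak load seen during a start period exceeded a high threshold, raise it by one when the peak stayed below a low threshold, and otherwise leave it alone. The thresholds, the bounds and the function name below are illustrative assumptions, not Xi-Batch recommendations.

```bash
#!/bin/bash
# Hypothetical step rule for tuning STARTLIM between observation periods.
# Usage: next_startlim CURRENT PEAK_LOAD HIGH_LOAD LOW_LOAD MIN MAX
# awk is used because load averages are fractional.
next_startlim() {
    awk -v c="$1" -v p="$2" -v hi="$3" -v lo="$4" -v mn="$5" -v mx="$6" 'BEGIN {
        c += 0; p += 0; hi += 0; lo += 0; mn += 0; mx += 0   # force numeric comparisons
        if (p > hi && c > mn)      c--   # overloaded: start fewer jobs per batch
        else if (p < lo && c < mx) c++   # plenty of headroom: start more per batch
        print c
    }'
}

next_startlim 5 12.4 10 4 1 15   # prints 4
next_startlim 5 2.1 10 4 1 15    # prints 6
next_startlim 5 6.0 10 4 1 15    # prints 5
```

Changing the value by only one step per observation period matches the "small adjustments" advice in Step 2. With hypothetical variables `CUR` and `PEAK` holding the current value and the observed peak, the result could be applied with `btvar -s STARTLIM "$(next_startlim "$CUR" "$PEAK" 10 4 1 15)"`.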

Example Scenarios

High-Volume Network Jobs

400 jobs scheduled for midnight, each performing a network file transfer:

bash

# Conservative settings to prevent network saturation
btvar -s STARTLIM 2
btvar -s STARTWAIT 60

# Jobs start 2 at a time, 60 seconds between batches
# Takes approximately 200 minutes to start all 400

Lightweight Batch Jobs

100 small jobs that complete in seconds:

bash

# Aggressive settings for fast throughput
btvar -s STARTLIM 15
btvar -s STARTWAIT 10

# Jobs start 15 at a time, 10 seconds between batches
# All 100 started within approximately 70 seconds

Mixed Workload

Mix of heavy and light jobs:

bash

# Moderate settings for balance
btvar -s STARTLIM 5
btvar -s STARTWAIT 30

# Default settings often work well for mixed workloads
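The timings quoted in these scenarios follow from simple arithmetic: N ready jobs need ceil(N / STARTLIM) batches, and with one STARTWAIT between consecutive batches the last batch starts (batches - 1) × STARTWAIT seconds after the first. Counting one wait per batch, as the figures above do, gives a round upper estimate. The helper below is a sketch with a hypothetical name; it ignores LOADLEVEL and any other limit that can hold jobs back.

```bash
#!/bin/bash
# Hypothetical helper: estimate how long STARTLIM/STARTWAIT take to release
# READY_JOBS jobs that are all ready at once (LOADLEVEL etc. ignored).
# Usage: start_estimate READY_JOBS STARTLIM STARTWAIT
start_estimate() {
    local ready=$1 limit=$2 pause=$3 batches last=0

    if [ "$limit" -lt 1 ]; then
        echo "STARTLIM must be at least 1" >&2
        return 1
    fi

    batches=$(( (ready + limit - 1) / limit ))   # ceil(ready / limit)
    if [ "$batches" -gt 1 ]; then
        last=$(( (batches - 1) * pause ))         # start time of the final batch
    fi
    printf 'batches=%d last_batch_at=%ds\n' "$batches" "$last"
}

start_estimate 400 2 60    # batches=200 last_batch_at=11940s (199 minutes)
start_estimate 100 15 10   # batches=7 last_batch_at=60s
```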

Dynamic Adjustment

Adjust settings based on time of day or system load:

Example: Business hours vs overnight

bash

#!/bin/bash
# Scheduled job to adjust start rate

HOUR=$(date +%H)

if [ "$HOUR" -ge 8 ] && [ "$HOUR" -lt 18 ]; then
    # Business hours: be conservative
    btvar -s STARTLIM 2
    btvar -s STARTWAIT 60
else
    # Overnight: more aggressive
    btvar -s STARTLIM 10
    btvar -s STARTWAIT 20
fi

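Save the script as /usr/local/bin/adjust-startrate.sh, make it executable, and submit it as a job that repeats hourly:

bash

# btr's -r option handles the hourly repeat, so no cron expression is needed
chmod +x /usr/local/bin/adjust-startrate.sh
echo "/usr/local/bin/adjust-startrate.sh" | btr -r 1:h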

Integration with Monitoring

Monitor and alert on resource exhaustion:

bash

#!/bin/bash
# Reduce STARTLIM and send an alert when the load average exceeds THRESHOLD

LOAD=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
THRESHOLD=10

if (( $(echo "$LOAD > $THRESHOLD" | bc -l) )); then
    # High load detected, reduce start rate
    CURRENT_STARTLIM=$(btvar -v STARTLIM | awk '{print $2}')
    
    if [ "$CURRENT_STARTLIM" -gt 2 ]; then
        NEW_STARTLIM=$((CURRENT_STARTLIM - 1))
        btvar -s STARTLIM "$NEW_STARTLIM"
        
        logger "Xi-Batch: Reduced STARTLIM to $NEW_STARTLIM due to high load"
        echo "Load average $LOAD exceeds threshold, reduced STARTLIM" | \
            mail -s "Xi-Batch Auto-Adjustment" admin@example.com
    fi
fi

Verifying Settings Are Effective

Watch job starts in real-time:

bash

# Terminal 1: Watch variables
watch -n 2 'btvar -v STARTLIM STARTWAIT; echo ""; btvar -v CLOAD'

# Terminal 2: Monitor job starts
watch -n 2 'btjlist | grep " Run "'

# Terminal 3: System load
watch -n 2 uptime

Observe that:

  • Jobs start in batches of at most STARTLIM jobs
  • Each batch follows the previous one after a delay of STARTWAIT seconds
  • System load remains manageable
  • No resource exhaustion errors
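To check the first two points from recorded data rather than by eye, group job start times into batches: starts only a few seconds apart belong to the same batch, and the gap between batches should be close to STARTWAIT. The sketch below assumes you already have one start time per line, in epoch seconds and sorted ascending (how you collect them depends on your logging); the 5-second grouping gap, the function name and the sample times are assumptions.

```bash
#!/bin/bash
# Hypothetical helper: group job start times (epoch seconds, one per line,
# sorted ascending) into batches. A new batch begins when a start is more
# than GAP seconds after the previous one. For each batch, prints its size
# and the seconds between its first start and the previous batch's first start.
# Usage: batch_report [GAP] < start_times.txt
batch_report() {
    awk -v gap="${1:-5}" '
        BEGIN { gap += 0 }
        NF == 0 { next }                          # ignore blank lines
        {
            t = $1 + 0
            if (n == 0 || t - prev > gap) {       # this start opens a new batch
                if (n > 0) print "batch size=" size " gap_before=" since
                since = (n == 0) ? 0 : t - first
                first = t
                size = 0
            }
            size++
            prev = t
            n++
        }
        END { if (n > 0) print "batch size=" size " gap_before=" since }
    '
}

printf '%s\n' 1000 1000 1001 1030 1030 1060 | batch_report 5
# batch size=3 gap_before=0
# batch size=2 gap_before=30
# batch size=1 gap_before=30
```

The invented sample is consistent with STARTLIM 3 and STARTWAIT 30: no batch is larger than 3 and the batches are 30 seconds apart.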

Troubleshooting

Jobs not starting despite being ready : Check that LOADLEVEL has not been exceeded. STARTLIM only controls batch size, not the total number of running jobs.

bash

btvar -v LOADLEVEL CLOAD

All jobs start simultaneously despite STARTLIM : Verify that STARTLIM is actually set:

bash

btvar -v STARTLIM
# Should show your configured value, not 0 or blank
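Scripts that read STARTLIM back, such as the monitoring script earlier, can make the same check automatically. The guard below is a hypothetical helper that rejects blank values, zero, and anything that is not a whole number.

```bash
#!/bin/bash
# Hypothetical guard: succeed only if VALUE is a whole number of at least 1.
# Usage: is_positive_int VALUE
is_positive_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # blank, or contains something other than digits
        *[1-9]*)     return 0 ;;   # digits only, at least one non-zero
        *)           return 1 ;;   # digits only, all zeros
    esac
}

for v in 5 0 '' abc 10; do
    if is_positive_int "$v"; then
        echo "'$v' ok"
    else
        echo "'$v' invalid"
    fi
done
```

Combined with the parsing used in the monitoring script, a check might read `is_positive_int "$(btvar -v STARTLIM | awk '{print $2}')" || echo "STARTLIM is not set to a usable value"`.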

System still overloaded : STARTLIM may still be too high, or STARTWAIT too short. Reduce STARTLIM and lengthen STARTWAIT further:

bash

btvar -s STARTLIM 1
btvar -s STARTWAIT 120

Jobs taking too long to start : If STARTWAIT is too long, reduce it incrementally:

bash

# Current: 120 seconds
# Try: 90 seconds
btvar -s STARTWAIT 90

Best Practices

Start conservative : Begin with a low STARTLIM (2-3) and a long STARTWAIT (60-90 seconds), then raise STARTLIM and shorten STARTWAIT gradually

Monitor before adjusting : Collect data on system behavior before making changes

Document changes : Record STARTLIM/STARTWAIT adjustments and reasons

Test during low-impact periods : Experiment with settings during non-critical times

Consider hardware limitations : Slower systems need more conservative settings

Account for network topology : Jobs accessing network resources need longer delays

Review regularly : As workload patterns change, revisit settings
