
Troubleshooting Variable Synchronization Across Hosts

Resolving remote variable access and network-wide variable update problems

Understanding Variable Synchronization

In networked Xi-Batch environments, variables can be:

Local : Only visible on the host where created

Exported : Shared across all networked hosts

Variables marked as "exported" synchronize across the network, allowing jobs on any host to reference them in conditions and assignments.

How Variable Synchronization Works

When a variable is exported:

  1. Variable exists on origin host
  2. Xi-Batch broadcasts variable to all connected hosts
  3. Remote hosts cache variable information
  4. Updates propagate automatically across network
  5. Jobs anywhere can reference the variable

Common Synchronization Problems

Remote variable not visible : Variable not exported, or network connectivity issue

Variable value not updating : Update not propagating, or stale cache

"Variable does not exist" errors : Variable not created, or host disconnected

Concurrent update conflicts : Multiple jobs modifying same variable simultaneously

Critical condition blocking : Remote variable unavailable, condition marked critical

Diagnostic Approach

Step 1: Verify Variable Export Status

Check whether the variable is exported:

bash

# List all variables with export status
btvlist

# Look for "Export" in output:
# varname    value    Export    # Exported variable
# varname    value             # Local variable
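
On a host with many variables, it can help to filter the listing down to exported names only. The sketch below assumes the listing has the shape shown above (name, value, then an "Export" flag as the last field); the function name exported_vars is illustrative, and a local variable whose multi-word value happens to end in "Export" would be reported by mistake.

```bash
# exported_vars: read a variable listing on stdin and print the names
# of variables whose last field is the "Export" flag.
# Assumes the "name value Export" layout shown in this article.
exported_vars() {
    awk 'NF >= 3 && $NF == "Export" { print $1 }'
}

# Example on a Xi-Batch host:
# btvlist | exported_vars
```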

Or check a specific variable:

bash

# Show variable details
btvar -v <variable_name>

# Output includes export status:
# Export: Yes  ← Exported
# Export: No   ← Local only

Step 2: Check Network Connectivity

Verify Xi-Batch network connectivity between hosts:

bash

# Check configured hosts
cat /etc/Xibatch-hosts

# Test scheduler connectivity
btjlist -H  # Should show local host without errors

# View remote jobs (tests network)
btjlist -r

If a remote host is unreachable:

bash

# Check network connection
ping <remote_host>

# Verify Xi-Batch scheduler running (the [b] keeps grep from matching itself)
ssh <remote_host> "ps aux | grep '[b]tsched'"

# Check firewall/ports
# Default Xi-Batch ports: 10000-10001
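
The port check can be scripted rather than done by hand. This is a minimal sketch, assuming bash and the coreutils timeout command are available on the host running the check; the function names probe_port and check_ports are hypothetical, and the port numbers in the example are the defaults quoted in this article.

```bash
# probe_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
probe_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_ports HOST PORT...: print one status line per port and
# return non-zero if any port could not be reached.
check_ports() {
    local host=$1 rc=0 port
    shift
    for port in "$@"; do
        if probe_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# check_ports server2 10000 10001
```

An "unreachable" result does not distinguish a firewall block from a scheduler that is not listening, so pair it with the process check above.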

Step 3: Test Remote Variable Access

Try accessing a remote variable directly:

bash

# Read remote variable
btvar -v <remote_host>:<variable_name>

# Expected: Variable value displayed
# Error: "Cannot connect" or "Variable not found"

Step 4: Check Variable Propagation

Create a test variable and verify propagation:

bash

# On host A: Create exported variable
btvar -c testvar 100
btvar -e testvar  # Mark as exported

# On host B: Check if visible
btvar -v hostA:testvar

# Should display: 100

If the value is not visible on host B, there is a synchronization problem.
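
Propagation is not instantaneous, so a single read on host B can give a false negative. The sketch below polls a command until its output contains the expected value or a timeout expires. The helper name wait_for_output is hypothetical, and the match is a plain substring test, so "100" would also match "1000".

```bash
# wait_for_output EXPECTED TIMEOUT_SECS CMD [ARGS...]
# Re-run CMD once per second until its output contains EXPECTED.
wait_for_output() {
    local expected=$1 timeout_secs=$2 elapsed=0
    shift 2
    while :; do
        case "$("$@" 2>/dev/null)" in
            *"$expected"*) echo "seen after ${elapsed}s"; return 0 ;;
        esac
        if [ "$elapsed" -ge "$timeout_secs" ]; then
            break
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "not seen within ${timeout_secs}s"
    return 1
}

# Example on host B, using the read command from this article:
# wait_for_output 100 10 btvar -v hostA:testvar
```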

Step 5: Review Scheduler Logs

Check for network errors in scheduler log:

bash

# View recent log entries
tail -50 /var/spool/xi/batch/btsched_reps

# Look for:
# - Connection errors
# - Network timeouts
# - Variable synchronization failures
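
A case-insensitive grep over the tail of the log is a quick way to surface these entries. The search terms below are assumptions, since the exact wording of Xi-Batch log messages is not given here; adjust them to match what the log actually contains. The function name scan_sched_log is hypothetical.

```bash
# scan_sched_log FILE [LINES]
# Print entries from the last LINES lines (default 200) that mention
# connection problems, timeouts, or variables.
scan_sched_log() {
    tail -n "${2:-200}" "$1" |
        grep -iE 'connect|timed? ?out|unreachable|variable'
}

# Example, using the log path from this article:
# scan_sched_log /var/spool/xi/batch/btsched_reps 50
```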

Resolving Common Issues

Issue 1: Variable Not Exported

Symptom:

bash

# Variable exists locally
btvar -v STATUS
# Shows value: Ready

# Not visible from another host (thishost is the name of the origin host)
btvar -v thishost:STATUS
# Error: Variable does not exist

Solution:

Export the variable:

bash

# Mark variable as exported
btvar -e STATUS

# Verify export status
btvlist | grep STATUS
# Should show "Export" flag

Issue 2: Remote Host Disconnected

Symptom:

bash

# Cannot access remote variables
btvar -v server2:BACKUP_STATUS
# Error: Cannot connect to server2

Solution:

Check network connectivity:

bash

ping server2

Verify that Xi-Batch is running on the remote host:

bash

ssh server2 "ps aux | grep '[b]tsched'"

If not running:

bash

ssh server2 "btstart"

Check firewall rules:

Ensure ports 10000-10001 are open between hosts:

bash

# Test port connectivity
telnet server2 10001

Verify hosts file configuration:

bash

# Check both hosts configured
cat /etc/Xibatch-hosts

# Should list both hosts with correct IP addresses
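
When several hosts are involved, the hosts-file check can be automated. The sketch below reports any expected host name that does not appear in the file. The file format is not described in this article, so the code assumes only that each entry contains the host name as a whitespace-separated field and that lines starting with # are comments; the function name hosts_missing is hypothetical.

```bash
# hosts_missing FILE HOST...
# Print each HOST that does not appear as a whole field on a
# non-comment line of FILE.
hosts_missing() {
    local file=$1 host
    shift
    for host in "$@"; do
        if ! awk -v h="$host" '
                !/^[[:space:]]*#/ {
                    for (i = 1; i <= NF; i++) if ($i == h) found = 1
                }
                END { exit !found }' "$file"; then
            echo "$host"
        fi
    done
}

# Example, run on each host in turn:
# hosts_missing /etc/Xibatch-hosts server1 server2
```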

Issue 3: Stale Variable Cache

Symptom:

bash

# Variable updated on host A
# Host A: btvar -v STATUS → "Complete"

# Host B still sees old value
# Host B: btvar -v hostA:STATUS → "Pending"

Solution:

Variable updates should propagate automatically. If the value seen on a remote host is stale, force a refresh by re-establishing the network connection:

On host B:

bash

# Restart Xi-Batch to refresh connections
btquit -y
btstart

Or restart on host A (where variable changed):

bash

ssh hostA "btquit -y; btstart"

After restart, verify:

bash

btvar -v hostA:STATUS
# Should show updated value: "Complete"

Issue 4: Concurrent Update Conflict

Symptom:

Two jobs on different hosts try to update the same variable simultaneously. Xi-Batch uses locking, but race conditions can still occur.

Example:

bash

# Job A on host1: counter += 1
# Job B on host2: counter += 1
# Both start simultaneously
# Expected: counter increases by 2
# Actual: counter increases by 1 (one update lost)
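
The lost update comes from two read-modify-write sequences interleaving. The following sketch replays that interleaving with an ordinary shell variable standing in for the shared counter. It does not call Xi-Batch, and the function name is illustrative.

```bash
# lost_update START: replay two overlapping "counter += 1" updates.
lost_update() {
    local counter=$1
    local a_read=$counter       # Job A reads the current value
    local b_read=$counter       # Job B reads it before A writes back
    counter=$((a_read + 1))     # Job A writes START + 1
    counter=$((b_read + 1))     # Job B writes START + 1 again, overwriting A
    echo "$counter"
}

lost_update 5   # prints 6, not 7
```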

Solution:

Use Xi-Batch's atomic update mechanism with conditions:

Job A:

bash

btr -A "counter += 1 @ Job start if counter < 1000" script.sh

Job B:

bash

btr -A "counter += 1 @ Job start if counter < 1000" script.sh

Only one job will succeed in updating if they run simultaneously. The other's assignment won't execute (condition fails).

Alternative: Serialize updates

Use a variable as a lock:

bash

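# Job script
# Spin until the lock is acquired. This assumes btvar -s fails while another
# job holds the lock; if the set always succeeds, it gives no mutual exclusion.
while ! btvar -s UPDATE_LOCK 1 2>/dev/null; do
    sleep 1
done

# Update counter
CURRENT=$(btvar -v counter | awk '{print $2}')
NEW=$((CURRENT + 1))
btvar -s counter "$NEW"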

# Release lock
btvar -s UPDATE_LOCK 0

Issue 5: Critical Condition Blocking

Symptom:

bash

# Job has critical condition
# Condition: server2:BACKUP_DONE = Yes (critical)
# Server2 down
# Job cannot start

Solution:

Option A: Make condition non-critical

If the job can run when the remote variable is unavailable:

bash

btjchange <job_number>
# Edit condition, remove critical flag

The job will then start even if server2 is unreachable.

Option B: Restore remote host

Bring server2 back online:

bash

ssh server2 "btstart"

Option C: Use local variable

Create local copy that tracks remote variable:

bash

# On server2: set up the backup job to update server1's local copy on completion
btr -A "server1:BACKUP_DONE = Yes @ Job completed" backup-job.sh

# Change main job to use local variable
btjchange <job_number>
# Condition: server2:BACKUP_DONE → BACKUP_DONE (local)

Network-Wide Variable Management

Identify All Exported Variables

bash

# List all exported variables
btvlist -e

# Shows variables visible across network

Find Variables by Host

bash

# List variables on specific host
btvlist | grep "^hostname:"

# Or from remote host
ssh hostname "btvlist"

Track Variable Propagation

Enable variable logging on both hosts:

bash

# Host A
btvar -s LOGVARS varlog_hostA

# Host B
btvar -s LOGVARS varlog_hostB

# Make change on Host A
btvar -s SHARED_VAR "New value"

# Check both logs
# Host A log: Shows assignment
# Host B log: Should show update propagation

Synchronization Best Practices

Limit exported variables : Only export variables that truly need network-wide visibility

Use descriptive names : Include host identifier in variable name when appropriate

Document dependencies : Track which jobs on which hosts depend on which variables

Monitor network connectivity : Regular checks prevent synchronization failures

Use local copies for reliability : Critical workflows should use local variables when possible

Test before deployment : Verify variable synchronization in test environment

Plan for network failures : Use non-critical conditions for remote variables when appropriate

Troubleshooting Checklist

When variable synchronization fails:

  • Variable is marked as exported
  • Network connectivity exists between hosts
  • Xi-Batch scheduler running on both hosts
  • Firewall permits Xi-Batch ports (10000-10001)
  • Hosts configured in /etc/Xibatch-hosts
  • No errors in scheduler logs
  • Variable name correct (including host prefix)
  • Recent scheduler restart if cache suspected stale
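
The checklist can be turned into a small script so that each item is reported the same way every time. This is a sketch only: run_check is a hypothetical helper that treats a command's exit status as pass or fail, and the commented examples reuse commands from this article, on the assumption that they exit non-zero on failure.

```bash
failures=0

# run_check LABEL CMD [ARGS...]
# Run CMD quietly; print PASS or FAIL with the label and count failures.
run_check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        failures=$((failures + 1))
    fi
}

# Example use on a Xi-Batch host (uncomment to run):
# run_check "server2 answers ping"     ping -c 1 server2
# run_check "hosts file is readable"   test -r /etc/Xibatch-hosts
# run_check "remote variable readable" btvar -v server2:BACKUP_STATUS
# echo "$failures check(s) failed"
```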

Verification After Changes

Test variable synchronization:

bash

# Host A: Update exported variable
btvar -s SYNC_TEST "Updated $(date)"

# Host B: Read variable
btvar -v hostA:SYNC_TEST

# Should show updated value with timestamp

If synchronization is working, the value propagates within seconds.
