DeepSky ATC: Stage 4 Production Training

March 20, 2026

Transition to 50-Agent Scale

Moved Stage 4 training to RunPod (RTX PRO 4500) for production-scale run.

Configuration:

  • Architecture: Graph Attention Transformer (GAT)
  • Scenario: Random (50 agents)
  • Workers: 24 parallel simulation workers
  • Checkpoint: Resuming from Stage 3 (8 agents)

Multi-Agent Protocol Debugging

Hit several protocol mismatches during scale-up.

Bug 1: __all__ Key Stripped

Issue: __all__ key being stripped during dictionary filtering in BlueSkyEnv.step.

Fix: Explicitly ensure __all__ is preserved in both terminated and truncated dictionaries.

Added version tracking: VERSION = "2.0.3-KEY-ALIGN" with stderr printing to confirm correct code version on Ray workers.
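The fix can be sketched roughly as follows. The filtering helper below is hypothetical (BlueSkyEnv internals are not shown); it illustrates the pattern of re-inserting the special "__all__" key that RLlib's multi-agent protocol requires in both done dictionaries:

```python
import sys

VERSION = "2.0.3-KEY-ALIGN"
print(f"BlueSkyEnv version {VERSION}", file=sys.stderr)  # visible in Ray worker logs

def filter_done_dicts(terminated, truncated, active_agents):
    """Hypothetical sketch of the BlueSkyEnv.step fix: filter per-agent
    entries down to active agents, but always preserve the "__all__"
    key that RLlib's multi-agent API expects in BOTH dicts."""
    term = {a: terminated[a] for a in active_agents if a in terminated}
    trunc = {a: truncated[a] for a in active_agents if a in truncated}
    # "__all__" must survive the filtering step.
    term["__all__"] = terminated.get("__all__", False)
    trunc["__all__"] = truncated.get("__all__", False)
    return term, trunc
```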

Bug 2: Dependency ABI Mismatch

Issue: Rollout workers hanging at 0 timesteps. Pre-installed PyTorch 2.6.0 on RunPod incompatible with binaries from older setup runs.

Fix: Force-reinstalled torch-scatter, torch-sparse for pt26cu124 (PyTorch 2.6.0 + CUDA 12.4).
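The reinstall looked roughly like this (the wheel index URL follows PyG's naming convention for torch/CUDA builds; adjust if your versions differ):

```shell
# Reinstall PyG extension wheels built against the exact torch/CUDA ABI
# (pt26cu124 = PyTorch 2.6.0 + CUDA 12.4). --no-cache-dir avoids picking
# up stale wheels from earlier setup runs.
pip install --force-reinstall --no-cache-dir \
    torch-scatter torch-sparse \
    -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
```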

Bug 3: NumPy 2.0 Incompatibility

Issue: AttributeError: module 'numpy' has no attribute 'product' triggered by Ray 2.30.0 internal code.

Fix: Downgraded NumPy to <2.0.0 (1.26.4) for RLlib preprocessor compatibility.
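For context: np.product was a long-deprecated alias of np.prod that NumPy 2.0 removed outright, which is why unpatched Ray internals break. The version-safe spelling is np.prod (the shape below is an arbitrary example):

```python
import numpy as np

# np.product was removed in NumPy 2.0; code calling it (e.g. Ray 2.30.0
# internals) must either run on NumPy < 2.0 or be patched to np.prod.
shape = (50, 14)                  # e.g. a flattened per-agent obs space
flat_size = int(np.prod(shape))   # works on both NumPy 1.x and 2.x
```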

Bug 4: Non-Finite Policy Outputs

Issue: BlueSky ArgumentError: txt2hdg needs either a float... caused by NaN/Inf from policy or NumPy-specific float types.

Fix: Added np.isfinite checks to _apply_action and explicitly cast all BlueSky command arguments to primitive Python float().
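A rough sketch of the guard (the helper name is hypothetical; the real _apply_action builds BlueSky command strings from values like this):

```python
import numpy as np

def safe_heading(raw, fallback=None):
    """Hypothetical guard mirroring the _apply_action fix: reject
    NaN/Inf policy outputs and cast NumPy scalar types (np.float32,
    np.float64) to a plain Python float before issuing the command."""
    if raw is None or not np.isfinite(raw):
        return fallback      # skip the command rather than crash BlueSky
    return float(raw)        # primitive float keeps txt2hdg's parser happy
```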


Stage 4 Scaling Optimization

Switched RLlib config to count_steps_by="agent_steps", so batch sizes are measured in per-agent steps rather than environment steps. This prevents massive memory buffering when scaling to 50 agents, since each environment step now contributes up to 50 agent steps.

Set train_batch_size = 20000 (approximately 1 full episode of 50 agents) to ensure stable gradients and frequent WandB updates.
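The two settings together, shown in legacy dict-style RLlib config for illustration (the AlgorithmConfig builder API expresses the same fields; exact placement may vary slightly across RLlib versions):

```python
# Sketch of the relevant Stage 4 settings; values taken from this run.
stage4_config = {
    "train_batch_size": 20_000,          # ~1 full 50-agent episode per update
    "multiagent": {
        "count_steps_by": "agent_steps", # batch counted in per-agent steps,
                                         # not environment steps
    },
}
```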


Scaling Pivot

The initial 2M-step run was equivalent to only 100 episodes (2M / 20,000 steps per episode), so the low absolute performance was expected.

New target: 100,000,000 agent steps (~5,000 episodes) for publication-quality coordination.

Increased train_batch_size to 100,000 (5 episodes per update) to stabilize gradients for high-density coordination maneuvers.
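The budget arithmetic above, spelled out (using the ~20,000 agent steps per episode estimate from this run):

```python
STEPS_PER_EPISODE = 20_000  # ~1 full episode at 50 agents (agent steps)

def n_episodes(total_agent_steps):
    """Episode count implied by a given agent-step budget."""
    return total_agent_steps // STEPS_PER_EPISODE

assert n_episodes(2_000_000) == 100        # initial run: far too few
assert n_episodes(100_000_000) == 5_000    # new publication-quality target
assert 100_000 // STEPS_PER_EPISODE == 5   # episodes per gradient update
```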