DeepSky ATC: Stage 4 Production Training
Transition to 50-Agent Scale
Moved Stage 4 training to RunPod (RTX PRO 4500) for production-scale run.
Configuration:
- Architecture: Graph Attention Transformer (GAT)
- Scenario: Random (50 agents)
- Workers: 24 parallel simulation workers
- Checkpoint: Resuming from Stage 3 (8 agents)
Multi-Agent Protocol Debugging
Hit several protocol mismatches during scale-up.
Bug 1: __all__ Key Stripped
Issue: __all__ key being stripped during dictionary filtering in BlueSkyEnv.step.
Fix: Explicitly ensure __all__ is preserved in both terminated and truncated dictionaries.
Added version tracking: VERSION = "2.0.3-KEY-ALIGN" with stderr printing to confirm correct code version on Ray workers.
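A minimal sketch of the fix (the helper name and structure are illustrative, not the actual BlueSkyEnv code): when the step output is filtered down to active agents, the special "__all__" key that RLlib uses for episode-level termination must be re-attached explicitly.

```python
import sys

VERSION = "2.0.3-KEY-ALIGN"
print(f"BlueSkyEnv version: {VERSION}", file=sys.stderr)  # confirm code version on Ray workers

def filter_to_active(d, active_agents):
    """Keep only active agents' entries, but always preserve '__all__'."""
    filtered = {aid: v for aid, v in d.items() if aid in active_agents}
    # Re-attach the episode-level flag if the source dict carried one;
    # a plain key filter would silently strip it.
    if "__all__" in d:
        filtered["__all__"] = d["__all__"]
    return filtered
```

For example, `filter_to_active({"AC1": True, "AC2": False, "__all__": False}, {"AC2"})` keeps both the `AC2` entry and the `__all__` flag.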
Bug 2: Dependency ABI Mismatch
Issue: Rollout workers hanging at 0 timesteps. The PyTorch 2.6.0 pre-installed on RunPod was ABI-incompatible with torch-scatter/torch-sparse binaries left over from earlier setup runs.
Fix: Force-reinstalled torch-scatter, torch-sparse for pt26cu124 (PyTorch 2.6.0 + CUDA 12.4).
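A sketch of the reinstall. The wheel index URL is an assumption based on the PyTorch Geometric wheel naming convention (torch version + CUDA tag); verify it against the PyG install docs for your exact setup.

```shell
# Remove the stale binaries built against an older torch ABI,
# then reinstall wheels matching PyTorch 2.6.0 + CUDA 12.4.
pip uninstall -y torch-scatter torch-sparse
pip install --no-cache-dir torch-scatter torch-sparse \
    -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
```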
Bug 3: NumPy 2.0 Incompatibility
Issue: AttributeError: module 'numpy' has no attribute 'product' triggered by Ray 2.30.0 internal code.
Fix: Downgraded NumPy to <2.0.0 (1.26.4) for RLlib preprocessor compatibility.
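For context, `np.product` was a legacy alias of `np.prod` that NumPy 2.0 removed, so older code (such as Ray 2.30.0's internals) that still calls it raises AttributeError under NumPy >= 2.0. A minimal illustration:

```python
import numpy as np

# The surviving spelling works on both NumPy 1.x and 2.x:
assert np.prod([2, 3, 4]) == 24

# The legacy alias exists only on NumPy 1.x; on 2.x it is gone,
# which is why pinning numpy<2.0.0 (e.g. 1.26.4) restores Ray.
has_legacy_alias = hasattr(np, "product")
print(np.__version__, has_legacy_alias)
```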
Bug 4: Non-Finite Policy Outputs
Issue: BlueSky ArgumentError: txt2hdg needs either a float... caused by NaN/Inf from policy or NumPy-specific float types.
Fix: Added np.isfinite checks to _apply_action and explicitly cast all BlueSky command arguments to primitive Python float().
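A hypothetical sketch of the guard (helper name and fallback value are illustrative, not the actual `_apply_action` code): each numeric argument is checked for finiteness and cast to a primitive Python float before being passed to a BlueSky command.

```python
import numpy as np

def to_finite_float(value, fallback=0.0):
    """Cast a policy output to a finite primitive float, or use fallback."""
    v = float(value)             # np.float32 / np.float64 -> Python float
    if not np.isfinite(v):
        return float(fallback)   # drop NaN/Inf coming from the policy
    return v
```

For example, `to_finite_float(np.float32(270.0))` yields a plain `float` 270.0, while a NaN heading falls back to a safe default instead of crashing the BlueSky parser.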
Stage 4 Scaling Optimization
Switched RLlib config to count_steps_by="agent_steps" so that train_batch_size counts per-agent transitions rather than environment steps. With 50 agents, one env step can contribute up to 50 agent steps, so counting env steps would buffer roughly 50x the intended data per batch; this caused massive memory issues during scale-up.
Set train_batch_size = 20000 (approximately 1 full episode of 50 agents) to ensure stable gradients and frequent WandB updates.
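The relevant settings, sketched with RLlib's `AlgorithmConfig` builder API. This assumes PPO (the source does not name the algorithm) and the Ray 2.x API, where `count_steps_by` lives in the multi-agent settings; method names may differ across Ray versions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("BlueSkyEnv")                  # custom env, registered elsewhere
    .rollouts(num_rollout_workers=24)           # 24 parallel simulation workers
    .multi_agent(count_steps_by="agent_steps")  # batch size counts per-agent steps
    .training(train_batch_size=20_000)          # ~1 full 50-agent episode per update
)
```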
Scaling Pivot
The initial 2M agent-step run was equivalent to only 100 episodes (2M / 20,000 agent steps per episode), so low absolute performance was expected.
New target: 100,000,000 agent steps (~5,000 episodes) for publication-quality coordination.
Increased train_batch_size to 100,000 (5 episodes per update) to stabilize gradients for high-density coordination maneuvers.