DeepSky ATC: Stage 4 Production Training

March 20, 2026

Transition to 50-Agent Scale

Moved Stage 4 training to RunPod (RTX PRO 4500) for production-scale run.

Configuration:

  • Architecture: Graph Attention Transformer (GAT)
  • Scenario: Random (50 agents)
  • Workers: 24 parallel simulation workers
  • Checkpoint: Resuming from Stage 3 (8 agents)

Multi-Agent Protocol Debugging

Hit several protocol mismatches during scale-up.

Bug 1: __all__ Key Stripped

Issue: __all__ key being stripped during dictionary filtering in BlueSkyEnv.step.

Fix: Explicitly ensure __all__ is preserved in both terminated and truncated dictionaries.

Added version tracking: VERSION = "2.0.3-KEY-ALIGN" with stderr printing to confirm correct code version on Ray workers.
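The fix can be sketched roughly as follows. The filtering helper below is hypothetical (BlueSkyEnv internals are not shown); it illustrates the pattern of re-inserting the special "__all__" key that RLlib's multi-agent protocol requires in both done dictionaries:

```python
import sys

VERSION = "2.0.3-KEY-ALIGN"
print(f"BlueSkyEnv version {VERSION}", file=sys.stderr)  # visible in Ray worker logs

def filter_done_dicts(terminated, truncated, active_agents):
    """Hypothetical sketch of the BlueSkyEnv.step fix: filter per-agent
    entries down to active agents, but always preserve the "__all__"
    key that RLlib's multi-agent API expects in BOTH dicts."""
    term = {a: terminated[a] for a in active_agents if a in terminated}
    trunc = {a: truncated[a] for a in active_agents if a in truncated}
    # "__all__" must survive the filtering step.
    term["__all__"] = terminated.get("__all__", False)
    trunc["__all__"] = truncated.get("__all__", False)
    return term, trunc
```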

Bug 2: Dependency ABI Mismatch

Issue: Rollout workers hanging at 0 timesteps. Pre-installed PyTorch 2.6.0 on RunPod incompatible with binaries from older setup runs.

Fix: Force-reinstalled torch-scatter, torch-sparse for pt26cu124 (PyTorch 2.6.0 + CUDA 12.4).
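The reinstall looked roughly like this (the wheel index URL follows PyG's naming convention for torch/CUDA builds; adjust if your versions differ):

```shell
# Reinstall PyG extension wheels built against the exact torch/CUDA ABI
# (pt26cu124 = PyTorch 2.6.0 + CUDA 12.4). --no-cache-dir avoids picking
# up stale wheels from earlier setup runs.
pip install --force-reinstall --no-cache-dir \
    torch-scatter torch-sparse \
    -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
```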

Bug 3: NumPy 2.0 Incompatibility

Issue: AttributeError: module 'numpy' has no attribute 'product' triggered by Ray 2.30.0 internal code.

Fix: Downgraded NumPy to <2.0.0 (1.26.4) for RLlib preprocessor compatibility.
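For context: np.product was a long-deprecated alias of np.prod that NumPy 2.0 removed outright, which is why unpatched Ray internals break. The version-safe spelling is np.prod (the shape below is an arbitrary example):

```python
import numpy as np

# np.product was removed in NumPy 2.0; code calling it (e.g. Ray 2.30.0
# internals) must either run on NumPy < 2.0 or be patched to np.prod.
shape = (50, 14)                  # e.g. a flattened per-agent obs space
flat_size = int(np.prod(shape))   # works on both NumPy 1.x and 2.x
```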

Bug 4: Non-Finite Policy Outputs

Issue: BlueSky ArgumentError: txt2hdg needs either a float... caused by NaN/Inf from policy or NumPy-specific float types.

Fix: Added np.isfinite checks to _apply_action and explicitly cast all BlueSky command arguments to primitive Python float().
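A rough sketch of the guard (the helper name is hypothetical; the real _apply_action builds BlueSky command strings from values like this):

```python
import numpy as np

def safe_heading(raw, fallback=None):
    """Hypothetical guard mirroring the _apply_action fix: reject
    NaN/Inf policy outputs and cast NumPy scalar types (np.float32,
    np.float64) to a plain Python float before issuing the command."""
    if raw is None or not np.isfinite(raw):
        return fallback      # skip the command rather than crash BlueSky
    return float(raw)        # primitive float keeps txt2hdg's parser happy
```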


Stage 4 Scaling Optimization

Switched RLlib config to count_steps_by="agent_steps", so batch sizes are measured in per-agent steps rather than environment steps. This prevents massive memory buffering when scaling to 50 agents, since each environment step now contributes up to 50 agent steps.

Set train_batch_size = 20000 (approximately 1 full episode of 50 agents) to ensure stable gradients and frequent WandB updates.
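The two settings together, shown in legacy dict-style RLlib config for illustration (the AlgorithmConfig builder API expresses the same fields; exact placement may vary slightly across RLlib versions):

```python
# Sketch of the relevant Stage 4 settings; values taken from this run.
stage4_config = {
    "train_batch_size": 20_000,          # ~1 full 50-agent episode per update
    "multiagent": {
        "count_steps_by": "agent_steps", # batch counted in per-agent steps,
                                         # not environment steps
    },
}
```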


Scaling Pivot

The initial 2M-step run was equivalent to only 100 episodes (2M / 20,000 steps per episode), so the low absolute performance was expected.

New target: 100,000,000 agent steps (~5,000 episodes) for publication-quality coordination.

Increased train_batch_size to 100,000 (5 episodes per update) to stabilize gradients for high-density coordination maneuvers.
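The budget arithmetic above, spelled out (using the ~20,000 agent steps per episode estimate from this run):

```python
STEPS_PER_EPISODE = 20_000  # ~1 full episode at 50 agents (agent steps)

def n_episodes(total_agent_steps):
    """Episode count implied by a given agent-step budget."""
    return total_agent_steps // STEPS_PER_EPISODE

assert n_episodes(2_000_000) == 100        # initial run: far too few
assert n_episodes(100_000_000) == 5_000    # new publication-quality target
assert 100_000 // STEPS_PER_EPISODE == 5   # episodes per gradient update
```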