MARL Cooperative Navigation: Geometry Problem & The Abyss

January 21, 2026

Scaling from 2 agents to 5, then to 2,500.


Phase 3: Three Agents

Challenge: The “middle agent problem.” In dense configs, one agent has no clean path. Waiting costs time. Time costs reward.

Observed strategy: rugby scrum. Shove through traffic, accept collision penalties (-2.0), trigger completion bonus (+150) faster.

Metric       Value
Return       ~400
Collisions   ~8
Success      ~2%
Entropy      0.38

Entropy at 0.38 means overconfident in a bad strategy.
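To put 0.38 in context: assuming a 5-way discrete action space (typical for this kind of navigation task; an assumption, not stated above), the maximum possible entropy is ln 5 ≈ 1.61, so 0.38 is a nearly deterministic policy:

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution, in nats."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # drop zero-probability actions to avoid log(0)
    return float(-np.sum(p * np.log(p)))

# Uniform over 5 actions: the maximum possible entropy, ~1.609 nats.
print(policy_entropy([0.2] * 5))

# A near-deterministic "shove" policy sits well below that maximum,
# in the same regime as the observed 0.38.
print(policy_entropy([0.9, 0.04, 0.03, 0.02, 0.01]))
```

A policy this peaked keeps picking the same aggressive action even when alternatives are nearly as good, which is exactly "overconfident in a bad strategy."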

Tried forcing politeness (higher collision penalties, hold rewards). Made it worse.

Conclusion: Under this reward structure, aggressive coordination is the optimal strategy. The math favors shoving.
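The math can be checked with back-of-the-envelope arithmetic. The penalty values come from the config above; the step counts and the per-step time value are illustrative assumptions, not measured numbers:

```python
# Illustrative "shove vs. wait" comparison under the Phase 3 rewards.
# Step counts and per-step time value are made-up for the sketch.
COLLISION_PENALTY = -2.0   # from the config above
COMPLETION_BONUS = 150.0   # from the config above

def shove_advantage(collisions, steps_saved, value_per_step=1.0):
    """Net effect of shoving: pay collision penalties, save time steps."""
    return collisions * COLLISION_PENALTY + steps_saved * value_per_step

# Eight collisions cost 16 reward; if shoving reaches the completion
# bonus even ~20 steps sooner, the penalties are repaid with change.
print(shove_advantage(collisions=8, steps_saved=20))  # 4.0
```

Raising the collision penalty just moves the break-even point; as long as time-to-bonus dominates, the scrum stays optimal.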


Phase 4: Five Agents (“The Pentagon”)

Setup:

  • 5 agents, 5 landmarks, 1×1 world
  • Collision penalty: -2.0
  • Hold reward: 0.1
  • All-covered bonus: 200
  • hidden_dim: 128
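Collected as a config dict, the setup above looks like this (key names are illustrative; they mirror the bullet values, not the actual code):

```python
# Phase 4 ("Pentagon") configuration, transcribed from the setup list.
# Key names are assumptions for the sketch, values are from the post.
PHASE4_CONFIG = {
    "n_agents": 5,
    "n_landmarks": 5,
    "world_size": (1.0, 1.0),      # 1x1 world
    "collision_penalty": -2.0,
    "hold_reward": 0.1,
    "all_covered_bonus": 200.0,
    "hidden_dim": 128,
}
```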

Result: Decisive win.

Metric       Value
Return       509+
Entropy      0.13
Collisions   ~35

Emergent behavior: Constant rotation. Dynamic swapping. No static roles. Congestion managed through movement, not avoidance.

Milestone: Trained 5-agent policy ready for deployment.


The Abyss

Goal: Visualize thousands of agents without retraining.

Solution: abyss.py

  • 500 parallel environments
  • 2,500 total agents (5 per env)
  • Single shared policy, batched inference
  • Grid-offset rendering in Unreal
  • Single WebSocket stream
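The batched-inference idea is the core trick: because all 2,500 agents share one policy, their observations can be flattened into a single batch and run through one forward pass. A minimal numpy sketch (observation size and the linear "policy" are placeholders for the trained network):

```python
import numpy as np

N_ENVS, AGENTS_PER_ENV, OBS_DIM, N_ACTIONS = 500, 5, 18, 5  # OBS_DIM assumed

rng = np.random.default_rng(0)
W = rng.standard_normal((OBS_DIM, N_ACTIONS))  # stand-in for the trained net

def batched_act(obs):
    """obs: (n_envs, agents_per_env, obs_dim) -> actions (n_envs, agents_per_env).

    Flatten every agent across every environment into one batch, do a
    single forward pass, then reshape the actions back per-environment.
    """
    flat = obs.reshape(-1, OBS_DIM)      # (n_envs * agents, obs_dim)
    logits = flat @ W                    # one matmul covers all 2,500 agents
    actions = logits.argmax(axis=-1)     # greedy selection for the sketch
    return actions.reshape(obs.shape[0], obs.shape[1])

obs = rng.standard_normal((N_ENVS, AGENTS_PER_ENV, OBS_DIM))
print(batched_act(obs).shape)  # (500, 5)
```

One batched call per tick is what makes 500 environments cheap; stepping each environment's agents separately would mean 2,500 small forward passes instead of one large one.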

Bug 1: Playback not rendering agents

Cause: Action type mismatch (string vs int in recorded frames).

Fix: Type guard before conversion.
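The guard amounts to normalizing the action's type before it is used as an index. A minimal version (function name assumed):

```python
def normalize_action(raw):
    """Recorded frames stored actions as strings in some runs and ints in
    others; coerce both to int before indexing into the action space."""
    if isinstance(raw, str):
        return int(raw)
    if isinstance(raw, (int, float)):
        return int(raw)
    raise TypeError(f"unsupported action type: {type(raw).__name__}")

print(normalize_action("3"), normalize_action(3))  # 3 3
```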

Bug 2: Playback thread exiting immediately

Cause: server.running = False at start.

Fix: Set flag before thread start.
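This is a classic order-of-operations race: the worker thread checks the flag before the main thread sets it, so the loop body never runs. A sketch of the fixed ordering (class and method names assumed):

```python
import threading
import time

class PlaybackServer:
    def __init__(self):
        self.running = False

    def _playback_loop(self):
        # If self.running is still False when this runs, the while loop
        # exits immediately -- exactly the Bug 2 symptom.
        while self.running:
            time.sleep(0.01)  # stream one frame per tick (placeholder)

    def start(self):
        self.running = True  # set the flag BEFORE starting the thread
        t = threading.Thread(target=self._playback_loop, daemon=True)
        t.start()
        return t

server = PlaybackServer()
t = server.start()
print(t.is_alive())        # thread stays alive while the flag is set
server.running = False     # clearing the flag lets the loop exit
```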

Bug 3: Agent jitter

The “vibrating atom” effect. Agents twitching in place instead of moving smoothly.

Cause: Policy queried every frame. Near-equal probabilities caused action flip-flopping.

Fix: Action smoothing.

action_repeat = 8  # Query network every 8 frames, cache between

Result: Smooth, organic motion. Agents glide instead of twitch.
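The smoothing fix can be sketched as a thin wrapper that only queries the network every `action_repeat` frames and replays the cached action in between (names are illustrative, not the actual abyss.py code):

```python
class SmoothedPolicy:
    """Action repeat: hold each chosen action for several frames.

    Querying the policy every frame flip-flops between near-equal-
    probability actions ("vibrating atom"); caching the choice for
    action_repeat frames trades a little reactivity for smooth motion.
    """
    def __init__(self, policy_fn, action_repeat=8):
        self.policy_fn = policy_fn
        self.action_repeat = action_repeat
        self._cached = None
        self._frames_left = 0

    def act(self, obs):
        if self._frames_left == 0:
            self._cached = self.policy_fn(obs)  # fresh network query
            self._frames_left = self.action_repeat
        self._frames_left -= 1
        return self._cached

calls = 0
def fake_policy(obs):          # stand-in for the real network
    global calls
    calls += 1
    return calls % 5

smoothed = SmoothedPolicy(fake_policy, action_repeat=8)
actions = [smoothed.act(None) for _ in range(16)]
print(calls)  # 2 -- only two network queries for 16 frames
```

A side benefit at Abyss scale: with `action_repeat = 8`, the batched forward pass for 2,500 agents runs an eighth as often.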


Abyss Mode in Unreal

Unreal updates:

  • Platform spawning per environment (individual ground planes)
  • Abyss mode detection ("abyss": true flag)
  • Persistent actors (skip per-episode clearing)
  • New HUD: “THE ABYSS” title, environment count, total agents, global coverage

Scale Testing

Envs     Agents   Status
100      500      Works
500      2,500    Works
1,000    5,000    Failed

1,000 environments hit JSON parsing and actor limits. 500 is the sweet spot.


Project Summary

Phase   Agents   Return   Collisions   Time
1       2        326      0.6          ~1h
3       3        400      8.0          ~2h
4       5        509      35           ~3h

Total training: ~6 hours on M4 MacBook.