DeepSky ATC: MLP Baseline Refactor
The Problem: Naive Flatten Is Not Defensible
The original MLP architecture flattened all K neighbors into one vector:
neighbors_flat = neighbors.reshape(batch_size, K * features)  # (batch, K, F) -> (batch, K*F)
h = mlp(neighbors_flat)
Issues:
- Not permutation invariant (neighbor order affects output)
- Cannot handle variable K
- Not standard baseline in research literature
- “Easy win” for GAT (unfair comparison)
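The first issue is easy to demonstrate. A minimal numpy sketch (standing in for the PyTorch tensors above) shows that flattening is order-sensitive while mean pooling is not:

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = rng.normal(size=(1, 5, 3))   # (batch, K=5, features=3)
permuted = neighbors[:, ::-1, :]         # same neighbors, reversed order

# Naive flatten: the input vector changes when neighbor order changes,
# so any MLP on top is order-sensitive
flat_a = neighbors.reshape(1, -1)
flat_b = permuted.reshape(1, -1)
print(np.allclose(flat_a, flat_b))       # False

# Mean pooling: identical regardless of order (permutation invariant)
pool_a = neighbors.mean(axis=1)
pool_b = permuted.mean(axis=1)
print(np.allclose(pool_a, pool_b))       # True
```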
The Fix: DeepSets Architecture
Switched to a proper permutation-invariant baseline, DeepSets (Zaheer et al., 2017):
# Process each neighbor through a shared MLP (shared weights)
h = torch.stack([neighbor_mlp(neighbors[:, i, :]) for i in range(K)], dim=1)
# Aggregate via mean pooling (permutation invariant)
h_neighbors = torch.mean(h, dim=1)
Benefits:
- Permutation invariant (neighbor order doesn’t matter)
- Handles variable K naturally
- Standard baseline (CommNet and DGN use a similar approach)
- Fair comparison to GAT (both use aggregation, only GAT has attention)
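The variable-K point deserves a concrete illustration. A numpy sketch of the DeepSets encoder with masked mean pooling (hypothetical single-layer `W_phi`/`W_rho` weights standing in for the shared MLPs; the actual model is PyTorch):

```python
import numpy as np

def deepsets_encode(neighbors, mask, W_phi, W_rho):
    """DeepSets-style encoder (numpy sketch, hypothetical weights).

    neighbors: (batch, K, F) padded neighbor features
    mask:      (batch, K) 1.0 for real neighbors, 0.0 for padding
    """
    # phi: shared per-neighbor transform (one linear + ReLU for brevity)
    h = np.maximum(neighbors @ W_phi, 0.0)              # (batch, K, d)
    # Masked mean pooling: padding slots contribute nothing,
    # so variable neighbor counts are handled naturally
    h = h * mask[..., None]
    denom = np.clip(mask.sum(axis=1, keepdims=True), 1.0, None)
    pooled = h.sum(axis=1) / denom                      # (batch, d)
    # rho: post-aggregation transform
    return np.maximum(pooled @ W_rho, 0.0)

# Usage: batch of 2, K=4 slots, 3 features; second sample has only 2 neighbors
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]], dtype=float)
W_phi = rng.normal(size=(3, 8))
W_rho = rng.normal(size=(8, 8))
out = deepsets_encode(x, mask, W_phi, W_rho)
print(out.shape)  # (2, 8)
```

Reversing the neighbor slots (and the mask with them) leaves the output unchanged, which is exactly the invariance the flatten baseline lacked.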
K=10 For All Architectures
Changed from:
- MLP: K=5, naive flatten
- GAT: K=10, attention
To:
- MLP: K=10, DeepSets (mean pooling)
- GAT: K=10, Graph Attention
- GNN: K=10, Graph Convolution
Same K for all. Only architecture differs.
The research question changed from: “Can human-scale fixed attention (K=5 naive flatten) match attention-based models?”
To: “Does the attention mechanism provide benefits over permutation-invariant aggregation (DeepSets) when information access is equal?”
A much stronger research question.
Why K=10?
Graph Neural Network Standard:
- K=8-12 is typical in GNN literature (Veličković et al. 2018, Kipf & Welling 2017)
- Used in DGN, CommNet, and other MARL baselines
Fair Comparison:
- Same information access for all architectures
- Only difference is HOW neighbors are processed
Partial Observability Maintained:
- Stage 3 (8 agents): K=10 sees 7/7 neighbors (100%)
- Stage 4 (50 agents): K=10 sees 10/49 neighbors (20%)
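The coverage figures follow from min(K, N-1) visible neighbors out of N-1 other agents; a quick check (hypothetical helper name):

```python
def neighbor_coverage(num_agents, k=10):
    """Fraction of the other agents visible under a K-nearest cap."""
    others = num_agents - 1
    return min(k, others) / others

print(neighbor_coverage(8))             # 1.0   -> Stage 3: 7/7 neighbors
print(round(neighbor_coverage(50), 3))  # 0.204 -> Stage 4: 10/49 neighbors
```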
MLP Stage 1-2 Results
Stage 1: Single Agent
- MLP DeepSets: ~500 reward
- GAT: ~500 reward
- Equal performance
Stage 2: Two-Agent Conflict
- MLP DeepSets: ~400 policy reward (60 iterations to converge)
- GAT: ~400 policy reward (8 iterations to converge)
- Equal peak performance, GAT 7.5x faster convergence
Critical Finding: Sample Efficiency Gap
MLP can match GAT peak performance in simple scenarios but requires significantly more training iterations.
Convergence speed:
- Stage 2: GAT 7.5x faster
- Both stable at 2-agent scale
Attention mechanism provides faster learning but not higher asymptotic performance at low agent counts.
Decision
Retrain the MLP from scratch with DeepSets. The old K=5 naive-flatten results are invalid for publication.
Set K=10 for all architectures to ensure a fair comparison. Either outcome is publishable because the comparison is fair and the methods are sound.