DeepSky ATC: MLP Baseline Refactor

March 22, 2026

The Problem: Naive Flatten Not Defensible

Original MLP architecture flattened all neighbors into one vector:

neighbors_flat = neighbors.reshape(batch_size, K*features)
h = mlp(neighbors_flat)

Issues:

  • Not permutation invariant (neighbor order affects the output)
  • Cannot handle variable K
  • Not a standard baseline in the research literature
  • An “easy win” for GAT (unfair comparison)
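The order sensitivity is easy to demonstrate. A minimal sketch (layer sizes here are illustrative, not the project's actual dimensions): feeding the same K neighbors in a different order changes the flattened MLP's output.

```python
import torch

torch.manual_seed(0)

batch_size, K, features = 4, 5, 8
mlp = torch.nn.Sequential(torch.nn.Linear(K * features, 32), torch.nn.ReLU())

neighbors = torch.randn(batch_size, K, features)
reordered = neighbors.flip(1)  # same neighbor set, reversed order

h_orig = mlp(neighbors.reshape(batch_size, K * features))
h_perm = mlp(reordered.reshape(batch_size, K * features))

# Flattening ties each neighbor slot to fixed weight columns,
# so a different ordering produces a different output.
print(torch.allclose(h_orig, h_perm))
```

In practice this prints False: the network must waste capacity learning that slot 0 and slot 4 can hold the same aircraft.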

The Fix: DeepSets Architecture

Switched to a proper permutation-invariant baseline, DeepSets (Zaheer et al., 2017):

# Process each neighbor through a shared MLP (same weights for every slot)
h_list = [neighbor_mlp(neighbors[:, i, :]) for i in range(K)]
h_stack = torch.stack(h_list, dim=1)  # (batch, K, hidden)

# Aggregate via mean pooling (permutation invariant)
h_neighbors = torch.mean(h_stack, dim=1)  # (batch, hidden)

Benefits:

  • Permutation invariant (neighbor order doesn’t matter)
  • Handles variable K naturally
  • Standard baseline (CommNet and DGN use a similar approach)
  • Fair comparison to GAT (both use aggregation, only GAT has attention)
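The loop above can be vectorized, since `nn.Linear` applies over the last dimension. A self-contained sketch with assumed layer sizes (the repo's actual hidden dimensions and MLP depth may differ), verifying the invariance claim directly:

```python
import torch
import torch.nn as nn

class DeepSetsEncoder(nn.Module):
    """Illustrative permutation-invariant neighbor encoder (sketch)."""
    def __init__(self, features: int, hidden: int):
        super().__init__()
        self.neighbor_mlp = nn.Sequential(
            nn.Linear(features, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )

    def forward(self, neighbors: torch.Tensor) -> torch.Tensor:
        # neighbors: (batch, K, features); shared MLP applied per slot
        h = self.neighbor_mlp(neighbors)  # (batch, K, hidden)
        return h.mean(dim=1)              # (batch, hidden), order-free

torch.manual_seed(0)
enc = DeepSetsEncoder(features=8, hidden=32)
x = torch.randn(4, 10, 8)                        # K=10 neighbors
out = enc(x)
out_perm = enc(x.flip(1))                        # reversed neighbor order
print(torch.allclose(out, out_perm, atol=1e-5))  # True: mean ignores order
```

Mean pooling (rather than sum) also keeps the output scale independent of K, which matters once K varies across curriculum stages.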

K=10 For All Architectures

Changed from:

  • MLP: K=5, naive flatten
  • GAT: K=10, attention

To:

  • MLP: K=10, DeepSets (mean pooling)
  • GAT: K=10, Graph Attention
  • GNN: K=10, Graph Convolution

Same K for all. Only architecture differs.

Research question changed from: “Can a human-scale fixed neighbor set (K=5, naive flatten) match attention-based models?”

To: “Does the attention mechanism provide benefits over permutation-invariant aggregation (DeepSets) when information is equal?”

A much stronger research question.


Why K=10?

Graph Neural Network Standard:

  • K=8-12 is typical in GNN literature (Veličković et al. 2018, Kipf & Welling 2017)
  • Used in DGN, CommNet, and other MARL baselines

Fair Comparison:

  • Same information access for all architectures
  • Only difference is HOW neighbors are processed

Partial Observability Maintained:

  • Stage 3 (8 agents): K=10 sees 7/7 neighbors (100%)
  • Stage 4 (50 agents): K=10 sees 10/49 neighbors (20%)
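One way to realize the fixed-K interface at both scales is k-nearest-neighbor selection with zero padding when fewer than K neighbors exist. This helper is an illustrative assumption, not the project's actual code:

```python
import torch

def select_k_nearest(positions: torch.Tensor, K: int = 10) -> torch.Tensor:
    """Pick each agent's K nearest neighbors by Euclidean distance,
    zero-padding when fewer than K other agents exist (hypothetical helper)."""
    N = positions.shape[0]
    dist = torch.cdist(positions, positions)        # (N, N) pairwise distances
    dist.fill_diagonal_(float("inf"))               # exclude self
    k_avail = min(K, N - 1)                         # e.g. 7 neighbors at N=8
    idx = dist.topk(k_avail, largest=False).indices # (N, k_avail)
    neigh = positions[idx]                          # (N, k_avail, features)
    if k_avail < K:                                 # pad up to fixed K slots
        pad = torch.zeros(N, K - k_avail, positions.shape[1])
        neigh = torch.cat([neigh, pad], dim=1)
    return neigh

pos = torch.randn(8, 2)                             # Stage 3: 8 agents
print(select_k_nearest(pos, K=10).shape)            # torch.Size([8, 10, 2])
```

At Stage 3 the last three slots are zero padding (7 real neighbors); at Stage 4 all ten slots are filled, and the agent sees only 10 of 49 neighbors.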

MLP Stage 1-2 Results

Stage 1: Single Agent

  • MLP DeepSets: ~500 reward
  • GAT: ~500 reward
  • Equal performance

Stage 2: Two-Agent Conflict

  • MLP DeepSets: ~400 policy reward (60 iterations to converge)
  • GAT: ~400 policy reward (8 iterations to converge)
  • Equal peak performance, GAT 7.5x faster convergence

Critical Finding: Sample Efficiency Gap

MLP can match GAT peak performance in simple scenarios but requires significantly more training iterations.

Convergence speed:

  • Stage 2: GAT 7.5x faster
  • Both stable at 2-agent scale

Attention mechanism provides faster learning but not higher asymptotic performance at low agent counts.


Decision

Retrain the MLP from scratch with DeepSets. The old K=5 naive-flatten results are invalid for publication.

Set K=10 for all architectures to ensure a fair comparison. Either result is publishable because the comparison is fair and the methods are sound.