DeepSky ATC: MLP Baseline Refactor
The Problem: Naive Flatten Is Not Defensible
The original MLP architecture flattened all K neighbors into one vector:
neighbors_flat = neighbors.reshape(batch_size, K * features)  # (batch, K, F) -> (batch, K*F)
h = mlp(neighbors_flat)
Issues:
- Not permutation invariant (neighbor order affects output)
- Cannot handle variable K
- Not standard baseline in research literature
- “Easy win” for GAT (unfair comparison)
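The first issue is easy to demonstrate. A minimal numpy sketch (standing in for the PyTorch tensors above) shows that flattening is order-sensitive while mean pooling is not:

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = rng.normal(size=(1, 5, 3))   # (batch, K=5, features=3)
permuted = neighbors[:, ::-1, :]         # same neighbors, reversed order

# Naive flatten: the input vector changes when neighbor order changes,
# so any MLP on top is order-sensitive
flat_a = neighbors.reshape(1, -1)
flat_b = permuted.reshape(1, -1)
print(np.allclose(flat_a, flat_b))       # False

# Mean pooling: identical regardless of order (permutation invariant)
pool_a = neighbors.mean(axis=1)
pool_b = permuted.mean(axis=1)
print(np.allclose(pool_a, pool_b))       # True
```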
The Fix: DeepSets Architecture
Switched to a proper permutation-invariant baseline, DeepSets (Zaheer et al., 2017):
# Process each neighbor through a shared MLP (shared weights)
h = torch.stack([neighbor_mlp(neighbors[:, i, :]) for i in range(K)], dim=1)
# Aggregate via mean pooling (permutation invariant)
h_neighbors = torch.mean(h, dim=1)
Benefits:
- Permutation invariant (neighbor order doesn’t matter)
- Handles variable K naturally
- Standard baseline (CommNet and DGN use a similar approach)
- Fair comparison to GAT (both use aggregation, only GAT has attention)
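The variable-K point deserves a concrete illustration. A numpy sketch of the DeepSets encoder with masked mean pooling (hypothetical single-layer `W_phi`/`W_rho` weights standing in for the shared MLPs; the actual model is PyTorch):

```python
import numpy as np

def deepsets_encode(neighbors, mask, W_phi, W_rho):
    """DeepSets-style encoder (numpy sketch, hypothetical weights).

    neighbors: (batch, K, F) padded neighbor features
    mask:      (batch, K) 1.0 for real neighbors, 0.0 for padding
    """
    # phi: shared per-neighbor transform (one linear + ReLU for brevity)
    h = np.maximum(neighbors @ W_phi, 0.0)              # (batch, K, d)
    # Masked mean pooling: padding slots contribute nothing,
    # so variable neighbor counts are handled naturally
    h = h * mask[..., None]
    denom = np.clip(mask.sum(axis=1, keepdims=True), 1.0, None)
    pooled = h.sum(axis=1) / denom                      # (batch, d)
    # rho: post-aggregation transform
    return np.maximum(pooled @ W_rho, 0.0)

# Usage: batch of 2, K=4 slots, 3 features; second sample has only 2 neighbors
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]], dtype=float)
W_phi = rng.normal(size=(3, 8))
W_rho = rng.normal(size=(8, 8))
out = deepsets_encode(x, mask, W_phi, W_rho)
print(out.shape)  # (2, 8)
```

Reversing the neighbor slots (and the mask with them) leaves the output unchanged, which is exactly the invariance the flatten baseline lacked.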
K=10 For All Architectures
Changed from:
- MLP: K=5, naive flatten
- GAT: K=10, attention
To:
- MLP: K=10, DeepSets (mean pooling)
- GAT: K=10, Graph Attention
- GNN: K=10, Graph Convolution
Same K for all. Only architecture differs.
The research question changed from: “Can human-scale fixed attention (K=5 naive flatten) match attention-based models?”
To: “Does the attention mechanism provide benefits over permutation-invariant aggregation (DeepSets) when information access is equal?”
A much stronger research question.
Why K=10?
Graph Neural Network Standard:
- K=8-12 is typical in GNN literature (Veličković et al. 2018, Kipf & Welling 2017)
- Used in DGN, CommNet, and other MARL baselines
Fair Comparison:
- Same information access for all architectures
- Only difference is HOW neighbors are processed
Partial Observability Maintained:
- Stage 3 (8 agents): K=10 sees 7/7 neighbors (100%)
- Stage 4 (50 agents): K=10 sees 10/49 neighbors (20%)
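The coverage figures follow from min(K, N-1) visible neighbors out of N-1 other agents; a quick check (hypothetical helper name):

```python
def neighbor_coverage(num_agents, k=10):
    """Fraction of the other agents visible under a K-nearest cap."""
    others = num_agents - 1
    return min(k, others) / others

print(neighbor_coverage(8))             # 1.0   -> Stage 3: 7/7 neighbors
print(round(neighbor_coverage(50), 3))  # 0.204 -> Stage 4: 10/49 neighbors
```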
MLP Stage 1-2 Results
Stage 1: Single Agent
- MLP DeepSets: ~500 reward
- GAT: ~500 reward
- Equal performance
Stage 2: Two-Agent Conflict
- MLP DeepSets: ~400 policy reward (60 iterations to converge)
- GAT: ~400 policy reward (8 iterations to converge)
- Equal peak performance, GAT 7.5x faster convergence
Critical Finding: Sample Efficiency Gap
MLP can match GAT peak performance in simple scenarios but requires significantly more training iterations.
Convergence speed:
- Stage 2: GAT 7.5x faster
- Both stable at 2-agent scale
Attention mechanism provides faster learning but not higher asymptotic performance at low agent counts.
Decision
Retrain the MLP from scratch with DeepSets. The old K=5 naive-flatten results are invalid for publication.
Set K=10 for all architectures to ensure a fair comparison. Either outcome is publishable because the comparison is fair and the methods are sound.