Stratum: Phase 2 Baseline Implementation

March 28, 2026

Phase 2 Canonical Baseline Locked

Locked the Phase 2 canonical baseline in code:

24 agents
3 soft-affiliation blocs
10 Meridian zones
Canonical Phase 2 contracts in config and tests

Added versioned run metadata to SQLite and CLI plumbing for baseline_version, experiment family, condition ids, and inference metadata.

Situated Interaction Quality

Rebuilt conversations to be trigger-based, addressee-grounded, and written back into episodic memory.

Added latent economic-game scaffolding:

Delegation
Barter
Tribute transfer
Pending settlement tracking
Meridian institutional zones

World Heat Formalization

Formalized World Heat Ω into explicit normalized components. Exposed both summary and component series through world observations and replay API.

Upgraded replay surface into Phase 2 research instrument:

Run metadata strip
Ω evidence mode
Brokerage/delegation analysis
Bloc-aware graph rendering
Case-study dashboard support

Experiment Matrix Runner

Added experiment-matrix runner in main.py + experiments.py for phase1, phase1b, and rotated phase2 organizational model-stack bundles.

Validation: Full automated suite passes (95 tests).

Canonical Experiment Table

Clarified canonical Phase 2 roster uses same role structure in every organization.

Locked rule that mixed-tier conditions must use same tier-count distribution in every organization.

Model recommendation:

Primary capability ladder: Qwen3-8B / 14B / 32B
Family comparison anchors: Qwen3-8B, Llama-3.1-8B-Instruct, Ministral 3 8B
Frontier extension: Llama-3.3-70B-Instruct

Tick-length ladder:

25 smoke
50 calibration
100 canonical research runs
120-150 robustness only

Canonical seed policy:

Core capability seeds: 11, 23, 37, 41, 53, 67, 79, 97
Family comparison seeds: 11, 23, 37, 53, 79
Frontier seeds: 11, 37, 79

Canonical experiment table:

Capability core (C0-C6)
Family core (F0-F5)
Frontier extension (G0-G2)

Meridian Economy Design

Added dedicated Meridian economy design doc. Synced master research design with approved institutional economy decisions.

Key change: Canonical comparative runs now use 120 ticks.

Study separates deterministic no-shock baseline from deterministic shock-treatment suite. Treatment suite centered on two major shocks, not constant scripted disruption.

Canonical economic structure (hybrid institutional economy):

Bilateral bargaining everywhere
Posted offers and routine settlement in Market Square
Disputed-claim arbitration in Civic Hall
ledger credits as weak accounting layer rather than universal fiat

Enforcement hierarchy:

Private obligation
Provisional institutional claim
Confirmed ledger credit after bilateral confirmation or adjudication

Canonical scarcity regime:

Renewable water commons
Distributed but bounded food
Slow-flow medicine
Intermittent weapons / scraps
Institutionally generated information

Benchmark episode requirements:

Commons pressure
Trust delivery
Market bargaining
Arbitration

Institutional Baseline Features

Explicit collective sanctions: Agents can coordinate punishment.

Portable burden-bearing receipts: Debts can be tracked and transferred.

Normative arbitration in Civic Hall: Disputes resolved through institutional mechanism.

Action-costed, spatially fragile communication: Speech has cost, location matters.

Exile re-entry via sponsor + ruling + liability: Exiled agents need institutional path back.

Inference Backend

Created inference-backend branch for remote inference work.

Added provider-neutral inference runtime contract to CLI and run metadata.

Introduced generic InferenceClient supporting:

Local Ollama-compatible generation
OpenAI-compatible remote inference endpoints (RunPod-served open models)

Split canonical experiment model assignments from runtime deployment ids so experiment specs stay research-clean while runtime model names remain configurable.

Added bounded retry/backoff for transient remote failures with transport retry metadata logged.

Added --preflight backend validation so long runs fail fast on missing API keys, empty model maps, or bad endpoint configuration.

Validation: All inference client tests pass.

What Phase 2 Baseline Means

Repo now has research-first canonical baseline distinct from earlier POC cast/world.

Run metadata, replay payloads, and visualization contracts aligned around experiment matrix instead of ad hoc run shape.

CLI can enumerate and execute approved condition bundles without changing core simulation loop.

Visualization functions as both demo surface and evidence surface: spatial control, brokerage, delegation, Ω, and case-study inspection are all first-class payloads.