Stratum: Phase 2 Baseline Implementation
Phase 2 Canonical Baseline Locked
Locked the Phase 2 canonical baseline in code:
- 24 agents
- 3 soft-affiliation blocs
- 10 Meridian zones
- Canonical Phase 2 contracts in config and tests
Added versioned run metadata to SQLite and CLI plumbing for baseline_version, experiment family, condition ids, and inference metadata.
Situated Interaction Quality
Rebuilt conversations to be trigger-based, addressee-grounded, and written back into episodic memory.
Added latent economic-game scaffolding:
- Delegation
- Barter
- Tribute transfer
- Pending settlement tracking
- Meridian institutional zones
World Heat Formalization
Formalized World Heat Ω into explicit normalized components. Exposed both summary and component series through world observations and replay API.
Upgraded replay surface into Phase 2 research instrument:
- Run metadata strip
- Ω evidence mode
- Brokerage/delegation analysis
- Bloc-aware graph rendering
- Case-study dashboard support
Experiment Matrix Runner
Added experiment-matrix runner in main.py + experiments.py for phase1, phase1b, and rotated phase2 organizational model-stack bundles.
Validation: Full automated suite passes (95 tests).
Canonical Experiment Table
Clarified canonical Phase 2 roster uses same role structure in every organization.
Locked rule that mixed-tier conditions must use same tier-count distribution in every organization.
Model recommendation:
- Primary capability ladder:
Qwen3-8B / 14B / 32B - Family comparison anchors:
Qwen3-8B,Llama-3.1-8B-Instruct,Ministral 3 8B - Frontier extension:
Llama-3.3-70B-Instruct
Tick-length ladder:
- 25 smoke
- 50 calibration
- 100 canonical research runs
- 120-150 robustness only
Canonical seed policy:
- Core capability seeds: 11, 23, 37, 41, 53, 67, 79, 97
- Family comparison seeds: 11, 23, 37, 53, 79
- Frontier seeds: 11, 37, 79
Canonical experiment table:
- Capability core (C0-C6)
- Family core (F0-F5)
- Frontier extension (G0-G2)
Meridian Economy Design
Added dedicated Meridian economy design doc. Synced master research design with approved institutional economy decisions.
Key change: Canonical comparative runs now use 120 ticks.
Study separates deterministic no-shock baseline from deterministic shock-treatment suite. Treatment suite centered on two major shocks, not constant scripted disruption.
Canonical economic structure (hybrid institutional economy):
- Bilateral bargaining everywhere
- Posted offers and routine settlement in
Market Square - Disputed-claim arbitration in
Civic Hall ledger creditsas weak accounting layer rather than universal fiat
Enforcement hierarchy:
- Private obligation
- Provisional institutional claim
- Confirmed ledger credit after bilateral confirmation or adjudication
Canonical scarcity regime:
- Renewable
watercommons - Distributed but bounded
food - Slow-flow
medicine - Intermittent
weapons/scraps - Institutionally generated
information
Benchmark episode requirements:
- Commons pressure
- Trust delivery
- Market bargaining
- Arbitration
Institutional Baseline Features
Explicit collective sanctions: Agents can coordinate punishment.
Portable burden-bearing receipts: Debts can be tracked and transferred.
Normative arbitration in Civic Hall: Disputes resolved through institutional mechanism.
Action-costed, spatially fragile communication: Speech has cost, location matters.
Exile re-entry via sponsor + ruling + liability: Exiled agents need institutional path back.
Inference Backend
Created inference-backend branch for remote inference work.
Added provider-neutral inference runtime contract to CLI and run metadata.
Introduced generic InferenceClient supporting:
- Local Ollama-compatible generation
- OpenAI-compatible remote inference endpoints (RunPod-served open models)
Split canonical experiment model assignments from runtime deployment ids so experiment specs stay research-clean while runtime model names remain configurable.
Added bounded retry/backoff for transient remote failures with transport retry metadata logged.
Added --preflight backend validation so long runs fail fast on missing API keys, empty model maps, or bad endpoint configuration.
Validation: All inference client tests pass.
What Phase 2 Baseline Means
Repo now has research-first canonical baseline distinct from earlier POC cast/world.
Run metadata, replay payloads, and visualization contracts aligned around experiment matrix instead of ad hoc run shape.
CLI can enumerate and execute approved condition bundles without changing core simulation loop.
Visualization functions as both demo surface and evidence surface: spatial control, brokerage, delegation, Ω, and case-study inspection are all first-class payloads.