DeepSeek mHC: The Stream Persistence Bug
Part 1 of my DeepSeek mHC reproduction. The goal: validate Hyper-Connections instability at small scale before committing to expensive GPU time. See Part 2: Infrastructure and Part 3: Results at 1.7B.
Jan 10: The Stream Persistence Bug
Goal: Implement Hyper-Connections (HC) and Manifold-Constrained Hyper-Connections (mHC) at small scale (~10M parameters) to validate DeepSeek’s claims before scaling up.
What went wrong: Initial results showed no difference between the residual baseline, HC, and mHC. Everything looked identical: no instability, no amplification. A lot of nothing.
Debugging hell:
- Audited the architecture three times
- Verified math against the paper
- Checked shapes, initializations, projections
- Everything looked correct
The real bug: I was collapsing the multi-stream output back to a single stream at every layer, then re-expanding it for the next one. The “hyper” architecture wasn’t persistent, so the streams never actually interacted across layers. I had unknowingly implemented a fancy residual wrapper.
The fix: Maintain stream persistence across layers. No collapse or re-expansion. I let the streams live.
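To make the bug concrete, here is a toy sketch in plain Python. It is not my training code: the function names, the 2×2 mixing matrix `H`, and the mean-based `collapse` are all assumptions chosen for illustration, and the layer function itself is left out so only the stream mixing is visible. When streams are collapsed and copied back out at every layer, each layer sees identical streams, so the mixing matrix degenerates into a single scalar (the sum of its entries divided by the stream count). In this toy that scalar is exactly 1.0, which is why the buggy path looks like a plain residual connection.

```python
def mix(H, streams):
    """Mix n streams with an n x n matrix: out[i] = sum_j H[i][j] * streams[j]."""
    n, d = len(streams), len(streams[0])
    return [[sum(H[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

def expand(x, n):
    """Copy a single hidden vector into n identical streams."""
    return [list(x) for _ in range(n)]

def collapse(streams):
    """Average n streams back into a single hidden vector."""
    n, d = len(streams), len(streams[0])
    return [sum(s[k] for s in streams) / n for k in range(d)]

def run_buggy(x, mixers):
    """The bug: re-expand and collapse inside every layer."""
    n = len(mixers[0])
    for H in mixers:
        streams = expand(x, n)   # streams are identical copies again
        streams = mix(H, streams)
        x = collapse(streams)    # stream state is thrown away here
    return x

def run_persistent(x, mixers):
    """The fix: expand once, keep the streams alive, collapse once at the end."""
    n = len(mixers[0])
    streams = expand(x, n)
    for H in mixers:
        streams = mix(H, streams)
    return collapse(streams)

# Hypothetical mixing matrix: entries sum to n = 2, but rows are unbalanced.
H = [[1.5, 0.5], [0.5, -0.5]]
mixers = [H, H, H]
x = [1.0, 2.0]

buggy = run_buggy(x, mixers)            # [1.0, 2.0]: no amplification visible
persistent = run_persistent(x, mixers)  # [3.0, 6.0]: mixing compounds across layers
```

Same matrices, same input, and only the persistent version shows any growth. That matches what I saw: the collapsed version hid exactly the effect I was trying to measure.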
Result:
- HC immediately showed 9.2× amplification
- mHC stayed pinned at 1.0×
- The difference finally existed
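For intuition on why the two behave so differently, the sketch below measures one plausible notion of amplification: the maximum absolute row sum of the product of all per-layer stream-mixing matrices. Treat everything here as an assumption rather than my exact metric or DeepSeek's exact method. The `sinkhorn` helper stands in for the manifold constraint by pushing each matrix toward doubly stochastic (rows and columns summing to 1), the 2×2 matrix `raw` is made up, and the resulting numbers are illustrative and unrelated to the 9.2× above.

```python
import math

def matmul(A, B):
    """Plain n x n matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sinkhorn(M, iters=50):
    """Exponentiate to get positive entries, then alternately normalize
    rows and columns so the result is (approximately) doubly stochastic."""
    n = len(M)
    P = [[math.exp(v) for v in row] for row in M]
    for _ in range(iters):
        P = [[v / sum(row) for v in row] for row in P]             # rows sum to 1
        cols = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / cols[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return P

def gain(M):
    """Maximum absolute row sum (the matrix infinity-norm)."""
    return max(sum(abs(v) for v in row) for row in M)

def composite_gain(mixers):
    """Gain of the product of all per-layer mixing matrices."""
    n = len(mixers[0])
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    for H in mixers:
        total = matmul(H, total)  # later layers multiply on the left
    return gain(total)

# Hypothetical unconstrained mixing matrix with row sums 1.5 and 1.2.
raw = [[1.2, 0.3], [0.1, 1.1]]
depth = 8

hc_gain = composite_gain([raw] * depth)             # grows at least like 1.2 ** depth
mhc_gain = composite_gain([sinkhorn(raw)] * depth)  # stays at 1.0 up to rounding
```

The constrained case is pinned because a product of doubly stochastic matrices is itself doubly stochastic, so every row sum of the composite stays at 1 no matter how deep the stack gets. The unconstrained case has nothing holding its row sums down, so the gain compounds layer by layer.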
Part 1 published.
Jan 11: Part 1 Goes Live
Blog post: “DeepSeek’s mHC: When Residual Connections Explode”
- Hits #3 on Hacker News
Public promise made: “Part 2 at ~1B scale later this week.”
Next: Infrastructure Hell