Deep Dives
Technical deep dives into specific components of the system.
- The Card-Choice Head: the Fourth Head on the Trunk (2026-04-19)
The Neural Network deep dive named three heads. This post documents the fourth: a card-reward picker that scores offered cards by projecting the policy hidden state into card-embedding space, samples without replacement, and carries no PPO gradient path. A minimal sketch of that scoring-and-sampling step appears after this list.
- Slay the Spire 2 — What the Agent Is Actually Playing (2026-04-19)
Before I can show you how the agent plays this game, I have to show you the game. Here’s the 5-minute version — the mechanics, the intent-telegraphing, and the three structural properties that make this a harder RL problem than it looks.
- Reinforcement Learning, Grounded In One Project (2026-04-19)
Six words do the conceptual work. Four files do the actual work. Here they are, side by side: a tour of model.py, reward.py, actor_proxy.py, and learner.py, with the PPO update cycle as the mid-article payoff. The clipped objective at the heart of that cycle is sketched after this list.
- The Neural Network: Five Embeddings, One Trunk, Three Heads (2026-04-19)
A ~234K-parameter policy + value network that learns to play Slay the Spire 2 — told in one flow diagram and three cards. The trunk is smaller than the input, and almost all of the representation work happens in the embeddings.
- Input Vector — What the Network Actually Sees (2026-04-19)
A dimensional tour of the 583-dim tensor fed to the policy network, grounded in src/bot/model.py. What each float is, where it comes from in game state, and why it’s shaped that way.
- Distributed Compute — Eleven Laptops, One TCP Port, and a Scheduled Task (2026-04-19)
Eleven Windows laptops run an actor proxy each. One PPO learner runs on my desk. They talk over a single TCP port on the home LAN. This post is what the compute actually looks like today — numbers, configs, and the wire.
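
The card-choice post above describes a specific mechanism: project the policy hidden state into card-embedding space, score each offered card against its embedding, and sample without replacement with no PPO gradient attached. Here is a minimal sketch of that step, assuming PyTorch; every name and dimension (HIDDEN_DIM, EMB_DIM, card_emb, proj, pick_cards) is an illustrative assumption, not the repo's actual code.

```python
import torch
import torch.nn as nn

HIDDEN_DIM, EMB_DIM, N_CARDS = 64, 16, 400   # assumed sizes, not the repo's

card_emb = nn.Embedding(N_CARDS, EMB_DIM)    # shared card-embedding table
proj = nn.Linear(HIDDEN_DIM, EMB_DIM)        # hidden state -> embedding space

def pick_cards(hidden: torch.Tensor, offered_ids: torch.Tensor, k: int = 1):
    """Score the offered cards and sample k of them without replacement.

    hidden:      (HIDDEN_DIM,) policy trunk output for this state
    offered_ids: (n_offered,) ids of the cards on the reward screen
    """
    # no_grad(): this head carries no PPO gradient path, per the post.
    with torch.no_grad():
        query = proj(hidden)                  # (EMB_DIM,)
        keys = card_emb(offered_ids)          # (n_offered, EMB_DIM)
        logits = keys @ query                 # dot-product scores per card
        probs = torch.softmax(logits, dim=-1)
        # Without replacement matters when a screen allows k > 1 picks.
        idx = torch.multinomial(probs, k, replacement=False)
    return offered_ids[idx]

# Usage: choose one of three offered cards.
h = torch.randn(HIDDEN_DIM)
offered = torch.tensor([12, 87, 301])
print(pick_cards(h, offered, k=1))
```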
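And since the file-tour post builds toward the PPO update cycle, here is the standard clipped surrogate loss that update centers on, as a hedged sketch: the variable names are illustrative, and the real learner.py may structure this differently.

```python
import torch

def ppo_policy_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    ratio = torch.exp(new_logp - old_logp)                        # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    # Take the pessimistic bound, then negate: we maximize by minimizing.
    return -torch.min(unclipped, clipped).mean()
```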