Deep Dives
Technical deep dives into specific components of the system.
- The Card-Choice Head: the Fourth Head on the Trunk (2026-04-19)
The Neural Network deep dive named three heads. This post documents the fourth: a card-reward picker that scores offered cards by projecting the policy hidden state into card-embedding space, samples without replacement, and carries no PPO gradient path. A minimal sketch of that scoring-and-sampling step appears after this list.
- Slay the Spire 2 — What the Agent Is Actually Playing (2026-04-19)
Before I can show you how the agent plays this game, I have to show you the game. Here’s the 5-minute version — the mechanics, the intent-telegraphing, and the three structural properties that make this a harder RL problem than it looks.
- Reinforcement Learning, Grounded In One Project (2026-04-19)
Six words do the conceptual work. Four files do the actual work. Here they are, side by side: a tour of model.py, reward.py, actor_proxy.py, and learner.py, with the PPO update cycle as the mid-article payoff. The clipped objective at the heart of that cycle is sketched after this list.
- The Neural Network: Five Embeddings, One Trunk, Three Heads (2026-04-19)
A ~234K-parameter policy + value network that learns to play Slay the Spire 2 — told in one flow diagram and three cards. The trunk is smaller than the input, and almost all of the representation work happens in the embeddings.
- Input Vector — What the Network Actually Sees (2026-04-19)
A dimensional tour of the 583-dim tensor fed to the policy network, grounded in src/bot/model.py. What each float is, where it comes from in game state, and why it’s shaped that way.
- Distributed Compute — Eleven Laptops, One TCP Port, and a Scheduled Task (2026-04-19)
Eleven Windows laptops run an actor proxy each. One PPO learner runs on my desk. They talk over a single TCP port on the home LAN. This post is what the compute actually looks like today — numbers, configs, and the wire.
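
The card-choice post above describes a specific mechanism: project the policy hidden state into card-embedding space, score each offered card against its embedding, and sample without replacement with no PPO gradient attached. Here is a minimal sketch of that step, assuming PyTorch; every name and dimension (HIDDEN_DIM, EMB_DIM, card_emb, proj, pick_cards) is an illustrative assumption, not the repo's actual code.

```python
import torch
import torch.nn as nn

HIDDEN_DIM, EMB_DIM, N_CARDS = 64, 16, 400   # assumed sizes, not the repo's

card_emb = nn.Embedding(N_CARDS, EMB_DIM)    # shared card-embedding table
proj = nn.Linear(HIDDEN_DIM, EMB_DIM)        # hidden state -> embedding space

def pick_cards(hidden: torch.Tensor, offered_ids: torch.Tensor, k: int = 1):
    """Score the offered cards and sample k of them without replacement.

    hidden:      (HIDDEN_DIM,) policy trunk output for this state
    offered_ids: (n_offered,) ids of the cards on the reward screen
    """
    # no_grad(): this head carries no PPO gradient path, per the post.
    with torch.no_grad():
        query = proj(hidden)                  # (EMB_DIM,)
        keys = card_emb(offered_ids)          # (n_offered, EMB_DIM)
        logits = keys @ query                 # dot-product scores per card
        probs = torch.softmax(logits, dim=-1)
        # Without replacement matters when a screen allows k > 1 picks.
        idx = torch.multinomial(probs, k, replacement=False)
    return offered_ids[idx]

# Usage: choose one of three offered cards.
h = torch.randn(HIDDEN_DIM)
offered = torch.tensor([12, 87, 301])
print(pick_cards(h, offered, k=1))
```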
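And since the file-tour post builds toward the PPO update cycle, here is the standard clipped surrogate loss that update centers on, as a hedged sketch: the variable names are illustrative, and the real learner.py may structure this differently.

```python
import torch

def ppo_policy_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    ratio = torch.exp(new_logp - old_logp)                        # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    # Take the pessimistic bound, then negate: we maximize by minimizing.
    return -torch.min(unclipped, clipped).mean()
```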