What I Learned
Lessons from building a reinforcement learning system for a complex card game.
- A Second Head That Silently Corrupted PPO’s Ratio Math (2026-04-19)
I added a second decision-maker to the network and wired it into the training math the wrong way. Nothing blew up — a training metric just quietly drifted for weeks before I spotted the pattern. The fix was one line of code and a comment explaining why past-me got it wrong.
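For context, the "ratio math" in the title is PPO's importance-sampling ratio between the updated policy and the policy that collected the data. A minimal sketch of the standard clipped objective for a single sample (this is not the post's actual code, and the names are illustrative):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    # Probability ratio between current and behavior policy,
    # computed in log space for numerical stability.
    ratio = math.exp(logp_new - logp_old)
    # Clipped surrogate: take the pessimistic minimum of the
    # unclipped and clipped terms.
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return min(unclipped, clipped)
```

The bug class the teaser describes fits this shape: if a second head's log-probabilities are folded into `logp_new` but not `logp_old` (or vice versa), the ratio is consistently wrong yet still finite, so training limps along instead of crashing.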
- Reward Shaping: Every Iteration Got Gamed (2026-04-18)
The reward function is the single most important design decision in an RL system. Every version I wrote was exploited by the agent in some surprising way, and I had to iterate three times before landing on a reward that produced the behavior I actually wanted.
- Curriculum Learning: You Can’t Skip the Basics (2026-04-18)
I spent weeks trying to train an agent on the full game before giving up and building a curriculum. The curriculum version reached higher performance on the full game in days than my direct approach reached in weeks.
- Distributed Deployment on Windows — Death by a Thousand Papercuts (2026-04-19)
Windows is a hostile environment for distributed training, and every approach I tried had a non-obvious failure mode. The setup that eventually worked is weirder than anything I would have picked starting from scratch.
- Input Vector Design — One-Hot Was a Trap (2026-04-19)
How the input vector got from ~1500 dimensions to 583, and how I found the second trap by accident — not by design.
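The post's actual layout isn't given here, but the general trade-off behind a one-hot blow-up looks like this: encoding each card slot as its own one-hot vector multiplies dimensions, while a single multi-hot set-membership vector carries the same information when slot order doesn't matter. All numbers below are made up for illustration:

```python
def one_hot(index, size):
    # Standard one-hot: a zero vector with a single 1.0 at `index`.
    v = [0.0] * size
    v[index] = 1.0
    return v

# One-hot per slot grows fast: 3 hand slots x 52 cards = 156 dims,
# almost all zero. (Illustrative numbers, not the post's layout.)
hand = [3, 17, 44]
wide = [x for c in hand for x in one_hot(c, 52)]

# A 52-dim multi-hot vector encodes the same hand in a fraction of
# the space when the slots are an unordered set:
compact = [1.0 if c in set(hand) else 0.0 for c in range(52)]
```

Collapsing per-slot one-hots into set encodings like this is one common way a card-game input shrinks by a factor of two or three without losing information.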
- Monitoring: “Online” Is Not the Same as “Working” (2026-04-19)
I built a training dashboard early and thought I was monitoring my fleet. I was not. The dashboard happily reported 11 workers “Online” while 6 of them were dead for 47 minutes. Monitoring that doesn’t reflect reality is worse than no monitoring — it gives you false confidence.
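The failure mode described, a dashboard reading "Online" long after workers died, is typically caused by storing liveness as a flag set at connect time. The usual fix is to derive status from heartbeat age instead. A sketch with hypothetical names and an illustrative timeout (not the post's dashboard code):

```python
import time

HEARTBEAT_TIMEOUT_S = 90  # illustrative threshold

def worker_status(last_heartbeat_ts, now=None):
    """Derive liveness from heartbeat age instead of a stored flag.

    A worker that stops heartbeating flips to "Offline" once its last
    heartbeat is older than the timeout; a boolean set when the worker
    connected would keep reading "Online" forever.
    """
    now = time.time() if now is None else now
    age = now - last_heartbeat_ts
    return "Online" if age <= HEARTBEAT_TIMEOUT_S else "Offline"
```

With this shape, a dead worker can misreport for at most one timeout window, not 47 minutes.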