# Input Vector — What the Network Actually Sees
A dimensional tour of the 583-dim tensor fed to the policy network,
grounded in src/bot/model.py. What each float is, where it
comes from in game state, and why it's shaped that way.
## Shape ledger
| Field group | Shape | Dtype | → Trunk |
|---|---|---|---|
| Scalars | [1, 18] | float32 | 18 |
| Hand cards | [1, 10] × 12 + [1, 10, 16] | long + float32 | 270 |
| Enemies | [1, 6] × 17 + [1, 6, 16] + [1, 6, 15, 9] | long + float32 | 252 |
| Piles (draw/discard/exhaust) | [1, 30] × 3 + masks | long + float32 | 24 |
| Relics | [1, 20] × 4 | long + float32 | 10 |
| Player powers | [1, 15] × 3 | long + float32 | 9 |
| Action mask | [1, 61] | bool | — |
Seven tensor groups land in EncodedState. Six of them concat
into the 583-dim block that feeds the MLP trunk. The seventh,
action_mask, is routed separately — it's applied after the
policy head to zero out illegal logits via
logits.masked_fill(~enc.action_mask, -1e9). The trunk never
sees it.
The constants that shape these tensors live at the top of
src/bot/model.py: MAX_HAND = 10,
MAX_TARGETS = 6, MAX_POWERS = 15,
MAX_RELICS = 20, and
MAX_DRAW = MAX_DISCARD = MAX_EXHAUST = 30. Embedding widths:
CARD_EMB_DIM = 16, ENEMY_EMB_DIM = 16,
RELIC_EMB_DIM = 8, PILE_EMB_DIM = 8,
POWER_EMB_DIM = 8. The three entry points to read alongside
this article are EncodedState (line 83),
encode_state (line 189), and
StsPolicy._trunk_forward (line 445).
Each section below takes one group, puts the raw game state on the left and the encoded tensor slice on the right, and walks the formulas in between. The retrospective — what one-hot trap I fell into, where the normalization caps were wildly too low — lives over in the lessons-learned post. This one's the tour.
## Player scalars (18)
```python
state["player"] = {
    "hp": 55, "max_hp": 80,
    "block": 6,
    "energy": 2, "max_energy": 3,
    "strength": 1, "dexterity": 0,
    "vulnerable": 0, "weak": 0,
    "frail": 0, "intangible": 0,
    "thorns": 0,
    "attacks_played": 2,
    "skills_played": 1,
    "powers_played": 0
}
```

- `[0]` → `hp / max_hp`
- `[1]` → `block / max_hp`
- `[2]` → `energy / max_energy`
- `[3–5]` → pile fractions `/ 30`
- `[6]` → `turn / 20`
- `[7]` → `len(hand) / 10`
- `[8]` → `_norm(strength, -300, 300)`
- `[9]` → `_norm(dex, -50, 50)`
- `[10]` → `_norm(vuln, 0, 300)`
- `[11]` → `_norm(weak, 0, 50)`
- `[12]` → `_norm(frail, 0, 100)`
- `[13]` → `_norm(intang, 0, 20)`
- `[14]` → `_norm(thorns, 0, 50)`
- `[15–17]` → `_norm(counters)`
The 18 numbers aren't a flat block — they split into three semantic
sub-groups. The first eight are game-state fractions: hp, block, and
energy as fractions of their max, pile sizes divided by 30, the turn
number clipped at 20 and divided by 20, and the hand size divided by
MAX_HAND. The next seven are player buffs and debuffs run
through _norm. The last three are action counters — how
many attacks, skills, and powers I've played this turn, each
normalized by a realistic per-turn cap.
The _norm function is three lines:
min(max(val, lo), hi) / hi. It clamps the value into
[lo, hi] and divides by hi. For
_norm(strength, -300, 300), that returns
[-1, 1]; for _norm(vulnerable, 0, 300), that
returns [0, 1]. Strength and dexterity are the only
two-sided signed scalars in the block. This convention isn't documented
anywhere except the _norm docstring ("or
[-1, 1] if lo < 0"). The sign of each
scalar is reason-from-the-lo-argument knowledge, not a
flag in the tensor metadata.
The scalar block is the only part of the input vector that doesn't pass through an embedding layer. These are plain normalized floats, concatenated verbatim into the 583-dim trunk input.
## Hand (270)
```json
{
    "id": "Strike_R",
    "cost": 1,
    "upgraded": false,
    "is_attack": true,
    "is_skill": false,
    "is_power": false,
    "x_cost": false,
    "exhaust": false,
    "ethereal": false,
    "innate": false,
    "retain": false,
    "sly": false
}
```

- `card_ids[0, i]` `[1, 10]` → `card_vocab.index("Strike_R")`
- `card_costs[0, i]` `[1, 10]` → `cost / 3`
- `card_upgraded[0, i]` `[1, 10]` → flag
- `card_type_{attack,skill,power}` `[1, 10]` × 3
- `card_x_cost`, `card_exhaust`, `card_ethereal`, `card_innate`, `card_retain`, `card_sly` `[1, 10]` × 6
- `card_mask[0, i] = 1` `[1, 10]` → zero for padding
Ten hand slots, each carrying a 16-dim learned embedding plus 11
hand-crafted scalar extras. 10 × (16 + 11) = 270. This is
the biggest contributor to the input.
The 11 extras aren't a keyword bitmap. Cost, upgraded, three type
flags (attack/skill/power), an x_cost flag, and five
keyword flags (exhaust, ethereal, innate, retain, sly) each get their
own tensor. The network reads each flag as its own gradient channel.
The alternative — packing the 11 flags into a single integer and
looking up an embedding — would be cheaper but would lose the
per-flag gradient signal.
In the forward pass (lines 449–469 of _trunk_forward),
the embedding lookup happens first:
card_emb = self.card_emb(enc.card_ids) produces
[B, 10, 16]. The mask zeros padding slots before the
reshape: card_emb = card_emb * enc.card_mask.unsqueeze(-1).
Then torch.stack(..., dim=-1) packs the 11 extras into
[B, 10, 11], the mask multiply zeros padding again,
torch.cat joins the extras to the embedding at
[B, 10, 27], and a final .reshape(B, -1)
flattens to [B, 270].
The mask multiply is load-bearing for a subtle reason. Padding slots
have card_ids[i] = 0, which is the same index as
<unk> in the vocabulary — the "unseen card" token.
Without the mask zeroing, the network would read "empty slot 7" and
"I just drew an unseen card" as the same input. With the mask,
padding contributes exactly zero to the flat vector.
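The whole sequence — lookup, mask, stack, cat, reshape — fits in a few lines. A toy sketch (the vocab size and the all-padding hand are illustrative; tensor names follow the article):

```python
import torch

B, MAX_HAND, CARD_EMB_DIM = 2, 10, 16

card_emb_table = torch.nn.Embedding(100, CARD_EMB_DIM)   # toy vocab; index 0 = <unk>
card_ids = torch.zeros(B, MAX_HAND, dtype=torch.long)    # all slots padded with id 0
card_mask = torch.zeros(B, MAX_HAND)                     # 0 = padding slot

# The 11 per-card extras, each [B, 10]; all zeros in this toy example.
extras = [torch.zeros(B, MAX_HAND) for _ in range(11)]

card_emb = card_emb_table(card_ids)                      # [B, 10, 16]
card_emb = card_emb * card_mask.unsqueeze(-1)            # zero the padding slots
extra = torch.stack(extras, dim=-1)                      # [B, 10, 11]
extra = extra * card_mask.unsqueeze(-1)                  # zero padding again
card_flat = torch.cat([card_emb, extra], dim=-1).reshape(B, -1)  # [B, 270]

assert card_flat.shape == (2, 270)
# Without the mask multiply, the <unk> row at index 0 would leak in here.
assert card_flat.abs().sum().item() == 0.0
```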
## Enemies (252)
```json
{
    "id": "AcidSlime_L",
    "hp": 48, "max_hp": 65, "block": 6,
    "intent_damage": 11, "intent_hits": 1,
    "intends_attack": true,
    "intends_buff": false,
    "intends_debuff": false,
    "intends_block": false,
    "vulnerable": 0, "weak": 0, "poison": 3,
    "strength": 0, "artifact": 0,
    "intangible": 0, "thorns": 0, "ritual": 0,
    "powers": [
        {"id": "StrengthUp", "amount": 2}
    ]
}
```

- `enemy_ids` → `enemy_vocab.index(...)`
- `enemy_hp_frac` → `48/65 = 0.738`
- `enemy_max_hp_norm` → `min(65, 1000)/1000 = 0.065`
- `enemy_block_frac` → `6/65 = 0.092`
- `enemy_intent_dmg` → `min(11, 50)/50 = 0.22`
- `enemy_intent_hits` → `min(1, 10)/10 = 0.1`
- `enemy_intends_{attack,buff,debuff,block}` → 4 flags
- `enemy_vulnerable/weak/poison` → normalized
- `enemy_strength/artifact/intangible/thorns/ritual` → normalized
- `enemy_power_ids[i, j] = power_vocab.index("StrengthUp")` `[1, 6, 15]`
- `enemy_power_amounts[i, j] = 2/50` `[1, 6, 15]`
- `enemy_power_mask[i, j] = 1` for real, 0 for pad
6 × (16 + 17 + 9) = 252. Six enemy slots, each with a
16-dim embedding, 17 scalar extras, and a 9-dim pooled power block.
Second-biggest contributor after hand.
The 17 scalars aren't a flat list — they split into three sub-groups.
State (hp_frac, max_hp_norm, block_frac) — the enemy's current
defensive posture. Intent (intent_dmg, intent_hits, and four intent
flags: attacks / buffs / debuffs / blocks) — what the enemy is about
to do. Buffs/debuffs (vulnerable, weak, poison, strength, artifact,
intangible, thorns, ritual) — what conditions it has stacked. The
four intent flags live inside those 17, not on top. All 17 get
stacked via torch.stack(..., dim=-1) into
[B, 6, 17].
The per-enemy power pool is where the architecture earns its keep.
MAX_POWERS = 15 per enemy. Encoded naively as 15 × 9 =
135 scalars per enemy, times six enemies, that's 810 dims just for
enemy powers. Instead, the forward pass reshapes
enemy_power_ids to [B*6, 15] for a batched
self.power_emb lookup, appends the normalized amount as
a trailing dim, mean-pools across the 15 slots with
mask.sum(-1).clamp(min=1) as the divisor, and reshapes
back to [B, 6, 9]. About a fifteenth of the naive cost
(9 / 135), same "what's active on this enemy" signal.
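The reshape-pool-reshape dance can be sketched end to end. A toy version (power vocab size and the single StrengthUp entry are illustrative; the masked-mean pattern follows the description above):

```python
import torch

B, N_ENEMIES, MAX_POWERS, POWER_EMB_DIM = 2, 6, 15, 8

power_emb = torch.nn.Embedding(50, POWER_EMB_DIM)   # toy power vocab

ids = torch.zeros(B, N_ENEMIES, MAX_POWERS, dtype=torch.long)
amounts = torch.zeros(B, N_ENEMIES, MAX_POWERS)
mask = torch.zeros(B, N_ENEMIES, MAX_POWERS)
# One real power on batch 0, enemy 0: StrengthUp-like entry, amount 2/50.
ids[0, 0, 0] = 7
amounts[0, 0, 0] = 2 / 50
mask[0, 0, 0] = 1.0

# Flatten the enemy dim for one batched embedding lookup: [B*6, 15, 8].
emb = power_emb(ids.reshape(B * N_ENEMIES, MAX_POWERS))
# Append the normalized amount as a trailing dim: [B*6, 15, 9].
emb = torch.cat([emb, amounts.reshape(B * N_ENEMIES, MAX_POWERS, 1)], dim=-1)
m = mask.reshape(B * N_ENEMIES, MAX_POWERS)
emb = emb * m.unsqueeze(-1)                                       # zero padded slots
pooled = emb.sum(dim=1) / m.sum(-1, keepdim=True).clamp(min=1)    # mean over real powers
pooled = pooled.reshape(B, N_ENEMIES, POWER_EMB_DIM + 1)          # back to [B, 6, 9]

assert pooled.shape == (2, 6, 9)
# Enemies with no active powers pool to exactly zero, not NaN.
assert pooled[1].abs().sum().item() == 0.0
```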
enemy_intent_dmg clips at 50 —
min(v, 50) / 50. That cap is the one in this group most
likely to sit on the edge of plausible late-game values: elite attacks
in the 30–40 range are common, and a 50+ attack isn't unheard of. The
other caps (max_hp / 1000,
vulnerable / 300, poison / 200,
strength / 300) are loose enough that late-game values
stay inside the range.
## Piles (24)
```python
state["draw_pile"] = ["Strike_R", "Strike_R", "Defend_R", ...]
state["discard_pile"] = ["Bash", ...]
state["exhaust_pile"] = []
```

- `draw_ids[0, :k] = vocab.index(card)`, pad 0
- `draw_mask[0, :k] = 1`, else 0
- `(self.pile_emb(draw_ids) * draw_mask.unsqueeze(-1)).sum(dim=1) / mask.sum(-1).clamp(min=1)` → `[B, 8]` per pile
Draw, discard, and exhaust. Each pile is an unordered bag of up to 30
card IDs, pooled through self.pile_emb to 8 dims.
8 × 3 = 24.
self.pile_emb is a separate
nn.Embedding(card_vocab_size, PILE_EMB_DIM) from
self.card_emb. Same vocabulary index, independent weight
tables. A Strike in the hand and a Strike in the draw pile look up
different rows — one for "I can play this right now," one for "this
is in the bag I'll draw from." Sharing the table would force a single
embedding to carry both meanings; splitting it lets each table
specialize.
The inner _pool_pile function is four lines: multiply
the embedding by the mask, sum across the pile dim, divide by the
clamped mask sum. The mask.sum(-1).clamp(min=1) is the
empty-pile guard. Without clamp, an empty exhaust pile —
common on turn one — divides zero by zero and NaNs the forward pass.
With clamp(min=1), the denominator is 1 but the
numerator is still zero (every embedding was masked out), so the
pool vector comes out as all zeros. That's the right default for
"nothing here."
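A sketch of those four lines, with the empty-pile case exercised directly (`pool_pile` here is a free function standing in for the inner `_pool_pile`; the toy vocab size is illustrative):

```python
import torch

PILE_EMB_DIM, MAX_PILE = 8, 30
pile_emb = torch.nn.Embedding(100, PILE_EMB_DIM)   # toy card vocab

def pool_pile(ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Masked mean-pool of an unordered card-ID bag: [B, 30] -> [B, 8]."""
    emb = pile_emb(ids) * mask.unsqueeze(-1)         # zero out padding slots
    # clamp(min=1) is the empty-pile guard: denominator 1, numerator 0.
    return emb.sum(dim=1) / mask.sum(-1, keepdim=True).clamp(min=1)

# Turn-one exhaust pile: every slot is padding, mask is all zeros.
ids = torch.zeros(1, MAX_PILE, dtype=torch.long)
mask = torch.zeros(1, MAX_PILE)
out = pool_pile(ids, mask)
assert out.shape == (1, 8)
assert out.abs().sum().item() == 0.0   # all zeros, no NaN
```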
## Relics (10)
```python
state["relics"] = [
    {"id": "BurningBlood", "counter": 0, "active": False},
    {"id": "Lantern", "counter": 1, "active": True},
    ...
]
```

- `relic_ids` `[1, 20]` long
- `relic_counters` `[1, 20]` `= min(counter, 10) / 10`
- `relic_active` `[1, 20]` → flag
- `relic_mask` `[1, 20]`
- Mean-pooled → `[1, 10]` = 8-dim emb + counter + active
Up to 20 relic slots mean-pooled into a single 10-dim summary:
RELIC_EMB_DIM(8) + 1 counter + 1 active = 10. The
embedding and the two scalars are concatenated per-slot, mask-zeroed,
summed, and divided by the clamped mask count — the same
clamp(min=1) guard as the piles.
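The per-slot concat-then-pool can be sketched in the same style as the piles (toy relic vocab, a single Lantern-like slot; names follow the mapping above):

```python
import torch

MAX_RELICS, RELIC_EMB_DIM = 20, 8
relic_emb = torch.nn.Embedding(40, RELIC_EMB_DIM)   # toy relic vocab

ids = torch.zeros(1, MAX_RELICS, dtype=torch.long)
counters = torch.zeros(1, MAX_RELICS)
active = torch.zeros(1, MAX_RELICS)
mask = torch.zeros(1, MAX_RELICS)
# One real relic in slot 0: counter 1 -> min(1, 10) / 10, active flag set.
ids[0, 0] = 3
counters[0, 0] = min(1, 10) / 10
active[0, 0] = 1.0
mask[0, 0] = 1.0

# Per-slot: 8-dim embedding + counter + active = 10 dims.
per_slot = torch.cat(
    [relic_emb(ids), counters.unsqueeze(-1), active.unsqueeze(-1)], dim=-1
)                                                    # [1, 20, 10]
per_slot = per_slot * mask.unsqueeze(-1)             # zero empty slots
relic_pool = per_slot.sum(dim=1) / mask.sum(-1, keepdim=True).clamp(min=1)

assert relic_pool.shape == (1, 10)
# With one real slot, the pool is that slot verbatim: active flag survives.
assert relic_pool[0, 9].item() == 1.0
```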
The pool-instead-of-positional choice is intentional. Relics are mostly passive modifiers — Bronze Scales reflects damage, Lantern gives free energy turn 1, Ink Bottle triggers every ten cards played. The policy network doesn't need to know "relic in slot 3 vs. slot 4"; it needs "is this relic-y effect active?" Mean-pooling collapses positional information the network would have to learn to ignore. The pile pools (30 slots → 8 dims each, 1:30) and this relic pool (20 slots → 10 dims, 1:20) are the two most aggressive shrinks in the vector.
relic_counters = min(counter, 10) / 10 caps at 10. That
works for the common counter-relics — Lantern, Snecko Eye, Ink
Bottle charge counters. Rare late-game relics that accrue >10
stacks would clip to 1.0, losing discrimination. In practice this
hasn't shown up as a training signal; flagged here for anyone
building a similar encoder on a different relic pool.
## Player powers (9)
```python
player["powers"] = [
    {"id": "Ritual", "amount": 3},
    {"id": "StrengthUp", "amount": 2},
    ...
]
```

- `player_power_ids` `[1, 15]` long
- `player_power_amounts` `[1, 15]` → clamped `[-50, 50] / 50`
- `player_power_mask` `[1, 15]`
- Mean-pooled → `[1, 9]` = 8-dim emb + 1 amount
Fifteen generic power slots mean-pooled to
POWER_EMB_DIM(8) + 1 = 9. Smallest group in the concat.
self.power_emb is shared between the player-power pool
and the per-enemy power pools from the Enemies section. One
nn.Embedding(power_vocab_size, 8) table, two call sites.
A Ritual power on the player and a Ritual power on an enemy look up
the same embedding row — shared semantics for shared mechanics,
without doubling the embedding weights.
The amount scalar is clipped symmetrically:
min(max(amount, -50), 50) / 50, in
_encode_powers (line 167). Strength-like powers can be
negative (strength debuff reduces attack damage); the symmetric clip
preserves that sign. The /50 divisor is chosen the same
way other debuff caps are chosen — realistic maxes from gameplay,
not defensive caps from game source.
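As a one-liner sketch (`encode_power_amount` is a hypothetical name for the clip inside `_encode_powers`, not the actual function name):

```python
def encode_power_amount(amount: int) -> float:
    # Symmetric clip: strength debuffs push the amount negative,
    # and the sign survives the normalization.
    return min(max(amount, -50), 50) / 50

assert encode_power_amount(25) == 0.5
assert encode_power_amount(-100) == -1.0   # clipped low, sign preserved
assert encode_power_amount(999) == 1.0     # clipped high
```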
## Where it all lands
Hand and enemies: 522 of 583 dims. The policy spends most of its input budget on what's in front of it right now.
```python
x = torch.cat([enc.scalars, card_flat, enemy_flat,
               draw_pool, disc_pool, exh_pool,
               relic_pool, pp_pool], dim=-1)
# x.shape == [B, 583]
```
That [B, 583] feeds
Linear(583, 256) → LayerNorm → ReLU → Linear(256, 256) → LayerNorm → ReLU,
then the heads: policy_head: Linear(256, 61) for action
logits, value_head: Linear(256, 1) for V(s),
discard_head: Linear(256, 10) for mid-turn discard
priority, and choice_proj: Linear(256, 16) — the
card-choice head, trained outside the PPO loss and covered in
the Card-Choice Head
deep dive. Illegal actions get masked to -1e9 before the
softmax. 61 actions = 1 end_turn + 10 × 6
play(card, target).
The trunk is a plain MLP. No attention, no transformer. The 583 → 256 compression in the first linear layer is the only bottleneck between raw state and hidden, which is why the input budget matters: every dimension spent on padding or redundant encoding is headroom the policy and value heads don't get.
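The trunk and head shapes can be reproduced with a throwaway stack — a sketch mirroring the layer names above, not the actual StsPolicy module:

```python
import torch
import torch.nn as nn

IN_DIM, HID, N_ACTIONS = 583, 256, 61

trunk = nn.Sequential(
    nn.Linear(IN_DIM, HID), nn.LayerNorm(HID), nn.ReLU(),
    nn.Linear(HID, HID), nn.LayerNorm(HID), nn.ReLU(),
)
policy_head = nn.Linear(HID, N_ACTIONS)   # action logits
value_head = nn.Linear(HID, 1)            # V(s)
discard_head = nn.Linear(HID, 10)         # mid-turn discard priority
choice_proj = nn.Linear(HID, 16)          # card-choice head

x = torch.randn(4, IN_DIM)
h = trunk(x)
logits = policy_head(h)
assert h.shape == (4, 256) and logits.shape == (4, 61)

# Illegal-action masking before softmax: only end_turn legal here.
action_mask = torch.zeros(4, N_ACTIONS, dtype=torch.bool)
action_mask[:, 0] = True
masked = logits.masked_fill(~action_mask, -1e9)
assert masked[0, 1].item() == -1e9
```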
action_index_to_dict (line 353) inverts the flat action
index back into
{"type": "play", "card_idx": c, "target_idx": t}
so the bridge can dispatch it to the C# game. The input dim budget
and the action budget are engineered together:
MAX_HAND × MAX_TARGETS + 1 = 61, and
self.policy_head = Linear(256, 61). Change one constant
and both the concat shape and the head width move with it.
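The inversion itself is arithmetic. The exact slot order inside the 61-way head isn't spelled out in this article, so the card-major layout below (`index = 1 + card_idx * MAX_TARGETS + target_idx`) is an assumption for illustration, not the confirmed layout of `action_index_to_dict`:

```python
MAX_HAND, MAX_TARGETS = 10, 6
N_ACTIONS = MAX_HAND * MAX_TARGETS + 1   # 61

def action_index_to_dict(idx: int) -> dict:
    """Invert a flat action index; slot order here is an assumed layout."""
    if idx == 0:
        return {"type": "end_turn"}
    card_idx, target_idx = divmod(idx - 1, MAX_TARGETS)
    return {"type": "play", "card_idx": card_idx, "target_idx": target_idx}

assert action_index_to_dict(0) == {"type": "end_turn"}
assert action_index_to_dict(60) == {"type": "play", "card_idx": 9, "target_idx": 5}
```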