# Input Vector — What the Network Actually Sees
A dimensional tour of the 583-dim tensor fed to the policy network,
grounded in src/bot/model.py. What each float is, where it
comes from in game state, and why it's shaped that way.
## Shape ledger
| Field group | Shape | Dtype | → Trunk |
|---|---|---|---|
| Scalars | [1, 18] | float32 | 18 |
| Hand cards | [1, 10] × 12 + [1, 10, 16] | long + float32 | 270 |
| Enemies | [1, 6] × 17 + [1, 6, 16] + [1, 6, 15, 9] | long + float32 | 252 |
| Piles (draw/discard/exhaust) | [1, 30] × 3 + masks | long + float32 | 24 |
| Relics | [1, 20] × 4 | long + float32 | 10 |
| Player powers | [1, 15] × 3 | long + float32 | 9 |
| Action mask | [1, 61] | bool | — |
Seven tensor groups land in EncodedState. Six of them concat
into the 583-dim block that feeds the MLP trunk. The seventh,
action_mask, is routed separately — it's applied after the
policy head to zero out illegal logits via
logits.masked_fill(~enc.action_mask, -1e9). The trunk never
sees it.
The constants that shape these tensors live at the top of
src/bot/model.py: MAX_HAND = 10,
MAX_TARGETS = 6, MAX_POWERS = 15,
MAX_RELICS = 20, and
MAX_DRAW = MAX_DISCARD = MAX_EXHAUST = 30. Embedding widths:
CARD_EMB_DIM = 16, ENEMY_EMB_DIM = 16,
RELIC_EMB_DIM = 8, PILE_EMB_DIM = 8,
POWER_EMB_DIM = 8. The three entry points to read alongside
this article are EncodedState (line 83),
encode_state (line 189), and
StsPolicy._trunk_forward (line 445).
Each section below takes one group, puts the raw game state on the left and the encoded tensor slice on the right, and walks the formulas in between. The retrospective — what one-hot trap I fell into, where the normalization caps were wildly too low — lives over in the lessons-learned post. This one's the tour.
## Player scalars (18)
```python
state["player"] = {
    "hp": 55, "max_hp": 80,
    "block": 6,
    "energy": 2, "max_energy": 3,
    "strength": 1, "dexterity": 0,
    "vulnerable": 0, "weak": 0,
    "frail": 0, "intangible": 0,
    "thorns": 0,
    "attacks_played": 2,
    "skills_played": 1,
    "powers_played": 0
}
```

- `[0]` → `hp / max_hp`
- `[1]` → `block / max_hp`
- `[2]` → `energy / max_energy`
- `[3–5]` → pile fractions `/ 30`
- `[6]` → `turn / 20`
- `[7]` → `len(hand) / 10`
- `[8]` → `_norm(strength, -300, 300)`
- `[9]` → `_norm(dex, -50, 50)`
- `[10]` → `_norm(vuln, 0, 300)`
- `[11]` → `_norm(weak, 0, 50)`
- `[12]` → `_norm(frail, 0, 100)`
- `[13]` → `_norm(intang, 0, 20)`
- `[14]` → `_norm(thorns, 0, 50)`
- `[15–17]` → `_norm(counters)`
The 18 numbers aren't a flat block — they split into three semantic
sub-groups. The first eight are game-state fractions: hp, block, and
energy as fractions of their max, pile sizes divided by 30, the turn
number clipped at 20 and divided by 20, and the hand size divided by
MAX_HAND. The next seven are player buffs and debuffs run
through _norm. The last three are action counters — how
many attacks, skills, and powers I've played this turn, each
normalized by a realistic per-turn cap.
The _norm function is three lines:
min(max(val, lo), hi) / hi. It clamps the value into
[lo, hi] and divides by hi. For
_norm(strength, -300, 300), that returns
[-1, 1]; for _norm(vulnerable, 0, 300), that
returns [0, 1]. Strength and dexterity are the only
two-sided signed scalars in the block. This convention isn't documented
anywhere except the _norm docstring ("or
[-1, 1] if lo < 0"). The sign of each
scalar is reason-from-the-lo-argument knowledge, not a
flag in the tensor metadata.
The scalar block is the only part of the input vector that doesn't pass through an embedding layer. These are plain normalized floats, concatenated verbatim into the 583-dim trunk input.
## Hand (270)
```json
{
    "id": "Strike_R",
    "cost": 1,
    "upgraded": false,
    "is_attack": true,
    "is_skill": false,
    "is_power": false,
    "x_cost": false,
    "exhaust": false,
    "ethereal": false,
    "innate": false,
    "retain": false,
    "sly": false
}
```

- `card_ids[0, i]` `[1, 10]` → `card_vocab.index("Strike_R")`
- `card_costs[0, i]` `[1, 10]` → `cost / 3`
- `card_upgraded[0, i]` `[1, 10]` → flag
- `card_type_{attack,skill,power}` `[1, 10]` × 3
- `card_x_cost`, `card_exhaust`, `card_ethereal`, `card_innate`, `card_retain`, `card_sly` `[1, 10]` × 6
- `card_mask[0, i] = 1` `[1, 10]` → zero for padding
Ten hand slots, each carrying a 16-dim learned embedding plus 11
hand-crafted scalar extras. 10 × (16 + 11) = 270. This is
the biggest contributor to the input.
The 11 extras aren't a keyword bitmap. Cost, upgraded, three type
flags (attack/skill/power), an x_cost flag, and five
keyword flags (exhaust, ethereal, innate, retain, sly) each get their
own tensor. The network reads each flag as its own gradient channel.
The alternative — packing the 11 flags into a single integer and
looking up an embedding — would be cheaper but would lose the
per-flag gradient signal.
In the forward pass (lines 449–469 of _trunk_forward),
the embedding lookup happens first:
card_emb = self.card_emb(enc.card_ids) produces
[B, 10, 16]. The mask zeros padding slots before the
reshape: card_emb = card_emb * enc.card_mask.unsqueeze(-1).
Then torch.stack(..., dim=-1) packs the 11 extras into
[B, 10, 11], the mask multiply zeros padding again,
torch.cat joins the extras to the embedding at
[B, 10, 27], and a final .reshape(B, -1)
flattens to [B, 270].
The mask multiply is load-bearing for a subtle reason. Padding slots
have card_ids[i] = 0, which is the same index as
<unk> in the vocabulary — the "unseen card" token.
Without the mask zeroing, the network would read "empty slot 7" and
"I just drew an unseen card" as the same input. With the mask,
padding contributes exactly zero to the flat vector.
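The whole sequence — lookup, mask, stack, cat, reshape — fits in a few lines. A toy sketch (the vocab size and the all-padding hand are illustrative; tensor names follow the article):

```python
import torch

B, MAX_HAND, CARD_EMB_DIM = 2, 10, 16

card_emb_table = torch.nn.Embedding(100, CARD_EMB_DIM)   # toy vocab; index 0 = <unk>
card_ids = torch.zeros(B, MAX_HAND, dtype=torch.long)    # all slots padded with id 0
card_mask = torch.zeros(B, MAX_HAND)                     # 0 = padding slot

# The 11 per-card extras, each [B, 10]; all zeros in this toy example.
extras = [torch.zeros(B, MAX_HAND) for _ in range(11)]

card_emb = card_emb_table(card_ids)                      # [B, 10, 16]
card_emb = card_emb * card_mask.unsqueeze(-1)            # zero the padding slots
extra = torch.stack(extras, dim=-1)                      # [B, 10, 11]
extra = extra * card_mask.unsqueeze(-1)                  # zero padding again
card_flat = torch.cat([card_emb, extra], dim=-1).reshape(B, -1)  # [B, 270]

assert card_flat.shape == (2, 270)
# Without the mask multiply, the <unk> row at index 0 would leak in here.
assert card_flat.abs().sum().item() == 0.0
```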
## Enemies (252)
```json
{
    "id": "AcidSlime_L",
    "hp": 48, "max_hp": 65, "block": 6,
    "intent_damage": 11, "intent_hits": 1,
    "intends_attack": true,
    "intends_buff": false,
    "intends_debuff": false,
    "intends_block": false,
    "vulnerable": 0, "weak": 0, "poison": 3,
    "strength": 0, "artifact": 0,
    "intangible": 0, "thorns": 0, "ritual": 0,
    "powers": [
        {"id": "StrengthUp", "amount": 2}
    ]
}
```

- `enemy_ids` → `enemy_vocab.index(...)`
- `enemy_hp_frac` → `48/65 = 0.738`
- `enemy_max_hp_norm` → `min(65, 1000)/1000 = 0.065`
- `enemy_block_frac` → `6/65 = 0.092`
- `enemy_intent_dmg` → `min(11, 50)/50 = 0.22`
- `enemy_intent_hits` → `min(1, 10)/10 = 0.1`
- `enemy_intends_{attack,buff,debuff,block}` → 4 flags
- `enemy_vulnerable/weak/poison` → normalized
- `enemy_strength/artifact/intangible/thorns/ritual` → normalized
- `enemy_power_ids[i, j] = power_vocab.index("StrengthUp")` `[1, 6, 15]`
- `enemy_power_amounts[i, j] = 2/50` `[1, 6, 15]`
- `enemy_power_mask[i, j] = 1` for real, 0 for pad
6 × (16 + 17 + 9) = 252. Six enemy slots, each with a
16-dim embedding, 17 scalar extras, and a 9-dim pooled power block.
Second-biggest contributor after hand.
The 17 scalars aren't a flat list — they split into three sub-groups.
State (hp_frac, max_hp_norm, block_frac) — the enemy's current
defensive posture. Intent (intent_dmg, intent_hits, and four intent
flags: attacks / buffs / debuffs / blocks) — what the enemy is about
to do. Buffs/debuffs (vulnerable, weak, poison, strength, artifact,
intangible, thorns, ritual) — what conditions it has stacked. The
four intent flags live inside those 17, not on top. All 17 get
stacked via torch.stack(..., dim=-1) into
[B, 6, 17].
The per-enemy power pool is where the architecture earns its keep.
MAX_POWERS = 15 per enemy. Encoded naively as 15 × 9 =
135 scalars per enemy, times six enemies, that's 810 dims just for
enemy powers. Instead, the forward pass reshapes
enemy_power_ids to [B*6, 15] for a batched
self.power_emb lookup, appends the normalized amount as
a trailing dim, mean-pools across the 15 slots with
mask.sum(-1).clamp(min=1) as the divisor, and reshapes
back to [B, 6, 9]. About a fifteenth of the naive cost
(9 / 135), same "what's active on this enemy" signal.
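The reshape-pool-reshape dance can be sketched end to end. A toy version (power vocab size and the single StrengthUp entry are illustrative; the masked-mean pattern follows the description above):

```python
import torch

B, N_ENEMIES, MAX_POWERS, POWER_EMB_DIM = 2, 6, 15, 8

power_emb = torch.nn.Embedding(50, POWER_EMB_DIM)   # toy power vocab

ids = torch.zeros(B, N_ENEMIES, MAX_POWERS, dtype=torch.long)
amounts = torch.zeros(B, N_ENEMIES, MAX_POWERS)
mask = torch.zeros(B, N_ENEMIES, MAX_POWERS)
# One real power on batch 0, enemy 0: StrengthUp-like entry, amount 2/50.
ids[0, 0, 0] = 7
amounts[0, 0, 0] = 2 / 50
mask[0, 0, 0] = 1.0

# Flatten the enemy dim for one batched embedding lookup: [B*6, 15, 8].
emb = power_emb(ids.reshape(B * N_ENEMIES, MAX_POWERS))
# Append the normalized amount as a trailing dim: [B*6, 15, 9].
emb = torch.cat([emb, amounts.reshape(B * N_ENEMIES, MAX_POWERS, 1)], dim=-1)
m = mask.reshape(B * N_ENEMIES, MAX_POWERS)
emb = emb * m.unsqueeze(-1)                                       # zero padded slots
pooled = emb.sum(dim=1) / m.sum(-1, keepdim=True).clamp(min=1)    # mean over real powers
pooled = pooled.reshape(B, N_ENEMIES, POWER_EMB_DIM + 1)          # back to [B, 6, 9]

assert pooled.shape == (2, 6, 9)
# Enemies with no active powers pool to exactly zero, not NaN.
assert pooled[1].abs().sum().item() == 0.0
```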
enemy_intent_dmg clips at 50 —
min(v, 50) / 50. That cap is the one in this group most
likely to sit on the edge of plausible late-game values: elite attacks
in the 30–40 range are common, and a 50+ attack isn't unheard of. The
other caps (max_hp / 1000,
vulnerable / 300, poison / 200,
strength / 300) are loose enough that late-game values
stay inside the range.
## Piles (24)
```python
state["draw_pile"] = ["Strike_R", "Strike_R", "Defend_R", ...]
state["discard_pile"] = ["Bash", ...]
state["exhaust_pile"] = []
```

- `draw_ids[0, :k] = vocab.index(card)`, pad 0
- `draw_mask[0, :k] = 1`, else 0
- `(self.pile_emb(draw_ids) * draw_mask.unsqueeze(-1)).sum(dim=1) / mask.sum(-1).clamp(min=1)` → `[B, 8]` per pile
Draw, discard, and exhaust. Each pile is an unordered bag of up to 30
card IDs, pooled through self.pile_emb to 8 dims.
8 × 3 = 24.
self.pile_emb is a separate
nn.Embedding(card_vocab_size, PILE_EMB_DIM) from
self.card_emb. Same vocabulary index, independent weight
tables. A Strike in the hand and a Strike in the draw pile look up
different rows — one for "I can play this right now," one for "this
is in the bag I'll draw from." Sharing the table would force a single
embedding to carry both meanings; splitting it lets each table
specialize.
The inner _pool_pile function is four lines: multiply
the embedding by the mask, sum across the pile dim, divide by the
clamped mask sum. The mask.sum(-1).clamp(min=1) is the
empty-pile guard. Without clamp, an empty exhaust pile —
common on turn one — divides zero by zero and NaNs the forward pass.
With clamp(min=1), the denominator is 1 but the
numerator is still zero (every embedding was masked out), so the
pool vector comes out as all zeros. That's the right default for
"nothing here."
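A sketch of those four lines, with the empty-pile case exercised directly (`pool_pile` here is a free function standing in for the inner `_pool_pile`; the toy vocab size is illustrative):

```python
import torch

PILE_EMB_DIM, MAX_PILE = 8, 30
pile_emb = torch.nn.Embedding(100, PILE_EMB_DIM)   # toy card vocab

def pool_pile(ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Masked mean-pool of an unordered card-ID bag: [B, 30] -> [B, 8]."""
    emb = pile_emb(ids) * mask.unsqueeze(-1)         # zero out padding slots
    # clamp(min=1) is the empty-pile guard: denominator 1, numerator 0.
    return emb.sum(dim=1) / mask.sum(-1, keepdim=True).clamp(min=1)

# Turn-one exhaust pile: every slot is padding, mask is all zeros.
ids = torch.zeros(1, MAX_PILE, dtype=torch.long)
mask = torch.zeros(1, MAX_PILE)
out = pool_pile(ids, mask)
assert out.shape == (1, 8)
assert out.abs().sum().item() == 0.0   # all zeros, no NaN
```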
## Relics (10)
```python
state["relics"] = [
    {"id": "BurningBlood", "counter": 0, "active": False},
    {"id": "Lantern", "counter": 1, "active": True},
    ...
]
```

- `relic_ids` `[1, 20]` long
- `relic_counters` `[1, 20]` `= min(counter, 10) / 10`
- `relic_active` `[1, 20]` → flag
- `relic_mask` `[1, 20]`
- Mean-pooled → `[1, 10]` = 8-dim emb + counter + active
Up to 20 relic slots mean-pooled into a single 10-dim summary:
RELIC_EMB_DIM(8) + 1 counter + 1 active = 10. The
embedding and the two scalars are concatenated per-slot, mask-zeroed,
summed, and divided by the clamped mask count — the same
clamp(min=1) guard as the piles.
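The per-slot concat-then-pool can be sketched in the same style as the piles (toy relic vocab, a single Lantern-like slot; names follow the mapping above):

```python
import torch

MAX_RELICS, RELIC_EMB_DIM = 20, 8
relic_emb = torch.nn.Embedding(40, RELIC_EMB_DIM)   # toy relic vocab

ids = torch.zeros(1, MAX_RELICS, dtype=torch.long)
counters = torch.zeros(1, MAX_RELICS)
active = torch.zeros(1, MAX_RELICS)
mask = torch.zeros(1, MAX_RELICS)
# One real relic in slot 0: counter 1 -> min(1, 10) / 10, active flag set.
ids[0, 0] = 3
counters[0, 0] = min(1, 10) / 10
active[0, 0] = 1.0
mask[0, 0] = 1.0

# Per-slot: 8-dim embedding + counter + active = 10 dims.
per_slot = torch.cat(
    [relic_emb(ids), counters.unsqueeze(-1), active.unsqueeze(-1)], dim=-1
)                                                    # [1, 20, 10]
per_slot = per_slot * mask.unsqueeze(-1)             # zero empty slots
relic_pool = per_slot.sum(dim=1) / mask.sum(-1, keepdim=True).clamp(min=1)

assert relic_pool.shape == (1, 10)
# With one real slot, the pool is that slot verbatim: active flag survives.
assert relic_pool[0, 9].item() == 1.0
```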
The pool-instead-of-positional choice is intentional. Relics are mostly passive modifiers — Bronze Scales reflects damage, Lantern gives free energy turn 1, Ink Bottle triggers every ten cards played. The policy network doesn't need to know "relic in slot 3 vs. slot 4"; it needs "is this relic-y effect active?" Mean-pooling collapses positional information the network would have to learn to ignore. The pile pools (30 slots → 8 dims each, 1:30) and this relic pool (20 slots → 10 dims, 1:20) are the two most aggressive shrinks in the vector.
relic_counters = min(counter, 10) / 10 caps at 10. That
works for the common counter-relics — Lantern, Snecko Eye, Ink
Bottle charge counters. Rare late-game relics that accrue >10
stacks would clip to 1.0, losing discrimination. In practice this
hasn't shown up as a training signal; flagged here for anyone
building a similar encoder on a different relic pool.
## Player powers (9)
```python
player["powers"] = [
    {"id": "Ritual", "amount": 3},
    {"id": "StrengthUp", "amount": 2},
    ...
]
```

- `player_power_ids` `[1, 15]` long
- `player_power_amounts` `[1, 15]` → clamped `[-50, 50] / 50`
- `player_power_mask` `[1, 15]`
- Mean-pooled → `[1, 9]` = 8-dim emb + 1 amount
Fifteen generic power slots mean-pooled to
POWER_EMB_DIM(8) + 1 = 9. Smallest group in the concat.
self.power_emb is shared between the player-power pool
and the per-enemy power pools from the Enemies section. One
nn.Embedding(power_vocab_size, 8) table, two call sites.
A Ritual power on the player and a Ritual power on an enemy look up
the same embedding row — shared semantics for shared mechanics,
without doubling the embedding weights.
The amount scalar is clipped symmetrically:
min(max(amount, -50), 50) / 50, in
_encode_powers (line 167). Strength-like powers can be
negative (strength debuff reduces attack damage); the symmetric clip
preserves that sign. The /50 divisor is chosen the same
way other debuff caps are chosen — realistic maxes from gameplay,
not defensive caps from game source.
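As a one-liner sketch (`encode_power_amount` is a hypothetical name for the clip inside `_encode_powers`, not the actual function name):

```python
def encode_power_amount(amount: int) -> float:
    # Symmetric clip: strength debuffs push the amount negative,
    # and the sign survives the normalization.
    return min(max(amount, -50), 50) / 50

assert encode_power_amount(25) == 0.5
assert encode_power_amount(-100) == -1.0   # clipped low, sign preserved
assert encode_power_amount(999) == 1.0     # clipped high
```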
## Where it all lands
Hand and enemies: 522 of 583 dims. The policy spends most of its input budget on what's in front of it right now.
```python
x = torch.cat([enc.scalars, card_flat, enemy_flat,
               draw_pool, disc_pool, exh_pool,
               relic_pool, pp_pool], dim=-1)
# x.shape == [B, 583]
```
That [B, 583] feeds
Linear(583, 256) → LayerNorm → ReLU → Linear(256, 256) → LayerNorm → ReLU,
then the heads: policy_head: Linear(256, 61) for action
logits, value_head: Linear(256, 1) for V(s),
discard_head: Linear(256, 10) for mid-turn discard
priority, and choice_proj: Linear(256, 16) — the
card-choice head, trained outside the PPO loss and covered in
the Card-Choice Head
deep dive. Illegal actions get masked to -1e9 before the
softmax. 61 actions = 1 end_turn + 10 × 6
play(card, target).
The trunk is a plain MLP. No attention, no transformer. The 583 → 256 compression in the first linear layer is the only bottleneck between raw state and hidden, which is why the input budget matters: every dimension spent on padding or redundant encoding is headroom the policy and value heads don't get.
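The trunk and head shapes can be reproduced with a throwaway stack — a sketch mirroring the layer names above, not the actual StsPolicy module:

```python
import torch
import torch.nn as nn

IN_DIM, HID, N_ACTIONS = 583, 256, 61

trunk = nn.Sequential(
    nn.Linear(IN_DIM, HID), nn.LayerNorm(HID), nn.ReLU(),
    nn.Linear(HID, HID), nn.LayerNorm(HID), nn.ReLU(),
)
policy_head = nn.Linear(HID, N_ACTIONS)   # action logits
value_head = nn.Linear(HID, 1)            # V(s)
discard_head = nn.Linear(HID, 10)         # mid-turn discard priority
choice_proj = nn.Linear(HID, 16)          # card-choice head

x = torch.randn(4, IN_DIM)
h = trunk(x)
logits = policy_head(h)
assert h.shape == (4, 256) and logits.shape == (4, 61)

# Illegal-action masking before softmax: only end_turn legal here.
action_mask = torch.zeros(4, N_ACTIONS, dtype=torch.bool)
action_mask[:, 0] = True
masked = logits.masked_fill(~action_mask, -1e9)
assert masked[0, 1].item() == -1e9
```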
action_index_to_dict (line 353) inverts the flat action
index back into
{"type": "play", "card_idx": c, "target_idx": t}
so the bridge can dispatch it to the C# game. The input dim budget
and the action budget are engineered together:
MAX_HAND × MAX_TARGETS + 1 = 61, and
self.policy_head = Linear(256, 61). Change one constant
and both the concat shape and the head width move with it.
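The inversion itself is arithmetic. The exact slot order inside the 61-way head isn't spelled out in this article, so the card-major layout below (`index = 1 + card_idx * MAX_TARGETS + target_idx`) is an assumption for illustration, not the confirmed layout of `action_index_to_dict`:

```python
MAX_HAND, MAX_TARGETS = 10, 6
N_ACTIONS = MAX_HAND * MAX_TARGETS + 1   # 61

def action_index_to_dict(idx: int) -> dict:
    """Invert a flat action index; slot order here is an assumed layout."""
    if idx == 0:
        return {"type": "end_turn"}
    card_idx, target_idx = divmod(idx - 1, MAX_TARGETS)
    return {"type": "play", "card_idx": card_idx, "target_idx": target_idx}

assert action_index_to_dict(0) == {"type": "end_turn"}
assert action_index_to_dict(60) == {"type": "play", "card_idx": 9, "target_idx": 5}
```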