Pokemon Red and the Evolution of FSM: Past and Present
I love Pokemon

Preface...
In the turbulent 1990s, countless boys around the world left Pallet Town under the constraint of just 8 KB of RAM.
And those boys now create entirely new worlds on top of tens of gigabytes of memory.
Z80 CPU - Work RAM (WRAM) of an 8-bit architecture
The 1 MB cartridge ROM was less than a thousandth of the size of a single high-resolution image by today's standards.
And the game that started there went on to astonish the world.
Survival Strategy: Everything Done by Hand

(Pokémon Red, shared as a disassembly)
In the Game Boy era, developers were not simply "making the code run";
they were trying to squeeze an entire game into a 64 KB address space by any means necessary.
In an age without automation or compiler optimizations, everything was done by hand, and each line of assembly code must have been the result of carefully weighing its purpose and its cost.
Now, before we dive into the optimizations of that era, we first need to understand what a register is.

(Z80 register hardware pin diagram)
What Is a Register?
A register is an ultra-fast data storage unit located inside the CPU,
a small, fast memory element used for arithmetic, control flow, and memory access.
We will describe registers through the lens of the Z80 architecture.
Why Are Registers Important?
Type | Access Speed | Cycle Count | Role |
Register | Very fast (inside CPU) | 1~4 cycles | Immediately usable, optimized for computation |
RAM | Slow (goes through the bus) | 8~16 cycles | Auxiliary role as a slow storage medium |
Even from this table alone, it is clear that registers are significantly faster.
But why compare registers and RAM in the first place?
Let's first look at the difference between RAM and registers.
A register is the workbench laid out right in front of the programmer.
The workbench is very cramped, and in the Z80 you could only place 7 to 8 tools on it, but once something was on the workbench, the programmer could pick up any tool and use it immediately.
That is precisely why it takes so few cycles.
RAM, on the other hand, is a warehouse that the programmer has to leave the workshop to reach.
The warehouse is large and can store a great deal of material (data), but every time a new piece of material is needed, the programmer has to stop what they are doing and go fetch it,
which is why it takes so many cycles.
So what did "optimization" mean in Pokémon Red?
It was a desperate fight to minimize the number of "trips to the warehouse."
To render a smooth screen without dropped frames on a sluggish Z80 CPU, you had to stay at the workbench and cycle through just 8 tools to get everything done without ever going to the warehouse,
and in doing so give dreams to a child who would lose interest in an instant.
We will now take a look at what those craftsman's tools actually looked like.

(The craftsman's tools: registers of the Z80 architecture)
Description of Z80 Registers
A register is not simply an 8-bit box.
It was a precision instrument whose internal bits were designed to carry different meanings depending on their position,
and in particular, the Z80's A and F register pair (AF) contained bits called Flags that tracked the results of arithmetic operations.
Bit Position | Flag Name | Meaning |
|---|---|---|
7 | S (Sign) | Is the result negative? |
6 | Z (Zero) | Is the result zero? |
4 | H (Half-Carry) | Whether a carry occurred in BCD arithmetic |
2 | P/V (Parity/Overflow) | Even parity or overflow |
1 | N (Add/Sub) | Was the last operation a subtraction? |
0 | C (Carry) | Whether a carry or borrow occurred |
These were the core mechanism for performing conditional branching on a Z80 that had no if statements.
The Game Boy's CPU (SM83) was designed based on the Z80 but is a derivative with some features removed for cost and power efficiency. The flag register was also reduced from the Z80's 6 flags down to 4.
Bit Position | Flag Name | Meaning |
|---|---|---|
7 | Z (Zero) | Is the result zero? |
6 | N (Subtract) | Was the last operation a subtraction? |
5 | H (Half Carry) | Whether a carry occurred from the lower 4 bits |
4 | C (Carry) | Whether a carry or borrow occurred |
S (Sign) and P/V (Parity/Overflow) were removed. Conditional branching in Pokémon Red is implemented using only these 4 flags.
The constraint of 8 registers did not change. Instead, to squeeze every last drop out of those 8, the SM83 allowed two registers to be paired and operated as a 16-bit pair. Rather than adding more tools, it used the same tools in two different ways.
Pair | Composition | Primary Use |
|---|---|---|
BC | B + C | Counter, general-purpose data transfer |
DE | D + E | Source/destination address during memory copy |
HL | H + L | Memory pointer, address traversal (the most critical) |
The HL pair in particular was used as a pointer register that directly pointed to memory addresses. Instructions like LD A, [HL] could immediately load the value at the address HL pointed to into A, and [HLI] and [HLD] instructions allowed the address to be automatically incremented or decremented after the load.
In other words, a programmer could use the 8 registers as individual tools while also combining two of them into an address-traversal ruler whenever needed.
Traversing a tile map or copying sprite data would have been nearly impossible without this HL pair.
It was a matter of pushing the 8 tools on the workbench to their absolute limits.
So how were these tools actually put to use?

(Viridian City)
This is the Viridian City screen and the event scripts used in Viridian City.
Their World Was Composed of States
The limitations of the Z80 did not stop at raw computational power.
"Programmers could not keep track of every event by themselves."
Instead, a structure called a FSM (Finite State Machine) was introduced.
What Is an FSM?
An FSM is a system in which behavior changes depending on the current state.
In other words, it is a structure where "what can be done next" is determined by "what is currently happening."
The Simplest Everyday FSM Example: A Traffic Light
Current State | Condition | Next State |
|---|---|---|
Red | 30 seconds elapsed | Green |
Green | 30 seconds elapsed | Yellow |
Yellow | 5 seconds elapsed | Red |
Current State: Red, Green, Yellow
Transition Condition: A fixed amount of time elapses
Action: Transition to the next state
A traffic light is an FSM.
This is because the next action is determined by the current color (state).
In the case of a traffic light, we can say it is an FSM where each State (the current color) carries an Action (changing to the next color).
An FSM is a design approach that separates behavior around the concept of "state",
where each state represents a single behavior, and the system transitions to the next state based on an input or condition.
And this is the key to understanding Pokémon.
Why FSMs Are Necessary: "Everything in a Game" Is a State
If you dig into Pokémon Red, everything is an FSM.
Among them, let's use Viridian City as an example.

(Viridian City assembly view)
Before looking at the FSM itself, how does Pokémon move between states?
The FSM Engine: Script Pointer Table and Dispatcher
ViridianCity_Script:
call EnableAutoTextBoxDrawing
ld hl, ViridianCity_ScriptPointers ; 1. Load the start address of the script pointer table into HL
ld a, [wViridianCityCurScript] ; 2. Load the current script index into register A
jp CallFunctionInTable ; 3. Jump to the function at index A in the pointer table
ViridianCity_ScriptPointers:
def_script_pointers
dw_const ViridianCityDefaultScript, SCRIPT_VIRIDIANCITY_DEFAULT ; Index 0
dw_const ViridianCityOldManStartCatchTrainingScript, SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING ; Index 1
; ... etc.
This is the heart and engine of Viridian City's event FSM.
The state storage variable (The State): wViridianCityCurScript is a 1-byte value stored at a specific location in RAM, and it determines the FSM's current state.
The dispatcher (The Dispatcher), ViridianCity_Script, is called every game tick and checks the value of wViridianCityCurScript.
Then the jump table CallFunctionInTable uses the value of wViridianCityCurScript as an index to look up the address of the actual script (state) to execute in the ViridianCity_ScriptPointers table, and jumps (JP) to that address.
This is the most hardware-efficient implementation of a switch statement possible. Without a long chain of if-else comparisons, execution is transferred to the exact logic block in just a few operations.
Implementing States and Actions
Labels like ViridianCityDefaultScript that the table points to are the actual code corresponding to each "State" of the FSM,
and the instructions inside them are the "Actions."
In the actual in-game code, it is implemented like this.
ViridianCityCheckGymOpenScript:
CheckEvent EVENT_VIRIDIAN_GYM_OPEN ; Action 1: Check if the gym has been opened
ret nz ; If the condition is met (NZ), return immediately
; ... (omitted) ...
ld a, TEXT_VIRIDIANCITY_GYM_LOCKED ; Action 2: Load the "Door Locked" text ID into A
ldh [hTextID], a
call DisplayTextID ; Action 3: Display the text
Here, call instructions such as CheckEvent and DisplayTextID are the Actions performed in that state.
So what exactly is call?
Basic Description: How call Works Step by Step
call SomeLabel
When call is executed on the Z80, the following happens:
1. The current PC (Program Counter + 3) value is pushed onto the stack
->That is, the address of the next instruction is saved.
2. PC is set to the address of SomeLabel
->Execution immediately jumps there.
3. When a ret instruction is encountered:
->PC is popped from the stack
->Execution returns to the original location.
The reason call matters is that
even at this low level,
it modularizes actions independently:
call display
call Delay3
and so on, where each individual action is implemented as a modularized subroutine.
In other words, call is like making a phone call:
when you call a function with call, the Z80 jots down the address to return to after the call on the stack, and ret is the act of hanging up and going back.
Now that we understand how states and actions are implemented, how does the most important part of an FSM, the "state transition," actually occur?
It happens simply by writing a new value into the wViridianCityCurScript variable.
Let's look at a script example.
Transition Triggered by Player Action (ViridianCityOldManText) Example:
ViridianCityOldManText:
; ... (dialog output) ...
call YesNoChoice ; Show a Yes/No choice prompt
ld a, [wCurrentMenuItem] ; Load the player's selection (0 = No, 1 = Yes) into A
and a ; Check if A is zero (i.e., "No")
jr z, .refused ; If zero ("No"), jump to .refused
ld hl, .KnowHowToCatchPokemonText ; Case: player selected "Yes"
call PrintText
; === State transition occurs here ===
ld a, SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING ; Load the new state ID (index 1) into A
ld [wViridianCityCurScript], a ; Overwrite the current state variable
jr .done
; ...
The moment the player selects "Yes," the value of wViridianCityCurScript changes from SCRIPT_VIRIDIANCITY_DEFAULT (0) to SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING (1) within the script pointer table.
With that, the state transition is complete,
and on the next game tick, the dispatcher will now execute ViridianCityOldManStartCatchTrainingScript.
A brief summary of Pokémon Red's route-based state machine is as follows:
MainGameLoop:
LD A, (wGameMode)
CP $00
JP Z, InitBattle
CP $01
JP Z, HandleOverworld
CP $02
JP Z, HandleMenu
JP MainGameLoop
Pokémon Red's FSM has the following characteristics:
Every state is a 1-byte value (similar to an enum)
Branching is done using a CP + JP Z, label combination
Direct jump structure rather than stack-based function calls
Subroutines are handled with JP instead of CALL wherever possible, minimizing stack usage
Why Was It Designed This Way?
The Z80's stack consumes RAM space, its size is determined directly by the programmer based on the initial value of the SP register, and the risk of stack overflow was ever-present.
The CALL/RET structure is slow and carries the risk of errors,
which is why most FSMs are handled with flat branching using CP + JP.
FSM vs. Modern FSM: How Has the Structure Changed?

(Unity's animation FSM, one representative of modern game engines)
The FSM on the Z80 was, quite literally, the machine itself.
State was a value stored in a register
Transition was a jump based on condition flags
Action was directly wired via call and jp
Stack, functions, and abstraction were minimized
But today's FSMs are structurally completely different.
1. State Is No Longer a Register Value
Pokémon example
ld a, [wViridianCityCurScript]
jp CallFunctionInTable
FSM implemented with OOP
switch (currentState)
{
case GameState.TitleScreen:
ShowTitleUI();
break;
case GameState.Battle:
StartBattle();
break;
}
State is an explicit Enum
Transitions are meaning-based conditionals
No need to worry about return addresses
2. Transitions Are Controlled by Semantic Conditions, Not Condition Flags
Pokémon example
cp 1
jp z, SomeState
FSM implemented with OOP
if (player.HasPokedex && !gym.IsOpen)
currentState = GameState.TriggerEvent;
class BattleState : IState
{
public void Enter()
{
UI.Show("A wild Pokémon appeared!");
}
public void Update()
{
if (Input.GetKeyDown(KeyCode.A))
{
stateMachine.ChangeState(new FightState());
}
}
}
It became human-friendly and easy to maintain,
and Actions were also modularized into encapsulated functions, with behavior divided into classes or script units.
(FSM code is generally implemented using a switch statement, but in C# you can also use a mapping structure like Dictionary<State, Action> to register behaviors. A switch statement offers very fast branching performance when case values are dense and consecutive constants like enums, because the compiler can optimize it into a jump table. A Dictionary approach, on the other hand, is advantageous when state values are sparse or when states need to be dynamically injected or extended at runtime. In particular, when there are many states and their values are not uniform, a Dictionary can be more concise, easier to maintain, and competitive in runtime performance.)
The most important difference: the flow of control
Category | Z80 FSM | Modern OOP FSM |
|---|---|---|
State Transition | Direct state value change ( | Switching between state objects ( |
Flow Control |
| Frame-based |
State Persistence | RAM variable ( | Class instances, context objects |
Dispatch Method | Pointer table + JP | Virtual function calls / interface-based dispatch |
Of course each approach had its own advantages, but the OOP approach was generally far more scalable, and low-level manipulation became increasingly difficult as games grew ever larger in scope,
which is why OOP-style FSMs gradually became the norm.
OOP FSMs clearly offered the following advantages:
Explicit state classes
Modularized Enter, Update, and Exit functions
Good readability and ease of maintenance, among other benefits
Yet despite all this, performance bottlenecks still existed.
What does it mean to transition a state? It means an object is being recreated. Object pooling and similar patterns can help, but they too have their limits,
and the following performance bottlenecks were present.
Performance Bottlenecks of OOP FSMs
Limitation | Description |
|---|---|
Object recreation cost during state transitions | Can trigger GC or GPU cache invalidation |
Scattered calls to individual Update functions | Unfavorable for CPU Branch Prediction |
State data spread across multiple objects | Loss of data locality, increased cache misses |
Difficulty handling mobile or large-scale entity counts | Inefficient parallelization |
In short, OOP FSMs have a beautiful structure, but their performance limitations in large-scale environments are clear.
And this is where DOP enters the picture.
What is DOP? It stands for Data-Oriented Programming.

The Core of DOP FSM
State is no longer an "object."
State is "component data,"
and behavior is now the responsibility of a "System."
DOP is a programming orientation that returns to a data-centric approach in order to overcome the performance bottlenecks that grow as OOP systems scale,
and in Unity it is implemented under the name ECS.
If we structure a DOP FSM in C based on Unity ECS FSM patterns, it looks like the following.
1. States Are Arranged in Structs (Arrays)
typedef enum
{
STATE_IDLE,
STATE_MOVE,
STATE_ATTACK
} UnitState;
typedef struct
{
float x, y;
UnitState state;
int targetId;
} Unit;
State is no longer a field on an individual unit; it becomes a single column in an array of all units.
2. Branch by linearly filtering data per state
void processFSM(Unit* units, int count)
{
for (int i = 0; i < count; ++i)
{
switch (units[i].state)
{
case STATE_IDLE:
// do nothing
break;
case STATE_MOVE:
moveUnit(&units[i]);
break;
case STATE_ATTACK:
attackTarget(&units[i]);
break;
}
}
}
(The preceding code rewritten in a DOP style. This is intended for direct comparison with the earlier form and is not the fully correct DOP approach; the proper DOP approach is shown below.)
2-1. The Proper DOP Approach
// True DOP approach: no state branching (if/switch) whatsoever
void processMoveSystem(MoveComponent* movers, int count) {
// The CPU processes sequentially through an array containing only the entities that are actually moving (state branch misprediction rate: 0%)
for (int i = 0; i < count; ++i) {
movers[i].x += 1;
}
}
There is actually still one remaining problem in the switch-based processFSM shown earlier:
traversing the array sequentially solves cache locality, but a switch statement is encountered on every iteration of the loop. If the array's states are mixed at random, like [IDLE, ATTACK, MOVE, IDLE, ATTACK...], the modern CPU's branch predictor cannot predict which case will execute next and must continuously flush the pipeline, resulting in enormous cycle waste.
True DOP eliminates the switch statement itself from the hot path. Entities in the same state are gathered into physically separate arrays, and each system only iterates over the array it is responsible for.
3. Implement State Transitions as Write Operations.
void moveUnit(Unit* u) {
u->x += 1;
if (distanceToTarget(u) < 2.0f)
{
u->state = STATE_ATTACK; // State Transition
}
}
This means that rather than the object-to-object philosophy that changed under OOP,
things have begun to resemble the old FSMs that mapped directly onto the machine.
Feature | Z80 FSM (Past) | OOP FSM (the beautiful detour) | DOP FSM (Return to the Past) |
|---|---|---|---|
State Representation | Raw data in RAM ( | Object (pointer/reference) | Raw data in memory ( |
Logic Execution | Sequential loop + jump ( | Distributed individual calls ( | Sequential loop + condition ( |
Memory Access | Sequential access ( | Random access (pointer chasing) | Sequential access (array traversal) |
Core Value | Hardware efficiency | Human readability | Hardware efficiency |
Three Stages of FSM Evolution
Era | Implementation | State Representation | Control Flow | Parallelism / Performance |
|---|---|---|---|---|
Z80 Era | Registers + Jumps | RAM variables | JP/RET | Hardware-optimal |
OOP Era | State classes | Object fields | ChangeState() | Maintenance-optimal |
Modern Real-Time Systems | DOP-based | Struct columns | Data conditions | Performance-optimal |

The cheapest, fastest, and most reliable components are those that aren't there.
— Gordon Bell
8 KB of RAM, No Floating-Point Arithmetic
For the Pokémon developers, the "parts that didn't exist" were a floating-point arithmetic unit, generous amounts of RAM, and a smart compiler.
They accepted the absence of those things as a given and shaped their world using only what they had: registers and a small amount of memory.
A famous computer scientist once said:
The cheapest, fastest, and most reliable component is the one that isn't there.
Bell said that the part that doesn't exist is the best part of all, but the developers of Pokémon answered: "we turn the missing parts into code."
For us today, the "parts that don't exist" may well be infinite CPU cache and perfect branch prediction.
In the midst of abundance we assume that hardware will naturally be fast,
yet the highest performance still emerges when we understand how the machine operates and organize our data according to its principles.
The evolution of FSMs is like a great pendulum swinging between "abstraction for humans" and "optimization for machines."
And in the shape of DOP today, we are witnessing the new challenge of modern developers who seek to claim both at once.
The tools have changed, but the "craftsman's spirit" of striving to create the best possible experience within given constraints remains unchanged, then as now.
And then as now, the cheapest, fastest, and most reliable component
is the programmer, still very much alive.
Pokémon assembly source code: https://github.com/pret/pokered