Skip to content

Pokemon Red and the Evolution of FSM: Past and Present

I love Pokemon

Makonea·Apr 21, 2026·39 min

Preface...

In the turbulent 1990s, countless boys around the world left Pallet Town under the constraint of just 8 KB of RAM.

And those boys now create entirely new worlds on top of tens of gigabytes of memory.

Z80 CPU - Work RAM (WRAM) of an 8-bit architecture

The 1 MB cartridge ROM was less than a thousandth of the size of a single high-resolution image by today's standards.

And the game that started there went on to astonish the world.

Survival Strategy: Everything Done by Hand

(Pokémon Red, shared as a disassembly)

In the Game Boy era, developers were not simply "making the code run";

they were trying to squeeze an entire game into a 64 KB address space by any means necessary.

In an age without automation or compiler optimizations, everything was done by hand, and each line of assembly code must have been the result of carefully weighing its purpose and its cost.

Now, before we dive into the optimizations of that era, we first need to understand what a register is.

(Z80 register hardware pin diagram)

What Is a Register?

A register is an ultra-fast data storage unit located inside the CPU,

a small, fast memory element used for arithmetic, control flow, and memory access.

We will describe registers through the lens of the Z80 architecture.

Why Are Registers Important?

Type

Access Speed

Cycle Count

Role

Register

Very fast (inside CPU)

1~4 cycles

Immediately usable, optimized for computation

RAM

Slow (goes through the bus)

8~16 cycles

Auxiliary role as a slow storage medium

Even from this table alone, it is clear that registers are significantly faster.

But why compare registers and RAM in the first place?

Let's first look at the difference between RAM and registers.

A register is the workbench laid out right in front of the programmer.

The workbench is very cramped, and in the Z80 you could only place 7 to 8 tools on it, but once something was on the workbench, the programmer could pick up any tool and use it immediately.

That is precisely why it takes so few cycles.

RAM, on the other hand, is a warehouse that the programmer has to leave the workshop to reach.

The warehouse is large and can store a great deal of material (data), but every time a new piece of material is needed, the programmer has to stop what they are doing and go fetch it,

which is why it takes so many cycles.

So what did "optimization" mean in Pokémon Red?

It was a desperate fight to minimize the number of "trips to the warehouse."

To render a smooth screen without dropped frames on a sluggish Z80 CPU, you had to stay at the workbench and cycle through just 8 tools to get everything done without ever going to the warehouse,

and in doing so give dreams to a child who would lose interest in an instant.

We will now take a look at what those craftsman's tools actually looked like.

(The craftsman's tools: registers of the Z80 architecture)

Description of Z80 Registers

A register is not simply an 8-bit box.

It was a precision instrument whose internal bits were designed to carry different meanings depending on their position,

and in particular, the Z80's A and F register pair (AF) contained bits called Flags that tracked the results of arithmetic operations.

Bit Position

Flag Name

Meaning

7

S (Sign)

Is the result negative?

6

Z (Zero)

Is the result zero?

4

H (Half-Carry)

Whether a carry occurred in BCD arithmetic

2

P/V (Parity/Overflow)

Even parity or overflow

1

N (Add/Sub)

Was the last operation a subtraction?

0

C (Carry)

Whether a carry or borrow occurred

These were the core mechanism for performing conditional branching on a Z80 that had no if statements.

The Game Boy's CPU (SM83) was designed based on the Z80 but is a derivative with some features removed for cost and power efficiency. The flag register was also reduced from the Z80's 6 flags down to 4.

Bit Position

Flag Name

Meaning

7

Z (Zero)

Is the result zero?

6

N (Subtract)

Was the last operation a subtraction?

5

H (Half Carry)

Whether a carry occurred from the lower 4 bits

4

C (Carry)

Whether a carry or borrow occurred

S (Sign) and P/V (Parity/Overflow) were removed. Conditional branching in Pokémon Red is implemented using only these 4 flags.

The constraint of 8 registers did not change. Instead, to squeeze every last drop out of those 8, the SM83 allowed two registers to be paired and operated as a 16-bit pair. Rather than adding more tools, it used the same tools in two different ways.

Pair

Composition

Primary Use

BC

B + C

Counter, general-purpose data transfer

DE

D + E

Source/destination address during memory copy

HL

H + L

Memory pointer, address traversal (the most critical)

The HL pair in particular was used as a pointer register that directly pointed to memory addresses. Instructions like LD A, [HL] could immediately load the value at the address HL pointed to into A, and [HLI] and [HLD] instructions allowed the address to be automatically incremented or decremented after the load.

In other words, a programmer could use the 8 registers as individual tools while also combining two of them into an address-traversal ruler whenever needed.

Traversing a tile map or copying sprite data would have been nearly impossible without this HL pair.

It was a matter of pushing the 8 tools on the workbench to their absolute limits.

So how were these tools actually put to use?

(Viridian City)

This is the Viridian City screen and the event scripts used in Viridian City.

Their World Was Composed of States

The limitations of the Z80 did not stop at raw computational power.

"Programmers could not keep track of every event by themselves."

Instead, a structure called a FSM (Finite State Machine) was introduced.

What Is an FSM?

An FSM is a system in which behavior changes depending on the current state.

In other words, it is a structure where "what can be done next" is determined by "what is currently happening."

The Simplest Everyday FSM Example: A Traffic Light

Current State

Condition

Next State

Red

30 seconds elapsed

Green

Green

30 seconds elapsed

Yellow

Yellow

5 seconds elapsed

Red

Current State: Red, Green, Yellow

Transition Condition: A fixed amount of time elapses

Action: Transition to the next state

A traffic light is an FSM.

This is because the next action is determined by the current color (state).

In the case of a traffic light, we can say it is an FSM where each State (the current color) carries an Action (changing to the next color).

An FSM is a design approach that separates behavior around the concept of "state",

where each state represents a single behavior, and the system transitions to the next state based on an input or condition.

And this is the key to understanding Pokémon.

Why FSMs Are Necessary: "Everything in a Game" Is a State

If you dig into Pokémon Red, everything is an FSM.

Among them, let's use Viridian City as an example.

(Viridian City assembly view)

Before looking at the FSM itself, how does Pokémon move between states?

The FSM Engine: Script Pointer Table and Dispatcher

ViridianCity_Script:
	call EnableAutoTextBoxDrawing
	ld hl, ViridianCity_ScriptPointers    ; 1. Load the start address of the script pointer table into HL
	ld a, [wViridianCityCurScript]        ; 2. Load the current script index into register A
	jp CallFunctionInTable                ; 3. Jump to the function at index A in the pointer table


ViridianCity_ScriptPointers:
	def_script_pointers
	dw_const ViridianCityDefaultScript,               SCRIPT_VIRIDIANCITY_DEFAULT ; Index 0
	dw_const ViridianCityOldManStartCatchTrainingScript, SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING ; Index 1
	; ... etc.

This is the heart and engine of Viridian City's event FSM.

The state storage variable (The State): wViridianCityCurScript is a 1-byte value stored at a specific location in RAM, and it determines the FSM's current state.

The dispatcher (The Dispatcher), ViridianCity_Script, is called every game tick and checks the value of wViridianCityCurScript.

Then the jump table CallFunctionInTable uses the value of wViridianCityCurScript as an index to look up the address of the actual script (state) to execute in the ViridianCity_ScriptPointers table, and jumps (JP) to that address.

This is the most hardware-efficient implementation of a switch statement possible. Without a long chain of if-else comparisons, execution is transferred to the exact logic block in just a few operations.

Implementing States and Actions

Labels like ViridianCityDefaultScript that the table points to are the actual code corresponding to each "State" of the FSM,

and the instructions inside them are the "Actions."

In the actual in-game code, it is implemented like this.

ViridianCityCheckGymOpenScript:
	CheckEvent EVENT_VIRIDIAN_GYM_OPEN ; Action 1: Check if the gym has been opened
	ret nz                             ; If the condition is met (NZ), return immediately
	; ... (omitted) ...
	ld a, TEXT_VIRIDIANCITY_GYM_LOCKED ; Action 2: Load the "Door Locked" text ID into A
	ldh [hTextID], a
	call DisplayTextID                 ; Action 3: Display the text

Here, call instructions such as CheckEvent and DisplayTextID are the Actions performed in that state.

So what exactly is call?

Basic Description: How call Works Step by Step

call SomeLabel

When call is executed on the Z80, the following happens:

1. The current PC (Program Counter + 3) value is pushed onto the stack

->That is, the address of the next instruction is saved.

2. PC is set to the address of SomeLabel

->Execution immediately jumps there.

3. When a ret instruction is encountered:

->PC is popped from the stack

->Execution returns to the original location.

The reason call matters is that

even at this low level,

it modularizes actions independently:

call display

call Delay3

and so on, where each individual action is implemented as a modularized subroutine.

In other words, call is like making a phone call:

when you call a function with call, the Z80 jots down the address to return to after the call on the stack, and ret is the act of hanging up and going back.

Now that we understand how states and actions are implemented, how does the most important part of an FSM, the "state transition," actually occur?

It happens simply by writing a new value into the wViridianCityCurScript variable.

Let's look at a script example.

Transition Triggered by Player Action (ViridianCityOldManText) Example:

ViridianCityOldManText:
	; ... (dialog output) ...
	call YesNoChoice             ; Show a Yes/No choice prompt
	ld a, [wCurrentMenuItem]     ; Load the player's selection (0 = No, 1 = Yes) into A
	and a                        ; Check if A is zero (i.e., "No")
	jr z, .refused               ; If zero ("No"), jump to .refused
	ld hl, .KnowHowToCatchPokemonText ; Case: player selected "Yes"
	call PrintText

    ; === State transition occurs here ===
	ld a, SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING ; Load the new state ID (index 1) into A
	ld [wViridianCityCurScript], a                         ; Overwrite the current state variable
	jr .done
; ...

The moment the player selects "Yes," the value of wViridianCityCurScript changes from SCRIPT_VIRIDIANCITY_DEFAULT (0) to SCRIPT_VIRIDIANCITY_OLD_MAN_START_CATCH_TRAINING (1) within the script pointer table.

With that, the state transition is complete,

and on the next game tick, the dispatcher will now execute ViridianCityOldManStartCatchTrainingScript.

A brief summary of Pokémon Red's route-based state machine is as follows:

MainGameLoop:
    LD A, (wGameMode)
    CP $00
    JP Z, InitBattle

    CP $01
    JP Z, HandleOverworld

    CP $02
    JP Z, HandleMenu

    JP MainGameLoop

Pokémon Red's FSM has the following characteristics:

Every state is a 1-byte value (similar to an enum)

Branching is done using a CP + JP Z, label combination

Direct jump structure rather than stack-based function calls

Subroutines are handled with JP instead of CALL wherever possible, minimizing stack usage

Why Was It Designed This Way?

The Z80's stack consumes RAM space, its size is determined directly by the programmer based on the initial value of the SP register, and the risk of stack overflow was ever-present.

The CALL/RET structure is slow and carries the risk of errors,

which is why most FSMs are handled with flat branching using CP + JP.

FSM vs. Modern FSM: How Has the Structure Changed?

(Unity's animation FSM, one representative of modern game engines)

The FSM on the Z80 was, quite literally, the machine itself.

State was a value stored in a register

Transition was a jump based on condition flags

Action was directly wired via call and jp

Stack, functions, and abstraction were minimized

But today's FSMs are structurally completely different.

1. State Is No Longer a Register Value

Pokémon example

ld a, [wViridianCityCurScript]
jp CallFunctionInTable

FSM implemented with OOP

switch (currentState)
{
    case GameState.TitleScreen:
        ShowTitleUI();
        break;
    case GameState.Battle:
        StartBattle();
        break;
}

State is an explicit Enum

Transitions are meaning-based conditionals

No need to worry about return addresses

2. Transitions Are Controlled by Semantic Conditions, Not Condition Flags

Pokémon example

cp 1

jp z, SomeState

FSM implemented with OOP

if (player.HasPokedex && !gym.IsOpen)
    currentState = GameState.TriggerEvent;
class BattleState : IState 
{
    public void Enter() 
    {
        UI.Show("A wild Pokémon appeared!");
    }
    public void Update() 
    {
        if (Input.GetKeyDown(KeyCode.A)) 
        {
            stateMachine.ChangeState(new FightState());
        }
    }
}

It became human-friendly and easy to maintain,

and Actions were also modularized into encapsulated functions, with behavior divided into classes or script units.

(FSM code is generally implemented using a switch statement, but in C# you can also use a mapping structure like Dictionary<State, Action> to register behaviors. A switch statement offers very fast branching performance when case values are dense and consecutive constants like enums, because the compiler can optimize it into a jump table. A Dictionary approach, on the other hand, is advantageous when state values are sparse or when states need to be dynamically injected or extended at runtime. In particular, when there are many states and their values are not uniform, a Dictionary can be more concise, easier to maintain, and competitive in runtime performance.)

The most important difference: the flow of control

Category

Z80 FSM

Modern OOP FSM

State Transition

Direct state value change (ld [state], a)

Switching between state objects (stateMachine.ChangeState())

Flow Control

jp, call, ret (direct PC manipulation)

Frame-based Update(), Coroutine, async/await

State Persistence

RAM variable (wCurScript)

Class instances, context objects

Dispatch Method

Pointer table + JP

Virtual function calls / interface-based dispatch

Of course each approach had its own advantages, but the OOP approach was generally far more scalable, and low-level manipulation became increasingly difficult as games grew ever larger in scope,

which is why OOP-style FSMs gradually became the norm.

OOP FSMs clearly offered the following advantages:

Explicit state classes

Modularized Enter, Update, and Exit functions

Good readability and ease of maintenance, among other benefits

Yet despite all this, performance bottlenecks still existed.

What does it mean to transition a state? It means an object is being recreated. Object pooling and similar patterns can help, but they too have their limits,

and the following performance bottlenecks were present.

Performance Bottlenecks of OOP FSMs

Limitation

Description

Object recreation cost during state transitions

Can trigger GC or GPU cache invalidation

Scattered calls to individual Update functions

Unfavorable for CPU Branch Prediction

State data spread across multiple objects

Loss of data locality, increased cache misses

Difficulty handling mobile or large-scale entity counts

Inefficient parallelization

In short, OOP FSMs have a beautiful structure, but their performance limitations in large-scale environments are clear.

And this is where DOP enters the picture.

What is DOP? It stands for Data-Oriented Programming.

The Core of DOP FSM

State is no longer an "object."

State is "component data,"

and behavior is now the responsibility of a "System."

DOP is a programming orientation that returns to a data-centric approach in order to overcome the performance bottlenecks that grow as OOP systems scale,

and in Unity it is implemented under the name ECS.

If we structure a DOP FSM in C based on Unity ECS FSM patterns, it looks like the following.

1. States Are Arranged in Structs (Arrays)

typedef enum 
{
    STATE_IDLE,
    STATE_MOVE,
    STATE_ATTACK
} UnitState;

typedef struct 
{
    float x, y;
    UnitState state;
    int targetId;
} Unit;

State is no longer a field on an individual unit; it becomes a single column in an array of all units.

2. Branch by linearly filtering data per state

void processFSM(Unit* units, int count) 
{
    for (int i = 0; i < count; ++i) 
    {
        switch (units[i].state) 
        {
            case STATE_IDLE:
                // do nothing
                break;
            case STATE_MOVE:
                moveUnit(&units[i]);
                break;
            case STATE_ATTACK:
                attackTarget(&units[i]);
                break;
        }
    }
}

(The preceding code rewritten in a DOP style. This is intended for direct comparison with the earlier form and is not the fully correct DOP approach; the proper DOP approach is shown below.)

2-1. The Proper DOP Approach

// True DOP approach: no state branching (if/switch) whatsoever
void processMoveSystem(MoveComponent* movers, int count) {
    // The CPU processes sequentially through an array containing only the entities that are actually moving (state branch misprediction rate: 0%)
    for (int i = 0; i < count; ++i) {
        movers[i].x += 1;
    }
}

There is actually still one remaining problem in the switch-based processFSM shown earlier:

traversing the array sequentially solves cache locality, but a switch statement is encountered on every iteration of the loop. If the array's states are mixed at random, like [IDLE, ATTACK, MOVE, IDLE, ATTACK...], the modern CPU's branch predictor cannot predict which case will execute next and must continuously flush the pipeline, resulting in enormous cycle waste.

True DOP eliminates the switch statement itself from the hot path. Entities in the same state are gathered into physically separate arrays, and each system only iterates over the array it is responsible for.

3. Implement State Transitions as Write Operations.

void moveUnit(Unit* u) {
    u->x += 1;
    if (distanceToTarget(u) < 2.0f) 
    {
        u->state = STATE_ATTACK;  // State Transition
    }
}

This means that rather than the object-to-object philosophy that changed under OOP,

things have begun to resemble the old FSMs that mapped directly onto the machine.

Feature

Z80 FSM (Past)

OOP FSM (the beautiful detour)

DOP FSM (Return to the Past)

State Representation

Raw data in RAM (byte)

Object (pointer/reference)

Raw data in memory (struct)

Logic Execution

Sequential loop + jump (JP)

Distributed individual calls (virtual Update)

Sequential loop + condition (System)

Memory Access

Sequential access ([hli])

Random access (pointer chasing)

Sequential access (array traversal)

Core Value

Hardware efficiency

Human readability

Hardware efficiency

Three Stages of FSM Evolution

Era

Implementation

State Representation

Control Flow

Parallelism / Performance

Z80 Era

Registers + Jumps

RAM variables

JP/RET

Hardware-optimal

OOP Era

State classes

Object fields

ChangeState()

Maintenance-optimal

Modern Real-Time Systems

DOP-based

Struct columns

Data conditions

Performance-optimal

Code Playground
Loading interactive code...

The cheapest, fastest, and most reliable components are those that aren't there.

— Gordon Bell

8 KB of RAM, No Floating-Point Arithmetic

For the Pokémon developers, the "parts that didn't exist" were a floating-point arithmetic unit, generous amounts of RAM, and a smart compiler.

They accepted the absence of those things as a given and shaped their world using only what they had: registers and a small amount of memory.

A famous computer scientist once said:

The cheapest, fastest, and most reliable component is the one that isn't there.

Bell said that the part that doesn't exist is the best part of all, but the developers of Pokémon answered: "we turn the missing parts into code."

For us today, the "parts that don't exist" may well be infinite CPU cache and perfect branch prediction.

In the midst of abundance we assume that hardware will naturally be fast,

yet the highest performance still emerges when we understand how the machine operates and organize our data according to its principles.

The evolution of FSMs is like a great pendulum swinging between "abstraction for humans" and "optimization for machines."

And in the shape of DOP today, we are witnessing the new challenge of modern developers who seek to claim both at once.

The tools have changed, but the "craftsman's spirit" of striving to create the best possible experience within given constraints remains unchanged, then as now.

And then as now, the cheapest, fastest, and most reliable component

is the programmer, still very much alive.

Pokémon assembly source code: https://github.com/pret/pokered