Data-Oriented Programming — 2026-05-19 14:40 UTC

Data-Oriented Programming

Data-Oriented Programming (DOP) is a way of thinking about programs that prioritizes the shape, flow, transformation, and access patterns of data over class and object hierarchies.

However, this term is used in two distinct ways.

TXT

1. Data-Oriented Design (DOD)
   Data-oriented design as practiced in game engines and performance optimization
   Core concepts: CPU cache, memory layout, batch processing, Entity Component System (ECS), hot/cold split

2. Data-Oriented Programming (DOP)
   Immutable-data-centric programming as formulated by Yehonathan Sharvit
   Core concepts: separation of code and data, Generic Data Structures, Immutable Data, pure transformation

The two share common ground, though they differ in emphasis.

TXT

A program = a transformer that converts data from one form to another

If object-oriented programming asks "which objects exchange messages with which," data-oriented programming starts by asking something different.

TXT

What data exists?
How much of it is there?
How often is it read and written?
In what order is it accessed?
What form does it get transformed into?
What data moves together, and what data should be kept separate?

Personal note: If OOP asks "what nouns exist in this world?", data-oriented programming asks "so what exactly is sitting in memory, how many of them are there, and how are they laid out?"

Object-oriented programming describes the world in a way that is easy for humans to understand, while data-oriented programming redescribes that same world in a way that is easy for the CPU to understand.

We write programs for people, yet in the end we have to keep the CPU happy too.

All of this tension stems from the fact that at runtime it is the CPU, not a domain expert, that shows up on the battlefield.

Why It Matters

A lot of code starts out looking like this: a tidy object model.

Games, in particular, tend to follow this pattern.

TXT

Player
Enemy
Bullet
Item
Particle

But at runtime, questions like these often matter far more.

TXT

How many bullets are there? Tens of thousands?
Does each frame only need to update position?
Which fields are actually needed for collision detection?
Which data is actually needed for rendering?
Is this data laid out in contiguous memory?

When a single OOP object holds position, velocity, rendering info, sound, network state, and behavior state all together, it reads naturally. But when a function that only needs position and velocity iterates over that array, the CPU is forced to drag along all the unnecessary surrounding data as well.

For instance, adding a VFX can cause stutters once particle counts climb, or every new animation forces another round of optimization. Problems like these are everywhere, and while people try everything from object pooling onward, those fixes only go so far.

Data-oriented thinking aims to eliminate that waste.

TXT

Object-centric:
A single object holds multiple responsibilities and data together.

Data-centric:
Keep data that is processed together, together,
and separate data that is not processed together.

DOD: Performance-Focused Data-Oriented Design

In game engines, rendering, physics, simulation, and high-volume event processing, the same operation is applied repeatedly to large amounts of data. Mike Acton's talk "Data-Oriented Design and C++" is the seminal example that brought this way of thinking to wide attention in the C++ and game-engine community.¹

TXT

Update 100,000 positions
Check collisions for 50,000 bullets
Decrease lifetime for 1,000,000 particles
Evaluate 20,000 NPC states

The key question here is not "did we model the objects beautifully?" but rather "how predictably and sequentially does the CPU read the data?"

AoS vs SoA

A traditional array of objects usually resembles AoS (Array of Structures). Human intuition produces AoS, while the CPU's instincts favor SoA (Structure of Arrays).

C++

struct Particle {
    float x;
    float y;
    float vx;
    float vy;
    float life;
    int color;
};

Particle particles[100000];

This layout is comfortable for human eyes that perceive a particle as a single whole. But when a function that updates positions every frame iterates over this array, the CPU fumes. Even though position updates need neither color nor life, those unnecessary fields are forcibly loaded alongside the useful ones during a cache line fill.

That wasted space in every cache line is a form of cache pollution. Data-Oriented Design, by contrast, reorganizes the world into SoA (Structure of Arrays).

C++

struct Particles {
    float x[100000];
    float y[100000];
    float vx[100000];
    float vy[100000];
    float life[100000];
    int color[100000];
};

Now the position-update system walks only the arrays it actually needs (x, y, vx, and vy), in memory order, as one contiguous sequential sweep.

C++

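// Update only the fields the movement step needs; color and life are never touched.
void update_positions(Particles& p, int count, float dt) {
    for (int i = 0; i < count; i++) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}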

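A larger share of each cache line the CPU fetches now holds values the computation actually uses, so more useful work gets done with the same memory bandwidth. In the AoS version, by contrast, on typical platforms where float and int are 4 bytes each, every 24-byte Particle contributes only 16 bytes (x, y, vx, vy) to the update, so about a third of each fetched line is dead weight.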

The point is not that "AoS is bad." There is no single universally correct data layout; what matters is choosing, and being ready to change, the layout according to which data a particular job or system reads and how often it reads it.
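The same contrast can be sketched in TypeScript with typed arrays. This is a minimal illustration, not a benchmark: `ParticleAoS`, `ParticlesSoA`, `createParticles`, and the two update functions are hypothetical names, and how a JavaScript engine lays out ordinary objects in memory is up to the engine. What the SoA version does guarantee is that each `Float32Array` is a single contiguous buffer, allocated once and updated in place.

```typescript
// AoS: one object per particle, all fields bundled together.
type ParticleAoS = {
  x: number;
  y: number;
  vx: number;
  vy: number;
  life: number;
  color: number;
};

// SoA: one contiguous typed array per field.
type ParticlesSoA = {
  count: number; // number of live particles (<= capacity)
  x: Float32Array;
  y: Float32Array;
  vx: Float32Array;
  vy: Float32Array;
  life: Float32Array;
  color: Int32Array;
};

// Allocate every buffer once, up front.
function createParticles(capacity: number): ParticlesSoA {
  return {
    count: 0,
    x: new Float32Array(capacity),
    y: new Float32Array(capacity),
    vx: new Float32Array(capacity),
    vy: new Float32Array(capacity),
    life: new Float32Array(capacity),
    color: new Int32Array(capacity),
  };
}

// AoS update: each iteration visits a whole object, although only four fields are used.
function updatePositionsAoS(particles: ParticleAoS[], dt: number): void {
  for (const p of particles) {
    p.x += p.vx * dt;
    p.y += p.vy * dt;
  }
}

// SoA update: reads only x, y, vx, vy in index order and overwrites values in place.
function updatePositionsSoA(p: ParticlesSoA, dt: number): void {
  for (let i = 0; i < p.count; i++) {
    p.x[i] += p.vx[i] * dt;
    p.y[i] += p.vy[i] * dt;
  }
}
```

New particles would be added by writing at index `count` and incrementing it, so nothing is allocated inside the per-frame loop.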

Personal note: The reason people of old referred to ships and cars with feminine pronouns was that the equipment was temperamental. In that sense, the computer is fully entitled to be called "she" as well.

A modern CPU does not fetch memory one byte at a time. It loads whole cache lines (commonly 64 bytes), bringing the neighboring data along with the value it asked for.

That is why code that reads a contiguous array sequentially is fast: each fetched line is put to full use, and the hardware prefetcher can anticipate the next one.

TXT

Good approach:
x[0], x[1], x[2], x[3] ...

Bad approach:
objectA.position
objectQ.position
objectM.position
objectZ.position

The second approach may look clean in code, but from the CPU's perspective it is jumping all over the place.

The essence of Data-Oriented Design is not a single cache trick. Richard Fabian explains that data-oriented design goes beyond avoiding cache misses: it is an approach that also considers the type, frequency, quantity, shape, and probability of data.¹⁰

TXT

type: What kind of data is it?
frequency: How often does it occur?
quantity: How much of it is there?
shape: What structure does it have?
probability: Which values/branches appear, and how often?

In other words, data-oriented thinking is broader than "data structure optimization." It treats the actual distribution and usage patterns of real data as the starting point of design.

Hot Data and Cold Data

Data that is accessed frequently is called hot data, while data that is accessed rarely is called cold data.

Consider a game character with data like this.

TXT

position
velocity
health
name
description
inventory
questHistory
lastDialogueText

What is typically needed every frame is position, velocity, and health. By contrast, name, description, questHistory, and lastDialogueText are needed only when the UI opens or a dialogue event fires.

Keeping both kinds of data in the same object means the hot loop can drag cold data along with it. Even though the movement computation needs only position and velocity, the object's memory layout can place rarely used fields such as name, description, and inventory on the same cache lines as the hot fields. Data-Oriented Design therefore separates data by access frequency.

TXT

hot:
- position
- velocity
- health

cold:
- name
- description
- inventory
- questHistory
- dialogue state

The key is keeping hot data small and contiguous. Data read by loops that run every frame should be packed as densely as possible, while infrequently needed data is split into a separate structure. Instead of bundling a character into "one realistic object," you divide it into multiple data sets that match the actual access patterns at runtime.
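A hot/cold split along these lines might look like the following TypeScript sketch. All names (`HotCharacters`, `ColdCharacterInfo`, `createHot`, `spawn`, `tick`) are hypothetical; the assumption is that a character is identified by its index, so the packed hot arrays and the separate cold records stay linked without living in the same object.

```typescript
// Hot data: read every frame, packed into contiguous typed arrays.
type HotCharacters = {
  count: number;
  posX: Float32Array;
  posY: Float32Array;
  velX: Float32Array;
  velY: Float32Array;
  health: Float32Array;
};

// Cold data: needed only when the UI opens or a dialogue event fires.
type ColdCharacterInfo = {
  name: string;
  description: string;
  inventory: string[];
  questHistory: string[];
  lastDialogueText: string;
};

function createHot(capacity: number): HotCharacters {
  return {
    count: 0,
    posX: new Float32Array(capacity),
    posY: new Float32Array(capacity),
    velX: new Float32Array(capacity),
    velY: new Float32Array(capacity),
    health: new Float32Array(capacity),
  };
}

// Adds a character to both stores; the shared index links hot and cold data.
function spawn(
  hot: HotCharacters,
  cold: ColdCharacterInfo[],
  info: ColdCharacterInfo,
  health: number,
): number {
  if (hot.count >= hot.health.length) throw new Error("capacity exceeded");
  const i = hot.count;
  hot.count += 1;
  hot.health[i] = health;
  cold[i] = info;
  return i;
}

// The per-frame loop reads only the hot arrays and never touches cold data.
function tick(hot: HotCharacters, dt: number): void {
  for (let i = 0; i < hot.count; i++) {
    hot.posX[i] += hot.velX[i] * dt;
    hot.posY[i] += hot.velY[i] * dt;
  }
}
```

The cold records are consulted only when something like a UI panel or dialogue event actually needs them.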

Personal note: The real reason the computer deserves to be called "she" is that she absolutely hates having someone dig through her past conversation logs (lastDialogueText) and the contents of her bag (inventory) at all hours of the day.

Ask her cleanly where she is going (position) and how fast (velocity) and she will answer at blazing speed, but the moment you sneak in garbage data like "hey, do you remember the quest from when we first met? (questHistory)" you get blocked on the spot and the runtime freezes.

The safe approach is to keep the essential daily talking points (Hot) compact and ready to hand, while the messy historical records (Cold) are strictly isolated on a password-locked external drive of their own.

Perhaps the programmer who has mastered DOP is a Casanova at heart.

ECS and Data-Oriented Design

Entity Component System (ECS) is a common way of implementing data-oriented design, though ECS itself is not the entirety of data-oriented design. Rather than hiding data inside objects, ECS splits data into components and lets systems process, in batches, all entities that share a given combination of components.

TXT

Entity:
ID. Usually just a number. The entity itself holds almost no logic or data.

Component:
Data. Pure data bundles such as Position, Velocity, and Health.

System:
Logic. Processes in batch the entities that have a specific combination of components.

Example:

TXT

MovementSystem:
Find all entities with Position + Velocity and
perform position += velocity * dt

In OOP, Player.update(), Enemy.update(), and Bullet.update() each mutate their own internal state.

In ECS, by contrast, MovementSystem collects everything that can move into the same data shape and processes it all in a single batch.

TXT

Object-centric:
- Each object performs its own update.
- Logic is scattered inside individual objects.
- The same operation is spread across multiple types.

ECS-centric:
- Systems batch-process the same set of components.
- Data and logic are separated.
- The same operation is applied sequentially to data of the same shape.
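The MovementSystem above can be sketched as a very small ECS in TypeScript. This is an illustrative assumption, not how Unity DOTS or any specific engine stores components: `World`, `createWorld`, `createEntity`, and `movementSystem` are hypothetical, each component type lives in its own array indexed by entity ID, and an empty slot means the entity lacks that component. Production ECS libraries use denser storage (for example archetype chunks or sparse sets) so a system's loop touches only matching entities.

```typescript
// Entity: just an ID.
type Entity = number;

// Components: plain data, no methods.
type Position = { x: number; y: number };
type Velocity = { dx: number; dy: number };

// Hypothetical minimal world: one array per component type, indexed by entity ID.
// An undefined slot means the entity does not have that component.
type World = {
  entityCount: number;
  positions: (Position | undefined)[];
  velocities: (Velocity | undefined)[];
};

function createWorld(): World {
  return { entityCount: 0, positions: [], velocities: [] };
}

function createEntity(world: World): Entity {
  const id = world.entityCount;
  world.entityCount += 1;
  return id;
}

// MovementSystem: updates every entity that has BOTH Position and Velocity, in place.
function movementSystem(world: World, dt: number): void {
  for (let e = 0; e < world.entityCount; e++) {
    const pos = world.positions[e];
    const vel = world.velocities[e];
    if (pos === undefined || vel === undefined) continue; // combination not matched
    pos.x += vel.dx * dt;
    pos.y += vel.dy * dt;
  }
}
```

An entity with only a Position (a statue, say) is simply skipped; no `update()` method on any object is involved.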

Unity's DOTS (Data-Oriented Technology Stack) is a leading industry example that supports data-oriented design centered around ECS.²

Personal note: Unity waves ECS as its headline feature and pitches "the paradigm is now changing," but 99% of the Asset Store is MonoBehaviour-based, and when you mix them in the name of a "hybrid" approach, you get a bizarre monster.

You start wondering whether the goal is to make a game or to have a staring contest with the compiler. The trouble is, the day I actually win that contest probably will not come in my lifetime.

DOP: Immutable Data-Centered Programming

The Data-Oriented Programming that Sharvit lays out differs somewhat from the game-engine-style DOD. The central problem here is not CPU cache but the complexity of information systems. The Manning book description also explains DOP as a paradigm that simplifies state management with "immutable generic data structures" and "non-mutating general-purpose functions."³

The four core principles are as follows.⁴

TXT

1. Separate code from data.
2. Represent data using generic data structures.
3. Do not mutate data directly.
4. Separate the data schema from the data representation.

An additional practical principle follows from these: manipulate data with general-purpose functions. This means favoring generic operations like map, filter, reduce, pick, merge, assoc, update, and groupBy over class-specific methods.

TXT

The OOP question:
"What methods should a Book object have?"

The DOP question:
"What shape is the Book data,
and what pure transformation functions operate on that shape?"

OOP approach:

TypeScript

class Book {
    public constructor(
        public title: string,
        public author: string,
        public checkedOut: boolean,
    ) {}

    public checkout(): void {
        this.checkedOut = true;
    }
}

Personal note: Slather readonly everywhere in your code. It is not only the most cost-effective way to look like a seasoned developer with just a few extra keystrokes; it is also a restraint that physically prevents future-you from contaminating global state five minutes before a delivery and blowing up the runtime.

DOP approach:

TypeScript

type Book = {
    readonly id: string;
    readonly title: string;
    readonly author: string;
    readonly checkedOut: boolean;
};

type BookView = {
    readonly title: string;
    readonly status: "available" | "checkedOut";
};

const checkoutBook = (book: Book): Book => {
    return {
        ...book,
        checkedOut: true,
    };
};

const toBookView = (book: Book): BookView => {
    return {
        title: book.title,
        status: book.checkedOut ? "checkedOut" : "available",
    };
};

// From the outside this is a pure function; internally it mutates a local Map to avoid re-creating arrays.
// This is the difference between academic FP and pragmatic DOP.
const groupBooksByAuthor = (
    books: readonly Book[],
): ReadonlyMap<string, readonly Book[]> => {
    const groups = new Map<string, Book[]>(); // Allowing internal temporary mutation

    for (const book of books) {
        let current = groups.get(book.author);
        if (!current) {
            current = [];
            groups.set(book.author, current);
        }
        current.push(book); // Pushing data in without re-creating the array
    }

    return groups as ReadonlyMap<string, readonly Book[]>; // Exposed to callers as read-only (a compile-time guarantee; the Map itself is not frozen)
};

This approach resembles functional programming closely. DOP, however, focuses less on foregrounding all of functional programming's abstractions (monads, typeclasses, higher-kinded types, and the like) and more on keeping data representations simple and transforming that data with non-mutating functions.

Separating the schema from the data representation, for its part, means not insisting that data be tied to a specific class constructor or its methods. Data can be represented as a plain map or object, and a separate schema or validator can check whether it is valid.

DOP Principle 1: Separate Code from Data

In DOP, data does not carry methods. Data is data, and behavior is a function.

TypeScript

// Data
type Member = {
  id: string;
  name: string;
  borrowedBookIds: string[];
};

// Behavior
function canBorrow(member: Member): boolean {
  return member.borrowedBookIds.length < 5;
}

Sharvit argues that structures that mix code and data tend to be more complex, while structures that separate the two tend to be composed of simpler parts.⁵

From this perspective, the fact that objects bundle "data + methods" together is both an advantage and a potential source of tight coupling. When an object's internal state is strongly bound to its methods, it can become difficult to reuse, record, compare, or transmit that data in other contexts.

DOP aims to keep data as plain values and to separate behavior into functions that take those values as input and return new values.
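Continuing the `Member` example, a behavior that changes a member can be written as a function that returns a new value rather than modifying its input. `borrowBook` and `MAX_LOANS` are hypothetical names introduced for this sketch; the limit of 5 simply mirrors `canBorrow`.

```typescript
// Data
type Member = {
  readonly id: string;
  readonly name: string;
  readonly borrowedBookIds: readonly string[];
};

const MAX_LOANS = 5; // mirrors the limit used by canBorrow above

// Behavior: plain functions over plain values.
function canBorrow(member: Member): boolean {
  return member.borrowedBookIds.length < MAX_LOANS;
}

// Returns a new Member; the input value is left untouched.
function borrowBook(member: Member, bookId: string): Member {
  if (!canBorrow(member)) {
    throw new Error(`member ${member.id} has reached the loan limit`);
  }
  return { ...member, borrowedBookIds: [...member.borrowedBookIds, bookId] };
}
```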

Benefits:

TXT

Functions are easy to test in isolation.
The same data is easy to reuse across multiple contexts.
Serialization, logging, diff, and replay become straightforward.

Costs:

TXT

Control over which functions access which data becomes weaker.
Unlike with objects, you cannot discover the available operations by browsing a method list.
Separating data from functions can increase the number of files and modules.

Personal note: DOP is not saying "encapsulation is unnecessary." It is closer to asking "does encapsulation absolutely require locking all data inside an object?"

But even this profound reflection turns out to be completely useless philosophical self-indulgence the moment a project manager says, "Hey, the deadline is tomorrow, just make it public and wire it up fast." Philosophy is a luxury reserved for those who have already shipped.

DOP Principle 2: Represent Data with Generic Data Structures

DOP prefers to represent domain data using general-purpose data structures such as maps, dictionaries, objects, arrays, and lists rather than dedicated classes.⁶

TypeScript

const book = {
  id: "book-1",
  title: "Data-Oriented Programming",
  author: "Yehonathan Sharvit",
  tags: ["programming", "architecture"],
};

The point is not "let types be a free-for-all." It is to build data representations on top of the general-purpose data operations that the language and its ecosystem already provide.

TypeScript

const publicBookView = {
  title: book.title,
  author: book.author,
};

const serialized = JSON.stringify(publicBookView);

The more dedicated classes you accumulate, the more bespoke conversion code each one demands. Conversely, when data is a plain object or map, operations like serialization, partial selection, merging, comparison, and diffing are straightforward to handle with generic functions.

TXT

Class-centric:
Book.toJson()
Author.toJson()
Member.toJson()
Loan.toJson()

Generic data-centric:
JSON.stringify(data)
pick(data, keys)
merge(dataA, dataB)
diff(before, after)
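The generic operations listed above can be written once and reused for any plain-data shape. The helpers below are hypothetical sketches, not a library API: `pick` copies selected keys, `merge` is a shallow merge, and `shallowDiff` compares only top-level fields with `===`. Libraries such as Lodash provide `pick` and `merge` with somewhat different semantics (Lodash's `merge`, for instance, is deep).

```typescript
// Copy a subset of keys from any object.
function pick<T extends object, K extends keyof T>(data: T, keys: readonly K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    result[key] = data[key];
  }
  return result;
}

// Shallow merge: fields in `b` override fields in `a`; neither input is mutated.
function merge<A extends object, B extends object>(a: A, b: B): A & B {
  return { ...a, ...b };
}

// Top-level keys that were added, removed, or changed (compared with ===).
function shallowDiff(before: object, after: object): string[] {
  const a = before as Record<string, unknown>;
  const b = after as Record<string, unknown>;
  const changed: string[] = [];
  for (const key of Object.keys(a)) {
    if (!(key in b) || a[key] !== b[key]) changed.push(key);
  }
  for (const key of Object.keys(b)) {
    if (!(key in a)) changed.push(key);
  }
  return changed;
}
```

None of these helpers knows what a book is, which is exactly what lets them work on every data shape in the system.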

There are costs as well.

TXT

Typos in field names can lurk undetected until runtime.
You get less help from IDE autocompletion and static typing.
Accessing data through generic structures can be slower than direct class/struct field access.

In languages like TypeScript, therefore, a practical compromise is to annotate plain data shapes with types rather than reaching for untyped objects to implement DOP.

TypeScript

type BookData = {
  id: string;
  title: string;
  author: string;
  tags: string[];
};

DOP Principle 3: Treat Data as Immutable Values

In DOP, data is a value. The value itself does not change; mutations are expressed by creating a new version instead.⁷

TypeScript

const before = {
  id: "book-1",
  checkedOut: false,
};

const after = {
  ...before,
  checkedOut: true,
};

An important distinction:

TXT

Data values do not change.
A variable can change to point to a new data value.

This distinction is the same as the immutability concept in functional programming. In practice, the following benefits are significant.

TXT

It is easy to compare the previous state with the new state.
Undo/redo, event replay, and audit logging become straightforward.
Shared mutable state problems in concurrent scenarios are reduced.
In testing, inputs and outputs are easy to verify through value comparison.

Example:

TypeScript

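type Loan = { memberId: string; bookId: string; checkedOutAt: string };

// Minimal state shape assumed for this example.
type LibraryState = {
  books: { id: string; checkedOut: boolean }[];
  loans: Loan[];
};

function checkoutBook(state: LibraryState, memberId: string, bookId: string): LibraryState {
  return {
    ...state,
    // Append a new loan instead of pushing into the existing array.
    loans: [
      ...state.loans,
      { memberId, bookId, checkedOutAt: new Date().toISOString() },
    ],
    // Copy only the matching book; every other book object is reused as-is.
    books: state.books.map(book =>
      book.id === bookId ? { ...book, checkedOut: true } : book
    ),
  };
}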

This code does not secretly mutate the object's internal state. The input state and the output LibraryState are explicit. That said, with large data sets you need to think about the cost of shallow versus deep copying and whether to use a structural-sharing library. This is where persistent data structures come into the picture.
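Spread-based updates are shallow, which keeps the copying cost lower than it might first appear: the objects on the path to the change are recreated, while everything else is shared by reference. The sketch below checks this with a hypothetical `markCheckedOut` function in plain TypeScript. The `books` array itself is still copied in full (O(n)); persistent data structure libraries extend the same sharing idea so that large collections need not be copied either.

```typescript
type Book = { readonly id: string; readonly title: string; readonly checkedOut: boolean };

type State = {
  readonly books: readonly Book[];
  readonly settings: { readonly maxLoans: number };
};

// New state object, new books array, and a new object only for the changed book.
function markCheckedOut(state: State, bookId: string): State {
  return {
    ...state,
    books: state.books.map(b => (b.id === bookId ? { ...b, checkedOut: true } : b)),
  };
}
```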

DOP Principle 4: Separate Schema from Data Representation

Because DOP represents data as generic structures, it does not tie the shape of the data to a class definition. Instead, the schema is kept separately.⁸

TypeScript

const addBookRequestSchema = {
  type: "object",
  required: ["title", "author"],
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    tags: {
      type: "array",
      items: { type: "string" },
    },
  },
};

This approach integrates well with tools like JSON Schema.⁹

TXT

data:
{ "title": "DOP", "author": "Sharvit" }

schema:
title: required string
author: required string
tags: optional array<string>
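To make the separation concrete, a small generic validator can check data against the schema without knowing anything about books. This is a hypothetical sketch supporting only the `type`, `required`, `properties`, and `items` keywords used in `addBookRequestSchema`; it is not a JSON Schema implementation, and a real project would use an established validator.

```typescript
// A tiny subset of JSON Schema: only the keywords used in this article.
type Schema = {
  type: "object" | "array" | "string";
  required?: string[];
  properties?: Record<string, Schema>;
  items?: Schema;
};

// Returns a list of error messages; an empty list means the data is valid.
function validate(schema: Schema, data: unknown, path = "$"): string[] {
  switch (schema.type) {
    case "string":
      return typeof data === "string" ? [] : [`${path}: expected string`];
    case "array": {
      if (!Array.isArray(data)) return [`${path}: expected array`];
      const itemSchema = schema.items;
      if (itemSchema === undefined) return [];
      const errors: string[] = [];
      data.forEach((item, i) => {
        errors.push(...validate(itemSchema, item, `${path}[${i}]`));
      });
      return errors;
    }
    case "object": {
      if (typeof data !== "object" || data === null || Array.isArray(data)) {
        return [`${path}: expected object`];
      }
      const record = data as Record<string, unknown>;
      const errors: string[] = [];
      for (const key of schema.required ?? []) {
        if (!(key in record)) errors.push(`${path}.${key}: required`);
      }
      const props = schema.properties ?? {};
      for (const key of Object.keys(props)) {
        if (key in record) errors.push(...validate(props[key], record[key], `${path}.${key}`));
      }
      return errors;
    }
  }
}

// The schema lives apart from the data it describes.
const addBookRequestSchema: Schema = {
  type: "object",
  required: ["title", "author"],
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
  },
};
```

Because the data stays a plain object, the same `validate` function can check any request shape for which a schema exists.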

Benefits:

TXT

Validation of external request/response data becomes clearer.
The schema can be reused for runtime validation, documentation, and test data generation.
During the exploration phase, you can attach the schema late, then tighten it once things stabilize.

Costs:

TXT

The connection between data and schema is looser than with classes.
If schema validation is skipped, runtime errors are caught late.
Using static types alongside a runtime schema means the same shape has to be maintained in two places.

In TypeScript, you typically have the following options.

TXT

1. TypeScript types only
   This works well at compile time, but runtime validation of external inputs is weak.

2. Runtime schema using JSON Schema/Zod/io-ts
   This handles external input validation well, but introduces schema management overhead.

3. Synchronizing types and schemas with a code generation tool
   This is the most robust approach, but it makes the build pipeline more complex.

Personal note: DOP's "schema and data separation" is an ideal. Data is free, schemas are optional, validation is explicit, and developers enjoy philosophical peace.

Then you sit in your room drinking coffee and realize the deadline is three days away. The moment that sinks in, `any` gets plastered everywhere and DOP becomes DROP.

Schema separation is a fine idea, but it falls apart under a deadline.

How Does DOP Handle Polymorphism?

In OOP, polymorphism is expressed primarily through class/interface hierarchies and method dispatch.

TypeScript

interface Shape {
  area(): number;
}

In DOP, data carries a kind or type field, and ordinary functions branch on it or use a dispatch table.

TypeScript

type Circle = {
  kind: "circle";
  radius: number;
};

type Rectangle = {
  kind: "rectangle";
  width: number;
  height: number;
};

type Shape = Circle | Rectangle;

function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius * shape.radius;
    case "rectangle":
      return shape.width * shape.height;
  }
}

This approach resembles Algebraic Data Types (ADTs). TypeScript's discriminated unions, Rust's enums, F#'s discriminated unions, and Haskell's ADTs all support this pattern strongly.

From a DOP perspective, polymorphism is achievable without objects. The Manning book description also lists "polymorphism without objects" as one of the learning topics.³
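Besides a `switch`, dispatch tables were mentioned as the other common option. A sketch of that variant, assuming the same `Circle` and `Rectangle` shapes: the table is a plain object mapping each `kind` to an ordinary function, and the mapped type `AreaTable` (a name introduced here) makes TypeScript reject the table if any `kind` is missing.

```typescript
type Circle = { kind: "circle"; radius: number };
type Rectangle = { kind: "rectangle"; width: number; height: number };
type Shape = Circle | Rectangle;

// One entry per kind; each function receives the narrowed shape type.
type AreaTable = {
  [K in Shape["kind"]]: (shape: Extract<Shape, { kind: K }>) => number;
};

const areaTable: AreaTable = {
  circle: s => Math.PI * s.radius * s.radius,
  rectangle: s => s.width * s.height,
};

function area(shape: Shape): number {
  // The cast is needed because TypeScript cannot correlate shape.kind with the looked-up entry.
  const fn = areaTable[shape.kind] as (s: Shape) => number;
  return fn(shape);
}
```

Adding a new shape then means adding one type to the union and one entry to the table, with no class hierarchy involved.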

Personal note: In school we were taught that switch statements are a bad habit to avoid, and that the elegant solution is inheritance and overriding. But after debugging a few of those so-called "elegant Java enterprise projects" twisted through dozens of layers of inheritance, you find yourself ready to part with OOP-style inheritance forever. What I find oddly comforting is that since I have little to inherit from my parents in real life either, I feel absolutely no attachment to "inheritance" in code.

Where DOP Fits Best

Sharvit-style DOP is particularly well-suited to information systems, that is, systems where moving and transforming data matters more than CPU cache optimization.

TXT

REST/GraphQL API
JSON request/response handling
Frontend application state
ETL / data pipeline
event enrichment
workflow state
Configuration file/policy file processing
Audit log/audit trail

In these systems, data already flows in the form of JSON, maps, records, and tables. Rather than forcing it into a deep class hierarchy, it is often simpler to define the data shapes and transformation functions explicitly.

Where DOP Gets Dangerous

DOP reduces the problems of OOP, but it introduces its own.

TXT

Field names scattered as string keys make the code vulnerable to typos.
It can become difficult to track which function expects which data shape.
Without schema validation, the result is a "loose bag of maps."
Failing to understand the cost of immutable updates leads to performance problems.
The party responsible for enforcing domain invariants can disappear entirely.

Practical DOP therefore usually needs to be used alongside the following mechanisms.

TXT

Schemas like TypeScript types, JSON Schema, or Zod
pure function-centric testing
single path for state mutations
domain events or command handlers
diff, snapshot, audit log
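Several of these mechanisms fit into one small pattern: every state change goes through a single pure function that takes the current state and a command, and the commands themselves are kept as a log. The sketch below is hypothetical (`Command`, `applyCommand`, and `replay` are illustrative names, in the spirit of a reducer); it shows how a single mutation path gives an invariant one owner and makes replay and auditing fall out of plain data.

```typescript
type Loan = { readonly memberId: string; readonly bookId: string };

type LibraryState = {
  readonly available: readonly string[]; // IDs of books currently on the shelf
  readonly loans: readonly Loan[];
};

// Commands are plain data, so they can be logged, serialized, and replayed.
type Command =
  | { readonly type: "checkout"; readonly memberId: string; readonly bookId: string }
  | { readonly type: "return"; readonly bookId: string };

// The single path through which state changes.
// Invariant enforced here: a book is either on the shelf or on loan, never both.
function applyCommand(state: LibraryState, cmd: Command): LibraryState {
  const bookId = cmd.bookId;
  switch (cmd.type) {
    case "checkout":
      if (state.available.indexOf(bookId) < 0) {
        throw new Error(`book ${bookId} is not available`);
      }
      return {
        available: state.available.filter(id => id !== bookId),
        loans: [...state.loans, { memberId: cmd.memberId, bookId }],
      };
    case "return":
      if (!state.loans.some(loan => loan.bookId === bookId)) {
        throw new Error(`book ${bookId} is not on loan`);
      }
      return {
        available: [...state.available, bookId],
        loans: state.loans.filter(loan => loan.bookId !== bookId),
      };
  }
}

// Replaying the command log from the initial state reproduces the current state;
// the same log doubles as an audit trail.
function replay(initial: LibraryState, log: readonly Command[]): LibraryState {
  return log.reduce(applyCommand, initial);
}
```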

Personal note: Misuse DOP and you escape the "complex maze of object-oriented code" only to fall into the "JSON swamp where anyone can touch anything."

Even when you gain freedom, you eventually need new constraints. Too much freedom is never a good thing.

Differences from OOP

TXT

OOP:
Data and behavior are kept together inside objects.
Methods protect the internal invariants of objects.
Objects collaborate through message/method calls.

Data-Oriented:
Data and behavior are separated.
The shape and flow of data are considered first.
Data of the same shape is processed in batch.

OOP is not inherently bad. For things like external resources, drivers, file handles, and network connections, where lifetime and invariants need to be managed strictly, objects are a natural fit. Richard Fabian also notes that OOP can be the better choice for large, stable abstractions like file-system handles or graphics APIs.¹⁰

The problems arise when everything is started as an object.

TXT

Objects hold too much data.
Inheritance hierarchies obscure data flow.
Memory is laid out by conceptual unit rather than by unit of work.
Cache locality suffers when the same operation is performed across many objects.

Data-oriented thinking critiques exactly this point.

Relationship to Functional Programming

Data-Oriented Programming overlaps considerably with functional programming.

TXT

Commonalities:
- Data is not mutated directly.
- Transformation functions are central.
- Inputs and outputs are made explicit.
- Reducing state mutation makes testing easier.

Differences:
- Functional programming places greater emphasis on purity, composition, types, and side-effect control.
- Data-Oriented Programming places greater emphasis on the shape, volume, flow, layout, and access patterns of data.

Simply put:

TXT

Functional: Is this computation pure?
Data-Oriented: What shape does this data take as it flows?
Performance DOD: How is this data read from memory?

Advantages

TXT

1. Bulk data processing performance can improve.
2. Data flow becomes clearer.
3. Unnecessary object graphs can be reduced.
4. Batch processing and parallelization become easier.
5. Serialization, logging, replay, and testing become simpler.
6. It becomes easier to separate business data from transformation logic.

It is especially powerful in domains like games, simulation, rendering, physics, data pipelines, and event processing, where the same operation is applied repeatedly to large volumes of data.

Disadvantages

TXT

1. The design looks less like "real-world nouns."
2. It can be over-engineering for small programs.
3. Separating data from logic can make it harder to trace behavior.
4. Used carelessly, it devolves into global data tables and blobs of procedural code.
5. The Immutable Data approach requires careful attention to allocation and copying costs.
6. Performance-focused DOD requires a solid understanding of CPU, cache, and memory models.

Personal note: Right after learning about data-oriented design, you will want to tear every class apart into arrays. It is better to resist that urge for a moment.

Not every program is a game engine, and not every object is a criminal.

When to Use It

TXT

You process large volumes of similar data.
You repeatedly perform the same operations.
Performance bottlenecks arise from memory access.
The object graph is too complex to track state effectively.
Serialization, persistence, transmission, and replay matter.
You want to compare input data with output data in tests.

When to Be Careful

TXT

The data volume is small.
There are no performance bottlenecks.
Objects must strictly enforce domain invariants.
Managing the lifetime of external resources is the core concern.
The team is unfamiliar with memory layout or batch processing.
It is a business application where simple CRUD matters more than abstraction.

This does not mean data-oriented thinking is useless in business applications. Applying game-engine-style DOD there directly, however, is usually overkill. In business applications, Sharvit-style DOP (separating data from transformations and reducing mutation) is often far more practical.

Anti-Patterns

1. Tearing Everything into Arrays

Data-oriented design does not mean always using SoA. If an operation usually reads most of a record's fields together, AoS is often the better choice.

TXT

Question:
What fields does this operation actually read?
How often are those fields read together?

Restructuring without asking that question just produces code that is hard to read.

2. Failing to Separate Data from Invariants

Separating data from logic is good, but scattering invariants is dangerous.

TXT

Bad example:
Multiple systems modify `health` arbitrarily.
It is impossible to tell where it drops to 0 or below.
Death handling, UI updates, and event dispatch fall out of sync with each other.

Even in data-oriented design, ownership of invariants is necessary.

TXT

Good example:
`DamageSystem` is solely responsible for reducing health.
`DeathSystem` handles the `health <= 0` state.
The `HealthChanged` event is published explicitly.
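The good example can be sketched as two systems with divided responsibilities that communicate through an explicit event list. This is a hypothetical TypeScript sketch: `damageSystem`, `deathSystem`, the `HealthChanged` shape, and the choice to clamp health at 0 are assumptions for illustration, not any particular engine's API.

```typescript
type HealthChanged = { readonly entity: number; readonly from: number; readonly to: number };

// Health stored as SoA: current value plus an alive flag (1 = alive, 0 = dead).
type Health = { current: Float32Array; alive: Uint8Array };

type Hit = { readonly entity: number; readonly amount: number };

// DamageSystem: the only code allowed to lower health. It clamps at 0
// and records every change as an explicit HealthChanged event.
function damageSystem(health: Health, hits: readonly Hit[], events: HealthChanged[]): void {
  for (const hit of hits) {
    if (health.alive[hit.entity] === 0) continue;
    const from = health.current[hit.entity];
    const to = Math.max(0, from - hit.amount);
    health.current[hit.entity] = to;
    events.push({ entity: hit.entity, from, to });
  }
}

// DeathSystem: the only code that handles the health <= 0 state.
// Returns the entities that died this frame.
function deathSystem(health: Health, events: readonly HealthChanged[]): number[] {
  const died: number[] = [];
  for (const e of events) {
    if (e.to <= 0 && health.alive[e.entity] === 1) {
      health.alive[e.entity] = 0;
      died.push(e.entity);
    }
  }
  return died;
}
```

UI updates or sound effects would subscribe to the same event list instead of polling `health` themselves.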

3. Optimizing Without Measuring

Data-Oriented Design emphasizes actual data and access patterns. Restructuring code on a hunch that it will be "cache-friendly," without any measurement, turns design into guesswork.

TXT

Measure first:
- Number of data items
- Access frequency
- hot path
- cache miss
- allocation
- branch miss
- frame time / latency

4. Rejecting OOP Unconditionally

OOP is strong at expressing invariants and lifetime management. Data-oriented design is strong at bulk processing and data flow.

TXT

Where objects shine:
File handles, DB connections, transactions, UI widgets, external API clients

Where data-oriented approaches shine:
particle, transform, physics body, telemetry event, order rows, batch job

There is no need to use only one; use both together.

Just as our parents told us to get along well with our friends.

5. Overusing Sharvit-Style DOP (Immutability) in Performance-Critical DOD Contexts

Just because they both carry the data-oriented label does not mean DOD and DOP are perfectly interchangeable.

Sharvit-style DOP's core practice of immutable data typically means spreading objects (...) or otherwise allocating new ones. If you produce a fresh immutable object every time you update state inside a game engine's hot loop that runs tens of thousands of times per frame (the territory of DOD), the garbage collector will start screaming and GC pauses will stall the frame before you ever get around to optimizing the CPU cache.

TXT

Information systems (DOP): willingly pay the overhead of immutability and GC in order to tame the complexity of state.

High-performance loops (DOD): to keep the GC out of the picture entirely, data is overwritten in place, without mercy, inside pre-allocated contiguous arrays (in-place mutation).

If you blindly mix the two paradigms without clearly distinguishing their target bottlenecks (cognitive load vs. hardware limits), you end up with a horrifying chimera that is neither maintainable nor performant.

Practical Checklist

When approaching a problem with a data-oriented lens, start by writing down the following.

TXT

[What]
What data does this system transform?

[Why]
Why should you look at data flow before the object model?

[Shape]
What is the shape of the data? Is it rows, a tree, a graph, or an event stream?

[Volume]
How many data items are there? Ten? A hundred thousand? Ten million?

[Frequency]
How often is it read and written?

[Hot Path]
What is the most frequently executed loop?

[Invariant]
What are the rules that must never be broken?

[Layout]
Is data that is read together kept together?
Is data that is not read together kept separate?

[Next]
If the measurements change, which structures will you change?

Personal note: The biggest problem with checklists is that you never actually use them. On the next project you always forget and start over. That is the destiny of a checklist.

Final Summary

Data-Oriented Programming is not a movement against objects.

Personal note: Of course, for some people it absolutely is a movement against objects.

The core idea is this.

TXT

Place the program's focus on real data and its transformation flow, rather than on abstract objects.

In performance-focused DOD, the following matter most.

TXT

Data Volume
Access Frequency
Memory Layout
Cache Locality
Batch Processing
Hot/Cold Split
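Several of these DOD concerns (memory layout, cache locality, batch processing, hot/cold split) fit into one small sketch. It assumes a hypothetical enemy system in TypeScript, where typed arrays provide contiguous numeric storage; `HotEnemies`, `ColdEnemy`, `makeEnemies`, `integrate`, and `applyAreaDamage` are illustrative names only, and no performance numbers are implied.

```typescript
// Hot data: touched every frame, laid out as parallel typed arrays (SoA)
// so each batch loop below walks contiguous memory.
interface HotEnemies {
  x: Float64Array;
  vx: Float64Array;
  health: Float64Array;
}

// Cold data: read rarely (UI, logs, save files), kept out of the hot arrays.
interface ColdEnemy {
  name: string;
  loreText: string;
}

function makeEnemies(count: number): { hot: HotEnemies; cold: ColdEnemy[] } {
  return {
    hot: {
      x: new Float64Array(count),
      vx: new Float64Array(count),
      health: new Float64Array(count).fill(100),
    },
    cold: Array.from({ length: count }, (_, i) => ({
      name: `enemy-${i}`,
      loreText: "",
    })),
  };
}

// Batch processing: one tight loop over all enemies, no per-entity objects.
function integrate(hot: HotEnemies, dt: number): void {
  for (let i = 0; i < hot.x.length; i++) {
    hot.x[i] += hot.vx[i] * dt;
  }
}

// A second batch pass that touches only the health column and
// returns how many enemies are still alive.
function applyAreaDamage(hot: HotEnemies, amount: number): number {
  let alive = 0;
  for (let i = 0; i < hot.health.length; i++) {
    hot.health[i] = Math.max(0, hot.health[i] - amount);
    if (hot.health[i] > 0) alive++;
  }
  return alive;
}

const { hot, cold } = makeEnemies(3);
hot.vx.set([1, 2, 3]);
integrate(hot, 0.5);
console.log(Array.from(hot.x)); // [ 0.5, 1, 1.5 ]
console.log(applyAreaDamage(hot, 40)); // 3
console.log(cold[2].name); // enemy-2
```

The split is driven by access pattern rather than by what an "enemy" is: fields read in the same loop share a layout, while fields read only occasionally live elsewhere.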

In complexity-focused DOP, the following matter most.

TXT

Code/Data Separation
Generic Data Structures
Immutable Data
Non-Mutating Functions
Schema/Representation Separation
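These DOP ideas can be shown together in a minimal TypeScript sketch of a hypothetical library catalog. Plain `Readonly` object types stand in for generic immutable data, `isValidBook` is a hand-rolled schema check standing in for a real validator such as JSON Schema, and all names (`Book`, `Library`, `bookSchema`, `addBook`, `checkout`) are illustrative assumptions, not Sharvit's own code.

```typescript
// Data: plain generic structures with no behavior attached.
// Readonly is a shallow, compile-time-only guarantee.
type Book = Readonly<{ isbn: string; title: string; copies: number }>;
type Library = Readonly<{ books: Readonly<Record<string, Book>> }>;

// Schema: described separately from the data and checked at the boundary.
const bookSchema = {
  isbn: "string",
  title: "string",
  copies: "number",
} as const;

function isValidBook(value: unknown): value is Book {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return Object.entries(bookSchema).every(
    ([field, type]) => typeof record[field] === type,
  );
}

// Code: non-mutating functions that take data in and return new data.
function addBook(library: Library, book: Book): Library {
  return { ...library, books: { ...library.books, [book.isbn]: book } };
}

function checkout(library: Library, isbn: string): Library {
  const book = library.books[isbn];
  // Nothing to check out: return the same value unchanged.
  if (book === undefined || book.copies === 0) return library;
  return addBook(library, { ...book, copies: book.copies - 1 });
}

const empty: Library = { books: {} };
const incoming: unknown = { isbn: "111", title: "Dune", copies: 2 };

if (isValidBook(incoming)) {
  const v1 = addBook(empty, incoming);
  const v2 = checkout(v1, "111");
  console.log(v1.books["111"].copies, v2.books["111"].copies); // 2 1
}
```

Every version (`empty`, `v1`, `v2`) remains valid after later updates, which is what makes state easy to reason about; the price is the copying shown in the spread expressions, which is exactly the cost that becomes a problem in DOD hot loops.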

A one-sentence definition of data-oriented programming can be written like this.

TXT

Data-Oriented Programming is
a programming approach that, rather than asking "what exists,"
first asks "what data is being transformed, in what form, and how frequently."

Personal note: That said, the kind of engineer I aspire to be is not a zealot of any particular paradigm.

In the face of the cold reality of business requirements (deadlines) and runtime constraints (available memory, CPU cache), I should be a mercenary willing to bend principles in order to find the best compromise.

Footnotes

Mike Acton. "Data-Oriented Design and C++". CppCon 2014 talk. This landmark talk popularized Data-Oriented Design (DOD) in the C++ and game engine communities.
Unity. "Introduction to the Data-Oriented Technology Stack". Unity DOTS is a representative example of Data-Oriented Design applied through an Entity Component System (ECS).
Yehonathan Sharvit. *Data-Oriented Programming*. Manning, 2022. Manning's introduction describes Data-Oriented Programming (DOP) in terms of immutable generic data structures and non-mutating general-purpose functions.
Yehonathan Sharvit. "Principles of Data-Oriented Programming". An article outlining the four principles of Data-Oriented Programming: separation of code from data, generic data structures, immutable data, and separation of schema from representation.
Yehonathan Sharvit. "Separate code from data". Data-Oriented Programming principle 1. Explains the benefits and costs of separating code from data.
Yehonathan Sharvit. "Represent data with generic data structures". Data-Oriented Programming principle 2. Explains generic data structures such as maps and arrays, and their trade-offs.
Yehonathan Sharvit. "Data is immutable". Data-Oriented Programming principle 3. Explains the practice of creating new versions of data instead of mutating existing data.
Yehonathan Sharvit. "Separate data schema from data representation". Data-Oriented Programming principle 4. Explains why data representation should be kept separate from its schema.
JSON Schema. Official website. A vocabulary for expressing the shape and validation rules of JSON data.
Richard Fabian. *Data-Oriented Design*. Fabian argues that Data-Oriented Design is not merely about cache misses but about considering the type, frequency, quantity, shape, and probability of data.
