Data-Oriented Programming
Data-Oriented Programming (DOP) is a way of thinking about programs that prioritizes the shape, flow, transformation, and access patterns of data over class and object hierarchies.
However, this term is used in two distinct ways.
1. Data-Oriented Design (DOD)
Data-Oriented Design in the Game/Engine/Performance Optimization Camp
Core concepts: CPU cache, memory layout, batch processing, Entity Component System (ECS), hot/cold split
2. Data-Oriented Programming (DOP)
Immutable-data-centric programming as organized by Yehonathan Sharvit
Core concepts: separation of code and data, Generic Data Structures, Immutable Data, pure transformation
The two share common ground, though they differ in emphasis.
A program = a transformer that converts data from one form to another
If object-oriented programming asks "which objects exchange messages with which," data-oriented programming starts by asking something different.
What data exists?
How much of it is there?
How often is it read and written?
In what order is it accessed?
What form does it get transformed into?
What data moves together, and what data should be kept separate?
Personal note: If OOP asks "what nouns exist in this world?", data-oriented programming asks "so what exactly is sitting in memory, how many of them are there, and how are they laid out?"
Object-oriented programming describes the world in a way that is easy for humans to understand, while data-oriented programming redescribes that same world in a way that is easy for the CPU to understand.
We write programs for people, yet in the end we have to keep the CPU happy too.
All of this contradiction stems from the fact that at runtime, it is the CPU that shows up on the battlefield, not a domain expert.
Why It Matters
A lot of code starts out looking like this: a tidy object model.
Games, in particular, tend to follow this pattern.
Player
Enemy
Bullet
Item
Particle
But at actual runtime, questions like these often become far more important.
How many bullets are there? Tens of thousands?
Does each frame only need to update position?
Which fields are actually needed for collision detection?
Which data is actually needed for rendering?
Is this data laid out in contiguous memory?
When a single OOP object holds position, velocity, rendering info, sound, network state, and behavior state all together, it reads naturally. But when a function that only needs position and velocity iterates over an array of such objects, the CPU is forced to drag along all the unnecessary surrounding data as well.
For instance, adding a VFX can cause stutters as particle counts climb, or every new animation forces another round of optimization. Problems like these pile up, and while people try everything from object pooling onward, those fixes have limits.
Data-oriented thinking aims to eliminate that waste.
Object-centric:
A single object holds multiple responsibilities and data together.
Data-centric:
Keep data that is processed together, together,
and separate data that is not processed together.
DOD: Performance-Focused Data-Oriented Design
In game engines, rendering, physics, simulation, and high-volume event processing, the same operation is applied repeatedly to large amounts of data. Mike Acton's talk "Data-Oriented Design and C++" is the seminal example that brought this way of thinking to wide attention in the C++ and game-engine community.[1]
Update 100,000 positions
Check collisions for 50,000 bullets
Decrease lifetime for 1,000,000 particles
Evaluate 20,000 NPC states
The key question here is not "did we model the objects beautifully?" but rather "how predictably and sequentially does the CPU read the data?"
AoS vs SoA
A traditional array of objects (the usual OOP layout) is essentially an AoS (Array of Structures). Human intuition produces AoS, while the CPU's instinct calls for SoA (Structure of Arrays).
struct Particle {
    float x;
    float y;
    float vx;
    float vy;
    float life;
    int color;
};
Particle particles[100000];
This layout is comfortable for human eyes that perceive a particle as a single whole. But when a function that updates positions every frame iterates over this array, the CPU fumes. Even though position updates need neither color nor life, those unnecessary fields are forcibly loaded alongside the useful ones during a cache line fill.
That is exactly cache pollution. Data-Oriented Design, by contrast, reorganizes the world into SoA (Structure of Arrays).
struct Particles {
    float x[100000];
    float y[100000];
    float vx[100000];
    float vy[100000];
    float life[100000];
    int color[100000];
};
Now the position-update system walks only the position and velocity arrays it actually needs, sequentially and contiguously in memory order.
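// Only x, y, vx, vy are touched; life and color never enter the cache.
void updatePositions(Particles& p, int count, float dt) {
    for (int i = 0; i < count; i++) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}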
The proportion of values genuinely needed for the computation within each cache line the CPU fetches goes up, meaning more useful work can be done with the same memory bandwidth.
The point is not that "AoS is bad." There is no single universally correct data layout. What matters is being willing to reshape the layout to match which data a particular job or system reads, and how often it reads it.
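The same contrast can be sketched in TypeScript, the language the DOP examples below use. This is a minimal illustration, and one assumption matters. In JavaScript engines an array of objects is really an array of references to separately allocated objects, so it only approximates C's AoS. Each `Float32Array`, by contrast, is a genuinely contiguous buffer. The type and function names are hypothetical.

```typescript
// AoS: one object per particle, all fields stored together.
type ParticleAoS = {
  x: number;
  y: number;
  vx: number;
  vy: number;
  life: number;
  color: number;
};

// SoA: one contiguous typed array per field, indexed by particle number.
type ParticlesSoA = {
  count: number;
  x: Float32Array;
  y: Float32Array;
  vx: Float32Array;
  vy: Float32Array;
  life: Float32Array;
  color: Int32Array;
};

function createParticlesSoA(count: number): ParticlesSoA {
  return {
    count,
    x: new Float32Array(count),
    y: new Float32Array(count),
    vx: new Float32Array(count),
    vy: new Float32Array(count),
    life: new Float32Array(count),
    color: new Int32Array(count),
  };
}

// AoS update: walks objects whose life and color fields sit alongside the ones being read.
function updatePositionsAoS(particles: ParticleAoS[], dt: number): void {
  for (const p of particles) {
    p.x += p.vx * dt;
    p.y += p.vy * dt;
  }
}

// SoA update: reads only x, y, vx, vy; the life and color arrays are never touched.
function updatePositionsSoA(p: ParticlesSoA, dt: number): void {
  for (let i = 0; i < p.count; i++) {
    p.x[i] += p.vx[i] * dt;
    p.y[i] += p.vy[i] * dt;
  }
}
```

Whether the SoA version is measurably faster depends on the engine and the data size, so measure before committing to either layout.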
Personal note: The reason people of old referred to ships and cars with feminine pronouns was that the equipment was temperamental. In that sense, the computer is fully entitled to be called "she" as well.
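A modern CPU does not fetch just one byte from memory. It fetches whole cache lines (commonly 64 bytes), bringing the surrounding data along.
That is why code that reads a contiguous array sequentially is fast: every fetched line is filled with useful values, and the hardware prefetcher can predict which line comes next.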
Good approach:
x[0], x[1], x[2], x[3] ...
Bad approach:
objectA.position
objectQ.position
objectM.position
objectZ.position
The second approach may look clean in code, but from the CPU's perspective it is jumping all over the place.
The essence of Data-Oriented Design is not a single cache trick. Richard Fabian explains that data-oriented design goes beyond avoiding cache misses: it is an approach that also considers the type, frequency, quantity, shape, and probability of data.[10]
type: What kind of data is it?
frequency: How often does it occur?
quantity: How much of it is there?
shape: What structure does it have?
probability: How often does each value or branch occur?
In other words, data-oriented thinking is broader than "data structure optimization." It treats the actual distribution and usage patterns of real data as the starting point of design.
Hot Data and Cold Data
Data that is accessed frequently is called hot data, while data that is accessed rarely is called cold data.
Consider a game character with data like this.
position
velocity
health
name
description
inventory
questHistory
lastDialogueText
What is typically needed every frame is position, velocity, and health. By contrast, name, description, questHistory, and lastDialogueText are only needed when the UI is opened or a dialogue event fires. Keeping both in the same object means the hot loop can drag cold data along with it. Even though only position and velocity are needed for the actual computation, the object's memory layout can cause infrequently used fields like name, description, and inventory to end up near the same cache lines. Data-Oriented Design therefore separates data by access frequency.
hot:
- position
- velocity
- health
cold:
- name
- description
- inventory
- questHistory
- dialogue state
The key is keeping hot data small and contiguous. Data read by loops that run every frame should be packed as densely as possible, while infrequently needed data is split into a separate structure. Instead of bundling a character into "one realistic object," you divide it into multiple data sets that match the actual access patterns at runtime.
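A minimal sketch of that split, using the character fields from the list above. The type names, the shared-index scheme, and the starting health of 100 are assumptions made for illustration. The hot half is a set of dense typed arrays, and the cold half is an ordinary array of records. The two are linked only by index.

```typescript
// Hot data: read every frame, packed into dense parallel typed arrays.
type CharacterHot = {
  count: number;
  posX: Float32Array;
  posY: Float32Array;
  velX: Float32Array;
  velY: Float32Array;
  health: Float32Array;
};

// Cold data: read only when the UI opens or a dialogue event fires.
type CharacterCold = {
  name: string;
  description: string;
  inventory: string[];
  questHistory: string[];
  lastDialogueText: string;
};

// Character i is hot slot i plus cold[i]; the shared index links the two halves.
type Characters = { hot: CharacterHot; cold: CharacterCold[] };

function createCharacters(cold: CharacterCold[]): Characters {
  const n = cold.length;
  return {
    hot: {
      count: n,
      posX: new Float32Array(n),
      posY: new Float32Array(n),
      velX: new Float32Array(n),
      velY: new Float32Array(n),
      health: new Float32Array(n).fill(100), // Assumed starting health
    },
    cold,
  };
}

// The per-frame loop receives only the hot half, so cold fields cannot leak into it.
function tick(hot: CharacterHot, dt: number): void {
  for (let i = 0; i < hot.count; i++) {
    hot.posX[i] += hot.velX[i] * dt;
    hot.posY[i] += hot.velY[i] * dt;
  }
}

// UI code reaches into cold data only on demand.
function describeCharacter(chars: Characters, i: number): string {
  const c = chars.cold[i];
  return `${c.name} (HP ${chars.hot.health[i]}): ${c.description}`;
}
```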
Personal note: The real reason the computer deserves to be called "she" is that she absolutely hates having someone dig through her past conversation logs (lastDialogueText) and the contents of her bag (inventory) at all hours of the day.
Ask her cleanly where she is going (position) and how fast (velocity) and she will answer at blazing speed, but the moment you sneak in garbage data like "hey, do you remember the quest from when we first met? (questHistory)" you get blocked on the spot and the runtime freezes.
The safe approach is to keep the essential daily talking points (Hot) compact and ready to hand, while the messy historical records (Cold) are strictly isolated on a password-locked external drive of their own.
Perhaps the programmer who has mastered DOP is a Casanova at heart.
ECS and Data-Oriented Design
Entity Component System (ECS) is a common form in which data-oriented design is implemented, though ECS itself is not the entirety of data-oriented design. ECS avoids hiding data inside objects; instead, it separates data into component units and lets systems process the same data combinations in batches.
Entity:
ID. Usually just a number. The entity itself holds almost no logic or data.
Component:
Data. Pure data bundles such as Position, Velocity, and Health.
System:
Logic. Processes in batch the entities that have a specific combination of components.
Example:
MovementSystem:
Find all entities with Position + Velocity and
perform position += velocity * dt
In OOP, Player.update(), Enemy.update(), and Bullet.update() each mutate their own internal state.
In ECS, by contrast, MovementSystem collects everything that can move into the same data shape and processes it all in a single batch.
Object-centric:
- Each object performs its own update.
- Logic is scattered inside individual objects.
- The same operation is spread across multiple types.
ECS-centric:
- Systems batch-process the same set of components.
- Data and logic are separated.
- The same operation is applied sequentially to data of the same shape.
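The MovementSystem example can be sketched in TypeScript. This is a deliberately small, hypothetical illustration: `World`, `movementSystem`, and the Map-based component stores are assumptions, not Unity DOTS or any engine's API. Real ECS implementations usually keep components in dense arrays grouped by archetype, or in sparse sets, precisely so that systems iterate over contiguous memory. Maps are used here only for readability.

```typescript
type Entity = number; // An entity is just an ID
type Position = { x: number; y: number };
type Velocity = { vx: number; vy: number };

// One store per component type, keyed by entity ID; entities themselves hold nothing.
class World {
  private nextId: Entity = 0;
  readonly positions = new Map<Entity, Position>();
  readonly velocities = new Map<Entity, Velocity>();

  createEntity(): Entity {
    return this.nextId++;
  }
}

// MovementSystem: processes every entity that has both Position and Velocity.
function movementSystem(world: World, dt: number): void {
  world.velocities.forEach((vel, entity) => {
    const pos = world.positions.get(entity);
    if (pos === undefined) return; // Skip entities without a Position
    pos.x += vel.vx * dt;
    pos.y += vel.vy * dt;
  });
}
```

Logic lives in the system, not in the entity. An entity that only has a `Position` is simply never visited by `movementSystem`.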
Unity's DOTS (Data-Oriented Technology Stack) is a leading industry example that supports data-oriented design centered around ECS.[2]
Personal note: Unity waves ECS as its headline feature and pitches "the paradigm is now changing," but 99% of the Asset Store is MonoBehaviour-based, and when you mix them in the name of a "hybrid" approach, you get a bizarre monster.
You start wondering whether the goal is to make a game or to have a staring contest with the compiler. The trouble is, the day I actually win that contest probably will not come in my lifetime.
DOP: Immutable Data-Centered Programming
The Data-Oriented Programming that Sharvit lays out differs somewhat from the game-engine-style DOD. The central problem here is not CPU cache but the complexity of information systems. The Manning book description also explains DOP as a paradigm that simplifies state management with "immutable generic data structures" and "non-mutating general-purpose functions."[3]
The four core principles are as follows.[4]
1. Separate code from data.
2. Represent data using generic data structures.
3. Do not mutate data directly.
4. Separate the data schema from the data representation.
An additional practical principle follows from these: manipulate data with general-purpose functions. This means favoring generic operations like map, filter, reduce, pick, merge, assoc, update, and groupBy over class-specific methods.
The OOP question:
"What methods should a Book object have?"
The DOP question:
"What shape is the Book data,
and what pure transformation functions operate on that shape?"
OOP approach:
class Book {
  constructor(
    public title: string,
    public author: string,
    public checkedOut: boolean,
  ) {}
  public checkout(): void {
    this.checkedOut = true;
  }
}
DOP approach:
Personal note: Slather readonly everywhere in your code. It is not only the most cost-effective way to look like a seasoned developer with just a few extra keystrokes; it is also a physical restraint that stops future-you from contaminating global state five minutes before a delivery and blowing up the runtime.
type Book = {
  readonly id: string;
  readonly title: string;
  readonly author: string;
  readonly checkedOut: boolean;
};
type BookView = {
  readonly title: string;
  readonly status: "available" | "checkedOut";
};
const checkoutBook = (book: Book): Book => {
  return {
    ...book,
    checkedOut: true,
  };
};
const toBookView = (book: Book): BookView => {
  return {
    title: book.title,
    status: book.checkedOut ? "checkedOut" : "available",
  };
};
// From the outside this is a pure function, but internally it uses local mutation
// to avoid re-creating arrays. This is the difference between academic FP and pragmatic DOP.
const groupBooksByAuthor = (
  books: readonly Book[],
): ReadonlyMap<string, readonly Book[]> => {
  const groups = new Map<string, Book[]>(); // Temporary mutation, confined to this function
  for (const book of books) {
    let current = groups.get(book.author);
    if (!current) {
      current = [];
      groups.set(book.author, current);
    }
    current.push(book); // Append in place instead of re-creating the array
  }
  // Expose the result through read-only types (a compile-time guarantee; nothing is frozen at runtime)
  return groups as ReadonlyMap<string, readonly Book[]>;
};
This approach closely resembles functional programming. DOP, however, focuses less on foregrounding all of functional programming's abstractions (monads, typeclasses, higher-kinded types, and the like) and more on keeping data representations simple and transforming that data with non-mutating functions.
In the same spirit, separating the schema from the data representation (principle 4) means not insisting that data must be tied to a specific class constructor or its methods. Data can be represented as a plain map or object, and whether that data is valid can be verified by a separate schema or validator.
DOP Principle 1: Separate Code from Data
In DOP, data does not carry methods. Data is data, and behavior is a function.
// Data
type Member = {
  id: string;
  name: string;
  borrowedBookIds: string[];
};
// Behavior
function canBorrow(member: Member): boolean {
  return member.borrowedBookIds.length < 5;
}
Sharvit argues that structures that mix code and data tend to be more complex, while structures that separate the two tend to be composed of simpler parts.[5]
From this perspective, the fact that objects bundle "data + methods" together is both an advantage and a potential source of tight coupling. When an object's internal state is strongly bound to its methods, it can become difficult to reuse, record, compare, or transmit that data in other contexts.
DOP aims to keep data as plain values and to separate behavior into functions that take those values as input and return new values.
Benefits:
Functions are easy to test in isolation.
The same data is easy to reuse across multiple contexts.
Serialization, logging, diff, and replay become straightforward.
Costs:
Control over which functions access which data becomes weaker.
Unlike objects, discoverability through a list of methods is harder.
Separating data from functions can increase the number of files and modules.
Personal note: DOP is not saying "encapsulation is unnecessary." It is closer to asking "does encapsulation absolutely require locking all data inside an object?"
But even this profound reflection turns out to be completely useless philosophical self-indulgence the moment a project manager says, "Hey, the deadline is tomorrow, just make it public and wire it up fast." Philosophy is a luxury reserved for those who have already shipped.
DOP Principle 2: Represent Data with Generic Data Structures
DOP prefers to represent domain data using general-purpose data structures such as maps, dictionaries, objects, arrays, and lists rather than dedicated classes.[6]
const book = {
  id: "book-1",
  title: "Data-Oriented Programming",
  author: "Yehonathan Sharvit",
  tags: ["programming", "architecture"],
};
The point is not "let types be a free-for-all." It is to let data representations take advantage of the general-purpose data operations that the language and its ecosystem already provide.
const publicBookView = {
  title: book.title,
  author: book.author,
};
const serialized = JSON.stringify(publicBookView);
The more dedicated classes you accumulate, the more bespoke conversion code each one demands. Conversely, when data is a plain object or map, operations like serialization, partial selection, merging, comparison, and diffing are straightforward to handle with generic functions.
Class-centric:
Book.toJson()
Author.toJson()
Member.toJson()
Loan.toJson()
Generic data-centric:
JSON.stringify(data)
pick(data, keys)
merge(dataA, dataB)
diff(before, after)
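The generic operations in the list above can be sketched as small helpers that work on any plain object. These are hypothetical, minimal implementations written for illustration. Lodash, for example, offers functions named `pick` and `merge`, but with different semantics: Lodash's `merge` is recursive, while this one is shallow.

```typescript
type Data = Record<string, unknown>;

// pick: copy only the listed keys into a new object.
function pick<T extends Data, K extends keyof T>(data: T, keys: readonly K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    if (key in data) result[key] = data[key];
  }
  return result;
}

// merge: shallow merge; keys in `b` win over keys in `a`. Neither input is mutated.
function merge<A extends Data, B extends Data>(a: A, b: B): A & B {
  return { ...a, ...b };
}

type Change = { before: unknown; after: unknown };

// diff: shallow comparison by ===, reporting every own key whose value changed.
function diff(before: Data, after: Data): Record<string, Change> {
  const hasOwn = (obj: Data, key: string) => Object.prototype.hasOwnProperty.call(obj, key);
  const changes: Record<string, Change> = {};
  const keys = Object.keys(before).concat(Object.keys(after).filter(key => !hasOwn(before, key)));
  for (const key of keys) {
    if (before[key] !== after[key]) {
      changes[key] = { before: before[key], after: after[key] };
    }
  }
  return changes;
}
```

No helper knows anything about books. That is the payoff of generic representations: the same three functions work for authors, members, or loans.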
There are costs as well.
Typos in field names can lurk undetected until runtime.
You get less help from IDE autocompletion and static typing.
Accessing data through generic structures can be slower than direct class/struct field access.
In languages like TypeScript, therefore, a practical compromise is to annotate plain data shapes with types rather than reaching for untyped objects to implement DOP.
type BookData = {
  id: string;
  title: string;
  author: string;
  tags: string[];
};
DOP Principle 3: Treat Data as Immutable Values
In DOP, data is a value. The value itself does not change; mutations are expressed by creating a new version instead.[7]
const before = {
  id: "book-1",
  checkedOut: false,
};
const after = {
  ...before,
  checkedOut: true,
};
An important distinction:
Data values do not change.
A variable can change to point to a new data value.
This distinction is the same as the immutability concept in functional programming. In practice, the following benefits are significant.
It is easy to compare the previous state with the new state.
Undo/redo, event replay, and audit logging become straightforward.
Shared mutable state problems in concurrent scenarios are reduced.
In testing, inputs and outputs are easy to verify through value comparison.
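To make the value/variable distinction and the undo benefit concrete, here is a small sketch. `BookState`, `history`, and `undo` are hypothetical names. TypeScript's `readonly` is a compile-time check only, so this sketch also uses `Object.freeze` as a runtime guard. Note that freezing is shallow: nested objects would need their own freeze. Writes to a frozen object fail silently in sloppy mode and throw a `TypeError` in strict mode.

```typescript
type BookState = { readonly id: string; readonly checkedOut: boolean };

// Every version is kept; old values are never modified, so undo is just stepping back.
const history: BookState[] = [];

// The variable `current` is rebound to new values; the values themselves never change.
let current: BookState = Object.freeze({ id: "book-1", checkedOut: false });
history.push(current);

current = Object.freeze({ ...current, checkedOut: true });
history.push(current);

// Undo: drop the latest version and point the variable back at the previous value.
function undo(): BookState {
  if (history.length > 1) history.pop();
  current = history[history.length - 1];
  return current;
}
```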
Example:
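// Book is the type defined earlier; Loan and LibraryState are minimal shapes assumed for this example.
type Loan = { memberId: string; bookId: string; checkedOutAt: string };
type LibraryState = {
  readonly books: readonly Book[];
  readonly loans: readonly Loan[];
};
function checkoutBook(state: LibraryState, memberId: string, bookId: string): LibraryState {
  return {
    ...state,
    loans: [
      ...state.loans,
      { memberId, bookId, checkedOutAt: new Date().toISOString() },
    ],
    books: state.books.map(book =>
      book.id === bookId ? { ...book, checkedOut: true } : book
    ),
  };
}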
This code does not secretly mutate the object's internal state. The input state and the output LibraryState are explicit. That said, with large data sets you need to think about the cost of shallow versus deep copying and whether to use a structural-sharing library. This is where persistent data structures come into the picture.
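The spread-and-map style already performs a shallow form of structural sharing: untouched parts of the old state are reused by reference, not copied. The sketch below uses hypothetical names (`State`, `checkout`) and verifies that sharing. It still copies the whole `books` array on every update, which costs O(n) per change. Persistent data structures, such as those in Immutable.js, are built as trees so that an update copies only a small path.

```typescript
type Book = { readonly id: string; readonly checkedOut: boolean };
type State = { readonly books: readonly Book[]; readonly loans: readonly string[] };

// Copy only what changes: the books array and the one book being updated.
const checkout = (state: State, bookId: string): State => ({
  ...state,
  books: state.books.map(book => (book.id === bookId ? { ...book, checkedOut: true } : book)),
});

const before: State = {
  books: [
    { id: "book-1", checkedOut: false },
    { id: "book-2", checkedOut: false },
  ],
  loans: [],
};
const after = checkout(before, "book-1");

console.log(after.books[0] === before.books[0]); // false: the updated book is a new object
console.log(after.books[1] === before.books[1]); // true: the untouched book is shared
console.log(after.loans === before.loans); // true: the untouched array is shared
```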
DOP Principle 4: Separate Schema from Data Representation
Because DOP represents data as generic structures, it does not tie the shape of the data to a class definition. Instead, the schema is kept separately.[8]
const addBookRequestSchema = {
  type: "object",
  required: ["title", "author"],
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    tags: {
      type: "array",
      items: { type: "string" },
    },
  },
};
This approach integrates well with tools like JSON Schema.[9]
data:
{ "title": "DOP", "author": "Sharvit" }
schema:
title is a required string
author is a required string
tags is an optional array<string>
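To make the separation concrete, here is a hand-written check that mirrors `addBookRequestSchema`. It is a sketch, not a JSON Schema implementation. In practice a JSON Schema validator (Ajv is a widely used one for JavaScript) or a library like Zod would do this work. `AddBookRequest`, `validateAddBookRequest`, and `isAddBookRequest` are hypothetical names. The TypeScript type and the runtime check restate the same rules, which is the duplicate-management cost this section mentions.

```typescript
// Static type for the same shape (duplicated on purpose to show the cost).
type AddBookRequest = {
  title: string;
  author: string;
  tags?: string[];
};

// Runtime check mirroring the schema; returns a list of human-readable errors.
function validateAddBookRequest(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["request must be an object"];
  }
  const data = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of ["title", "author"]) {
    if (!(field in data)) {
      errors.push(`${field} is required`);
    } else if (typeof data[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  if ("tags" in data) {
    const tags = data["tags"];
    if (!Array.isArray(tags) || !tags.every(tag => typeof tag === "string")) {
      errors.push("tags must be an array of strings");
    }
  }
  return errors;
}

// Type guard: after this check succeeds, TypeScript treats the value as AddBookRequest.
function isAddBookRequest(input: unknown): input is AddBookRequest {
  return validateAddBookRequest(input).length === 0;
}
```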
Benefits:
Validation of external request/response data becomes clearer.
The schema can be reused for runtime validation, documentation, and test data generation.
During the exploration phase, you can attach the schema late, then tighten it once things stabilize.
Costs:
The connection between data and schema is looser than with classes.
If schema validation is skipped, runtime errors are caught late.
Using static types alongside runtime schema leads to a duplicate management problem.
In TypeScript, you typically have the following options.
1. TypeScript types only
This works well at compile time, but runtime validation of external inputs is weak.
2. Runtime schema using JSON Schema/Zod/io-ts
This handles external input validation well, but introduces schema management overhead.
3. Synchronizing types and schemas with a code generation tool
This is the most robust approach, but it makes the build pipeline more complex.
Personal note: DOP's "schema and data separation" is an ideal. Data is free, schemas are optional, validation is explicit, and developers enjoy philosophical peace.
Then you sit in your room drinking coffee and realize the deadline is three days away. The moment that sinks in, any gets plastered everywhere and DOP becomes DROP.
Schema separation is a fine idea, but it falls apart under a deadline.
How Does DOP Handle Polymorphism?
In OOP, polymorphism is expressed primarily through class/interface hierarchies and method dispatch.
interface Shape {
  area(): number;
}
In DOP, data carries a kind or type field, and ordinary functions branch on it or use a dispatch table.
type Circle = {
  kind: "circle";
  radius: number;
};
type Rectangle = {
  kind: "rectangle";
  width: number;
  height: number;
};
type Shape = Circle | Rectangle;
function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius * shape.radius;
    case "rectangle":
      return shape.width * shape.height;
  }
}
This approach resembles Algebraic Data Types (ADTs). TypeScript's discriminated unions, Rust's enums, F#'s discriminated unions, and Haskell's ADTs all support this pattern strongly.
From a DOP perspective, polymorphism is achievable without objects. The Manning book description also lists "polymorphism without objects" as one of the learning topics.[3]
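The dispatch-table alternative to `switch` mentioned above can be sketched as follows. The Shape types are redeclared so the block stands alone, and `areaByKind` is a hypothetical name. The mapped type over `Shape["kind"]` makes the compiler reject the table if a new kind is added to `Shape` without a matching entry. This gives the same exhaustiveness guarantee that a checked `switch` provides.

```typescript
type Circle = { kind: "circle"; radius: number };
type Rectangle = { kind: "rectangle"; width: number; height: number };
type Shape = Circle | Rectangle;

// One handler per kind; each handler receives the narrowed shape type for its kind.
const areaByKind: { [K in Shape["kind"]]: (shape: Extract<Shape, { kind: K }>) => number } = {
  circle: s => Math.PI * s.radius * s.radius,
  rectangle: s => s.width * s.height,
};

function area(shape: Shape): number {
  // TypeScript cannot correlate shape.kind with the handler's parameter type,
  // so one narrow cast is needed at this single dispatch site.
  const handler = areaByKind[shape.kind] as (s: Shape) => number;
  return handler(shape);
}
```

The table is plain data, so handlers can be added, swapped, or inspected at runtime without touching a class hierarchy.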
Personal note: In school we were taught that switch statements are a bad habit to avoid, and that the elegant solution is inheritance and overriding. But after debugging a few of those so-called "elegant Java enterprise projects" twisted through dozens of layers of inheritance, you find yourself ready to part with OOP-style inheritance forever. What I find oddly comforting is that since I have little to inherit from my parents in real life either, I feel absolutely no attachment to "inheritance" in code.
Where DOP Fits Best
Sharvit-style DOP is particularly well-suited to information systems, that is, systems where moving and transforming data matters more than CPU cache optimization.
REST/GraphQL API
JSON request/response handling
Frontend application state
ETL / data pipeline
event enrichment
workflow state
Configuration file/policy file processing
Audit log/audit trail
In these systems, data already flows in the form of JSON, maps, records, and tables. Rather than forcing it into a deep class hierarchy, it is often simpler to define the data shapes and transformation functions explicitly.
Where DOP Gets Dangerous
DOP reduces the problems of OOP, but it introduces its own.
Field names scattered as string keys make the code vulnerable to typos.
It can become difficult to track which function expects which data shape.
Without schema validation, the result is a "loose bag of maps."
Failing to understand the cost of immutable updates leads to performance problems.
The party responsible for enforcing domain invariants can disappear entirely.
Practical DOP therefore usually needs to be used alongside the following mechanisms.
Schemas like TypeScript types, JSON Schema, or Zod
pure function-centric testing
single path for state mutations
domain events or command handlers
diff, snapshot, audit log
Personal note: Misuse DOP and you escape the "complex maze of object-oriented code" only to fall into the "JSON swamp where anyone can touch anything."
Even when you gain freedom, you eventually need new constraints. Too much freedom is never a good thing.
Differences from OOP
OOP:
Data and behavior are kept together inside objects.
Methods protect the internal invariants of objects.
Objects collaborate through message/method calls.
Data-Oriented:
Data and behavior are separated.
The shape and flow of data are considered first.
Data of the same shape is processed in batch.
OOP is not inherently bad. For things like external resources, drivers, file handles, and network connections, where lifetime and invariants need to be managed strictly, objects are a natural fit. Richard Fabian also notes that OOP can be the better choice for large, stable abstractions like file-system handles or graphics APIs.[10]
The problems arise when everything is started as an object.
Objects hold too much data.
Inheritance hierarchies obscure data flow.
Memory is laid out by conceptual unit rather than by unit of work.
Cache locality suffers when the same operation is performed across many objects.
Data-oriented thinking critiques exactly this point.
Relationship to Functional Programming
Data-Oriented Programming overlaps considerably with functional programming.
Commonalities:
- Data is not mutated directly.
- Transformation functions are central.
- Inputs and outputs are made explicit.
- Reducing state mutation makes testing easier.
Differences:
- Functional programming places greater emphasis on purity, composition, types, and side-effect control.
- Data-Oriented Programming places greater emphasis on the shape, volume, flow, layout, and access patterns of data.
Simply put:
Functional: Is this computation pure?
Data-Oriented: What shape does this data take as it flows?
Performance DOD: How is this data read from memory?
Advantages
1. Bulk data processing performance can improve.
2. Data flow becomes clearer.
3. Unnecessary object graphs can be reduced.
4. Batch processing and parallelization become easier.
5. Serialization, logging, replay, and testing become simpler.
6. It becomes easier to separate business data from transformation logic.
It is especially powerful in domains like games, simulation, rendering, physics, data pipelines, and event processing, where the same operation is applied repeatedly to large volumes of data.
Disadvantages
1. The design looks less like "real-world nouns."
2. It can be over-engineering for small programs.
3. Separating data from logic can make it harder to trace behavior.
4. Used carelessly, it devolves into global data tables and blobs of procedural code.
5. The Immutable Data approach requires careful attention to allocation and copying costs.
6. Performance-focused DOD requires a solid understanding of CPU, cache, and memory models.
Personal note: Right after learning about data-oriented design, you will want to tear every class apart into arrays. It is better to resist that urge for a moment.
Not every program is a game engine, and not every object is a criminal.
When to Use It
You process large volumes of similar data.
You repeatedly perform the same operations.
Performance bottlenecks arise from memory access.
The object graph is too complex to track state effectively.
Serialization, persistence, transmission, and replay matter.
You want to compare input data with output data in tests.
When to Be Careful
The data volume is small.
There are no performance bottlenecks.
Objects must strictly enforce domain invariants.
Managing the lifetime of external resources is the core concern.
The team is unfamiliar with memory layout or batch processing.
It is a business application where simple CRUD matters more than abstraction.
This does not mean data-oriented thinking is useless in business applications. Applying game-engine-style DOD directly, however, is overkill. In business applications, Sharvit-style DOP, separating data from transformations and reducing mutation, is often far more practical.
Anti-Patterns
1. Tearing Everything into Arrays
Data-oriented design does not mean always using SoA. If a job unit frequently needs entire objects, AoS is the better choice.
Question:
What fields does this operation actually read?
How often are those fields read together?
Restructuring without asking that question just produces code that is hard to read.
2. Failing to Separate Data from Invariants
Separating data from logic is good, but scattering invariants is dangerous.
Bad example:
Multiple systems modify `health` arbitrarily.
It is impossible to tell where it drops to 0 or below.
Death handling, UI updates, and event dispatch fall out of sync with each other.
Even in data-oriented design, ownership of invariants is necessary.
Good example:
`DamageSystem` is solely responsible for reducing health.
`DeathSystem` handles the `health <= 0` state.
The `HealthChanged` event is published explicitly.
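The good example above can be sketched as two small functions over plain data. This is one possible shape, not a prescribed design. The `World` layout, `DamageRequest`, and clamping health at 0 are assumptions made for illustration. Events are simply collected in an array rather than sent through a real event bus.

```typescript
type Entity = number;

type HealthChanged = { type: "HealthChanged"; entity: Entity; before: number; after: number };

type DamageRequest = { entity: Entity; amount: number };

type World = {
  health: Map<Entity, number>;
  dead: Set<Entity>;
  events: HealthChanged[]; // Published events, consumed later by UI, audio, etc.
};

// DamageSystem: the only code that lowers health, and it always publishes HealthChanged.
function damageSystem(world: World, requests: readonly DamageRequest[]): void {
  for (const { entity, amount } of requests) {
    const before = world.health.get(entity);
    if (before === undefined || world.dead.has(entity)) continue;
    const after = Math.max(0, before - amount); // Assumption: health never goes below 0
    world.health.set(entity, after);
    world.events.push({ type: "HealthChanged", entity, before, after });
  }
}

// DeathSystem: the only code that decides what health <= 0 means.
function deathSystem(world: World): Entity[] {
  const newlyDead: Entity[] = [];
  world.health.forEach((hp, entity) => {
    if (hp <= 0 && !world.dead.has(entity)) {
      world.dead.add(entity);
      newlyDead.push(entity);
    }
  });
  return newlyDead;
}
```

Because each rule has exactly one owner, the question "where can health drop to 0?" has a single answer.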
3. Optimizing Without Measuring
Data-Oriented Design emphasizes actual data and access patterns. Changing structure based solely on a hunch that it will be "cache-friendly," without any measurement, turns design into a guessing game.
Measure first:
- Number of data items
- Access frequency
- hot path
- cache miss
- allocation
- branch miss
- frame time / latency
4. Rejecting OOP Unconditionally
OOP is strong at expressing invariants and lifetime management. Data-oriented design is strong at bulk processing and data flow.
Where objects shine:
File handles, DB connections, transactions, UI widgets, external API clients
Where data-oriented approaches shine:
particle, transform, physics body, telemetry event, order rows, batch job
There is no need to use only one; use both together.
Just as our parents told us to get along well with our friends.
5. Overusing Sharvit-Style DOP (Immutability) in Performance-Critical DOD Contexts
Just because they both carry the data-oriented label does not mean DOD and DOP are perfectly interchangeable.
Sharvit-style DOP's core practice of immutable data necessarily involves spread operators (...spread) or allocating new objects. If you produce a fresh immutable object every time you update state inside a game engine's hot loop that runs tens of thousands of times per frame (the territory of DOD), the garbage collector will scream and halt the runtime before you ever get around to optimizing the CPU cache.
Information systems (DOP): willingly pay the overhead of immutability and GC in order to tame the complexity of state.
High-performance loops (DOD): to keep the GC out of the loop entirely, data is overwritten in place, without mercy, inside pre-allocated contiguous memory (arrays). This is in-place mutation.
If you blindly mix the two paradigms without clearly distinguishing their target bottlenecks (cognitive load vs. hardware limits), you end up with a horrifying chimera that is neither maintainable nor performant.
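The two styles can be contrasted side by side. The function names are hypothetical, and nothing about GC behavior is measured here. The sketch only shows where allocation happens. The first function returns a fresh array on every call, which is a reasonable price in an information system but creates per-frame garbage in a hot loop. The second overwrites a pre-allocated `Float32Array` in place and allocates nothing.

```typescript
// DOP style: returns a new array on every call; the input is left untouched.
function stepImmutable(xs: readonly number[], vxs: readonly number[], dt: number): number[] {
  return xs.map((x, i) => x + vxs[i] * dt);
}

// DOD style: overwrites a pre-allocated buffer in place; no allocation per call.
function stepInPlace(xs: Float32Array, vxs: Float32Array, dt: number): void {
  for (let i = 0; i < xs.length; i++) {
    xs[i] += vxs[i] * dt;
  }
}
```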
Practical Checklist
When approaching a problem with a data-oriented lens, start by writing down the following.
[What]
What data does this system transform?
[Why]
Why should you look at data flow before the object model?
[Shape]
What is the shape of the data? Is it rows, a tree, a graph, or an event stream?
[Volume]
How many data items are there? Ten? A hundred thousand? Ten million?
[Frequency]
How often is it read and written?
[Hot Path]
What is the most frequently executed loop?
[Invariant]
What are the rules that must never be broken?
[Layout]
Is data that is read together kept together?
Is data that is not read together kept separate?
[Next]
If the measurements change, which structures will you change?
Personal note: The biggest problem with checklists is that you never actually use them. On the next project you always forget and start over. That is the destiny of a checklist.
Final Summary
Data-Oriented Programming is not a movement against objects.
Personal note: Of course, for some people it absolutely is a movement against objects.
The core idea is this.
Place the program's focus on real data and its transformation flow, rather than abstract objects.
In performance-focused DOD, the following matter most.
Data volume
Access frequency
Memory layout
Cache locality
Batch processing
Hot/cold split
In complexity-focused DOP, the following matter most.
Code/data separation
Generic data structures
Immutable data
Non-mutating functions
Schema
A one-sentence definition of data-oriented programming can be written like this.
Data-Oriented Programming is
a programming approach that, rather than asking "what exists,"
first asks "what data is being transformed, in what form, and how frequently."
Personal note: That said, the kind of engineer I aspire to be is not a zealot of any particular paradigm.
In the face of the cold reality of business requirements and runtime constraints (available memory, CPU cache, deadline), I should be a mercenary willing to bend principles in order to find the best compromise.
See Also
Persistent Data Structures
Data Locality
ECS
JSON Schema
Algebraic Data Types (ADTs)
Footnotes
1. Mike Acton. Data-Oriented Design and C++. CppCon 2014 presentation. This landmark talk popularized Data-Oriented Design (DOD) discussions in the C++ and game engine communities.
2. Unity. Introduction to the Data-Oriented Technology Stack. Unity DOTS is a representative example of applying Data-Oriented Design centered on the Entity Component System (ECS).
3. Yehonathan Sharvit. *Data-Oriented Programming*. Manning, 2022. Manning's introduction explains Data-Oriented Programming (DOP) through immutable generic data structures and non-mutating general-purpose functions.
4. Yehonathan Sharvit. "Principles of Data-Oriented Programming". An article outlining the four principles of Data-Oriented Programming: separation of code from data, generic data structures, immutable data, and separation of schema from representation.
5. Yehonathan Sharvit. "Separate code from data". Data-Oriented Programming principle 1. Explains the benefits and costs of separating code from data.
6. Yehonathan Sharvit. "Represent data with generic data structures". Data-Oriented Programming principle 2. Explains generic data structures like maps and arrays, and their trade-offs.
7. Yehonathan Sharvit. "Data is immutable". Data-Oriented Programming principle 3. Explains the practice of creating new versions of data instead of mutating existing data.
8. Yehonathan Sharvit. "Separate data schema from data representation". Data-Oriented Programming principle 4. Explains why it is important to separate data representation from schema.
9. JSON Schema. Official website. A vocabulary for expressing the shape and validation rules of JSON data.
10. Richard Fabian. *Data-Oriented Design*. Fabian explains that Data-Oriented Design is not merely a matter of cache misses but rather an approach that considers the type, frequency, quantity, shape, and probability of data.