Code Card

SoA Data Layout

Structure of Arrays (SoA) is a data layout pattern that separates fields of the same kind into individual contiguous arrays rather than bundling all fields together in a single object. Where AoS (Array of Structures) creates an "array of particle objects" like `Particle[]`, SoA creates a "bundle of per-attribute arrays" like `x[]`, `y[]`, `vx[]`, `vy[]`.

Practice memo · 24 min read · Hard

The intent of this pattern is to separate the domain model from the execution model. The domain model should be easy for humans to read, but the execution model needs to be something the CPU can scan quickly. In an object-oriented model, having x, y, vx, vy, color, debugName, and renderHandle all packed into a single Particle object is convenient to read, but a position-update pass actually needs only x, y, vx, and vy.

SoA can be fast not merely because it "reduces cache misses." When fields of the same kind are gathered into contiguous arrays, it becomes easier for the CPU and compiler to apply SIMD, that is, vectorization that processes multiple data elements at once. In AoS, data is interleaved as x1, y1, vx1, vy1, x2, y2, vx2, vy2. In SoA, by contrast, x1, x2, x3, x4 are contiguous. This difference affects not only cache locality but also instruction throughput.
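
The difference between the two layouts can be sketched with typed arrays. This is a layout illustration only, not a benchmark: the function names and the `FIELDS_PER_PARTICLE` constant are hypothetical, and the interleaved buffer stands in for a contiguous AoS.

```typescript
// Hypothetical names; a layout sketch, not a performance measurement.
const FIELDS_PER_PARTICLE = 4; // x, y, vx, vy

// Interleaved (AoS-like) layout: [x0, y0, vx0, vy0, x1, y1, vx1, vy1, ...]
function updateInterleaved(data: Float32Array, count: number, dt: number): void {
  for (let i = 0; i < count; i += 1) {
    const base = i * FIELDS_PER_PARTICLE;
    data[base] += data[base + 2] * dt;     // x += vx * dt
    data[base + 1] += data[base + 3] * dt; // y += vy * dt
  }
}

// SoA layout: each field lives in its own contiguous array.
function updateSoa(
  x: Float32Array,
  y: Float32Array,
  vx: Float32Array,
  vy: Float32Array,
  count: number,
  dt: number,
): void {
  for (let i = 0; i < count; i += 1) {
    x[i] += vx[i] * dt;
    y[i] += vy[i] * dt;
  }
}

// The same two particles in both layouts.
const interleaved = new Float32Array([0, 0, 2, 4, 1, 1, -2, 0]);
updateInterleaved(interleaved, 2, 0.5);

const xs = new Float32Array([0, 1]);
const ys = new Float32Array([0, 1]);
const vxs = new Float32Array([2, -2]);
const vys = new Float32Array([4, 0]);
updateSoa(xs, ys, vxs, vys, 2, 0.5);
```

Both versions compute the same positions. What changes is the stride: the interleaved loop steps over four floats per particle, while the SoA loop reads each array with a stride of one element.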

While SoA is a powerful technique, if you frequently edit individual objects, work with small data sets, have a bottleneck in I/O, or read nearly all fields of an object in a single pass, traditional AoS can be simpler and fast enough. Many people interpret SoA as "abandon object-oriented design," but more precisely it means "on the hot path, lay out data in the order that each pass actually reads the fields."

Core formula:

Text
SoA = place fields of the same kind in contiguous arrays
Cache Locality = load only the data you need into the cache
SIMD Friendliness = make it easy to apply the same operation to contiguous data
Dense Pass = traverse only the 0..count range linearly, without branches
Swap-remove = when order does not matter, bring deletion cost close to O(1)

1. When should you use it?

Suppose you update the positions of 100,000 particles every frame.

Text
x = x + vx * dt
y = y + vy * dt

This operation does not need the particle's name, color, rendering handle, or debug tag. All it needs is x, y, vx, and vy.

A typical AoS looks like this.

Text
particles = [
  { x, y, vx, vy, color, debugName, renderHandle },
  { x, y, vx, vy, color, debugName, renderHandle },
  ...
]

A typical SoA looks like this.

Text
positionsX[]
positionsY[]
velocitiesX[]
velocitiesY[]

In SoA, the position-update pass scans only the required arrays linearly.

Text
positionsX[i] += velocitiesX[i] * dt
positionsY[i] += velocitiesY[i] * dt

Shifting focus from "what an object is" to "what a pass reads" goes beyond simply learning a memory optimization technique; it is a philosophical turning point that reformats a programmer's mental model from 'Human-centric' to 'Hardware-centric'.

Object-oriented programming (OOP) views the world through 'nouns': it groups the world into conceptual units such as 'Particle', 'Player', and 'Enemy'. Because of this, bundling those concepts inside a single class or struct in memory feels natural, and this is typically referred to as having semantic cohesion. But the CPU doesn't think that way. It just runs.

  • AoS (object-centric): When the position-update pass reads particle1.x, the cache line pulls in fields of particle1 that the pass never uses (color, rendering handle, etc.) along with it. In pointer-based or object-array layouts, the next particle's data is scattered across the heap, making it hard for the hardware prefetcher to follow; even in a contiguous struct array, the high ratio of cold fields relative to hot fields wastes cache bandwidth.

  • SoA (pass-centric): When a pass reads only the x[], y[], vx[], and vy[] arrays sequentially, the hardware prefetcher can follow the access pattern easily. While the CPU performs its computation, the next elements are typically already on their way to the L1 cache, so much of the memory latency is hidden.

Note that even within AoS, you need to distinguish two cases. A contiguous AoS stores the objects themselves inline, as in a C/Rust/C# struct array. A reference-following AoS stores pointers that must be chased, as in a JavaScript object array or a C# class array. The former has a regular stride, so the prefetcher can keep up to some extent, but the problem of hot and cold fields sharing the same cache line remains. The latter tends to scatter objects across the heap, which puts it at a disadvantage on both locality and prefetching. In practice, though, you rarely need to know the layout to that level of detail. Knowing when to compromise is also a skill. Most people who set out to learn SoA do so through game-oriented examples, such as Rust's Bevy engine or Unity's ECS, so the examples here are also written with games in mind.

2. Core expressions

The common example is a structure that stores particle positions and velocities in SoA and performs a single-frame position update.


C#

C#
using System;
using System.Numerics;

public sealed class ParticleSoa
{
    private readonly float[] positionsX;
    private readonly float[] positionsY;
    private readonly float[] velocitiesX;
    private readonly float[] velocitiesY;
    private int count;

    public ParticleSoa(int capacity)
    {
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity));
        }

        this.positionsX = new float[capacity];
        this.positionsY = new float[capacity];
        this.velocitiesX = new float[capacity];
        this.velocitiesY = new float[capacity];
        this.count = 0;
    }

    public int GetCount()
    {
        return this.count;
    }

    public int Add(float x, float y, float vx, float vy)
    {
        if (this.count >= this.positionsX.Length)
        {
            throw new InvalidOperationException("Insufficient particle storage capacity.");
        }

        int index = this.count;
        this.positionsX[index] = x;
        this.positionsY[index] = y;
        this.velocitiesX[index] = vx;
        this.velocitiesY[index] = vy;
        this.count += 1;

        return index;
    }

    public void Update(float deltaTime)
    {
        Span<float> x = this.positionsX.AsSpan(0, this.count);
        Span<float> y = this.positionsY.AsSpan(0, this.count);
        ReadOnlySpan<float> vx = this.velocitiesX.AsSpan(0, this.count);
        ReadOnlySpan<float> vy = this.velocitiesY.AsSpan(0, this.count);

        int width = Vector<float>.Count;
        Vector<float> dt = new(deltaTime);

        int index = 0;

        for (; index <= this.count - width; index += width)
        {
            Vector<float> xVector = new(x.Slice(index, width));
            Vector<float> yVector = new(y.Slice(index, width));
            Vector<float> vxVector = new(vx.Slice(index, width));
            Vector<float> vyVector = new(vy.Slice(index, width));

            (xVector + vxVector * dt).CopyTo(x.Slice(index, width));
            (yVector + vyVector * dt).CopyTo(y.Slice(index, width));
        }

        for (; index < this.count; index += 1)
        {
            x[index] += vx[index] * deltaTime;
            y[index] += vy[index] * deltaTime;
        }
    }

    public (float X, float Y) GetPosition(int index)
    {
        this.EnsureValidIndex(index);

        return (this.positionsX[index], this.positionsY[index]);
    }

    private void EnsureValidIndex(int index)
    {
        if ((uint)index >= (uint)this.count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}

In C#, you can make SIMD intent explicit with Vector<float>. Note that the actual vector width varies depending on the runtime and CPU. For numerical hot paths, consider System.Numerics, Span<T>, ArrayPool<T>, or lower-level intrinsics.


TypeScript

TypeScript
export class ParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  #count: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("`capacity` must be a positive integer (≥ 1).");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#count = 0;
  }

  public getCount(): number
  {
    return this.#count;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("Insufficient particle storage capacity.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#count += 1;

    return index;
  }

  public update(deltaTime: number): void
  {
    for (let index = 0; index < this.#count; index += 1) {
      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getPosition(index: number): readonly [number, number]
  {
    this.#ensureValidIndex(index);

    return [this.#positionsX[index], this.#positionsY[index]];
  }

  #ensureValidIndex(index: number): void
  {
    if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
      throw new RangeError("`index` is not valid.");
    }
  }
}

In TypeScript, Float32Array expresses SoA intent clearly. Because the optimizations a JavaScript engine applies internally vary by runtime, the primary goal is to establish a predictable layout using typed arrays rather than expecting direct SIMD.

Python

Python
from array import array


class ParticleSoa:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be at least 1.")

        self._positions_x = array("f", [0.0]) * capacity
        self._positions_y = array("f", [0.0]) * capacity
        self._velocities_x = array("f", [0.0]) * capacity
        self._velocities_y = array("f", [0.0]) * capacity
        self._count = 0

    def get_count(self) -> int:
        return self._count

    def add(self, x: float, y: float, vx: float, vy: float) -> int:
        if self._count >= len(self._positions_x):
            raise OverflowError("The particle storage capacity is insufficient.")

        index = self._count
        self._positions_x[index] = x
        self._positions_y[index] = y
        self._velocities_x[index] = vx
        self._velocities_y[index] = vy
        self._count += 1

        return index

    def update(self, delta_time: float) -> None:
        for index in range(self._count):
            self._positions_x[index] += self._velocities_x[index] * delta_time
            self._positions_y[index] += self._velocities_y[index] * delta_time

    def get_position(self, index: int) -> tuple[float, float]:
        self._ensure_valid_index(index)

        return (self._positions_x[index], self._positions_y[index])

    def _ensure_valid_index(self, index: int) -> None:
        if index < 0 or index >= self._count:
            raise IndexError("index is invalid.")

Using only the Python standard library, you can create a SoA shape with array. However, for genuinely fast large-scale numerical computation, you would typically consider tools such as NumPy, Numba, Cython, or Rust/C extensions. Even with a better data layout, pure Python loops can be bottlenecked by interpreter overhead.

Rust

Rust
#[derive(Debug)]
pub struct ParticleSoa {
    positions_x: Vec<f32>,
    positions_y: Vec<f32>,
    velocities_x: Vec<f32>,
    velocities_y: Vec<f32>,
    count: usize,
}

impl ParticleSoa {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be 1 or greater.");

        Self {
            positions_x: vec![0.0; capacity],
            positions_y: vec![0.0; capacity],
            velocities_x: vec![0.0; capacity],
            velocities_y: vec![0.0; capacity],
            count: 0,
        }
    }

    pub fn get_count(&self) -> usize {
        self.count
    }

    pub fn add(&mut self, x: f32, y: f32, vx: f32, vy: f32) -> usize {
        assert!(
            self.count < self.positions_x.len(),
            "Particle storage capacity is insufficient."
        );

        let index = self.count;
        self.positions_x[index] = x;
        self.positions_y[index] = y;
        self.velocities_x[index] = vx;
        self.velocities_y[index] = vy;
        self.count += 1;

        index
    }

    pub fn update(&mut self, delta_time: f32) {
        let positions_x = &mut self.positions_x[..self.count];
        let positions_y = &mut self.positions_y[..self.count];
        let velocities_x = &self.velocities_x[..self.count];
        let velocities_y = &self.velocities_y[..self.count];

        for (((x, y), vx), vy) in positions_x
            .iter_mut()
            .zip(positions_y.iter_mut())
            .zip(velocities_x.iter())
            .zip(velocities_y.iter())
        {
            *x += *vx * delta_time;
            *y += *vy * delta_time;
        }
    }

    pub fn get_position(&self, index: usize) -> Option<(f32, f32)> {
        if index >= self.count {
            return None;
        }

        Some((self.positions_x[index], self.positions_y[index]))
    }
}

In Rust, directly accessing multiple arrays by the same index is not always bad; LLVM may eliminate the bounds checks. However, in hot loops, using iterators with zip makes it easier for the compiler to understand the length relationships and increases the likelihood of bounds-check elimination and vectorization. This difference can matter especially in loops that simultaneously access multiple slices by the same index.


3. Call site

The call site does not touch the internal arrays of the SoA directly, in keeping with separation of concerns and information hiding. Responsibilities for creation policy, data addition, the update pass, and result querying are kept separate.

TypeScript
type SpawnParticle = Readonly<{
  x: number;
  y: number;
  vx: number;
  vy: number;
}>;

type FramePolicy = Readonly<{
  deltaTime: number;
  maxParticles: number;
}>;

function createParticleStore(
  particles: readonly SpawnParticle[],
  policy: FramePolicy,
): ParticleSoa
{
  if (particles.length > policy.maxParticles) {
    throw new RangeError("The initial particle count exceeded `maxParticles`.");
  }

  const store = new ParticleSoa(policy.maxParticles);

  for (const particle of particles) {
    store.add(particle.x, particle.y, particle.vx, particle.vy);
  }

  return store;
}

function simulateOneFrame(
  store: ParticleSoa,
  policy: FramePolicy,
): void
{
  store.update(policy.deltaTime);
}

const particles = createParticleStore(
  [
    { x: 0, y: 0, vx: 10, vy: 0 },
    { x: 5, y: 5, vx: 0, vy: -2 },
  ],
  { deltaTime: 0.016, maxParticles: 1024 },
);

simulateOneFrame(particles, { deltaTime: 0.016, maxParticles: 1024 });

console.log(particles.getPosition(0));

Separation of responsibilities:

Text
SpawnParticle        = External input DTO
FramePolicy          = Per-frame execution policy
createParticleStore  = Input validation and SoA initialization
ParticleSoa          = Memory layout and update pass ownership
simulateOneFrame     = Frame flow assembly

4. Reading order

Text
What fields does this pass actually read?
→ Are the same fields grouped together in contiguous arrays?
→ Does the loop traverse the 0..count range linearly?
→ Are there branches remaining inside the hot loop?
→ Is the shape amenable to SIMD or auto-vectorization?
→ Does the external interface expose only safe methods instead of raw arrays?
→ Do layout changes leak out to call sites?

SoA code should be read not around a single object, but around which arrays a single pass consumes.


5. Boundaries and misconceptions

SoA is a layout pattern suited to bulk iterative computation. If you frequently edit individual particles, if domain logic strongly requires per-object invariants, or if the data count is small, AoS is worth considering. SoA is not a pattern that sacrifices encapsulation for performance; it simply practices a different kind of encapsulation. The baseline is to change only the internal layout while keeping the external API safe.

The benefits of SoA should be understood from two angles. First, reading only the required fields contiguously can improve cache locality. Second, applying the same operation to contiguous arrays of the same type makes SIMD and auto-vectorization easier. Leaving out the second point means only half the story has been told.

Auto-vectorization here is not guaranteed. Branches, aliasing, bounds checks, function calls, unclear length relationships, the possibility of exceptions, and runtime type uncertainty can all prevent vectorization. Therefore, SoA is a "shape that makes vectorization possible," not a spell that guarantees SIMD across every compiler and runtime.

Domain-level predictable cases and system-level concerns must also be separated. Exceeding capacity is a domain/policy issue, while a large typed-array allocation failure, out-of-memory condition, or runtime abort is an infrastructure problem. Because SoA code typically lives on the hot path, it is better to finish validation at construction time and at boundaries rather than inserting expensive exception handling or dynamic type checks in every iteration.
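
A minimal sketch of "validate at the boundary, then run a check-free loop" might look like the following. The names `validateDeltaTime` and `integrate` are hypothetical and not part of the classes shown earlier; the point is only where the checks sit relative to the loop.

```typescript
// Hypothetical helpers: all validation happens once per call, before the hot loop.
function validateDeltaTime(deltaTime: number): number {
  if (!Number.isFinite(deltaTime) || deltaTime < 0) {
    throw new RangeError("`deltaTime` must be a finite, non-negative number.");
  }
  return deltaTime;
}

function integrate(
  positions: Float32Array,
  velocities: Float32Array,
  count: number,
  deltaTime: number,
): void {
  // Boundary checks: performed once per pass, not once per element.
  const dt = validateDeltaTime(deltaTime);
  if (positions.length !== velocities.length) {
    throw new RangeError("Field arrays must have the same length.");
  }
  if (!Number.isInteger(count) || count < 0 || count > positions.length) {
    throw new RangeError("`count` must be an integer within the array capacity.");
  }

  // Hot loop: no exception handling and no type checks inside.
  for (let i = 0; i < count; i += 1) {
    positions[i] += velocities[i] * dt;
  }
}

const px = new Float32Array([0, 10, 99]);
const pvx = new Float32Array([4, -4, 1]);
integrate(px, pvx, 2, 0.25); // only the first two slots are live
```

A `NaN` delta time is rejected before any element is touched, so a bad input cannot leave half of the array updated.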

Production failure cases:

Text
SoA internal arrays drift to different lengths, breaking the index invariant
Exposing raw arrays externally allows arbitrary modifications
During add/remove, only some arrays get updated, causing data misalignment
Even when most entries have `active=false`, the full capacity is iterated every frame
Too many branches inside the hot loop interfere with branch prediction and vectorization
In Rust, accessing multiple arrays by index may prevent bounds check elimination
Calling `getParticleObject()` in bulk inside a loop for debugging convenience
Replacing all data structures with SoA without any performance measurement
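
The first and third failure cases (arrays drifting apart, partial updates) can be caught early with a debug-time invariant check. The sketch below assumes a hypothetical `checkSoaInvariant` helper that receives the field arrays by name; it is meant for tests and debug builds, not for the hot loop.

```typescript
// Hypothetical debug helper: reports violations instead of throwing,
// so a test can assert on the exact problems found.
type NamedField = Readonly<{ name: string; values: Float32Array }>;

function checkSoaInvariant(fields: readonly NamedField[], count: number): string[] {
  const problems: string[] = [];
  let capacity = -1;

  // Every field array must have the same length as the first one.
  for (const field of fields) {
    if (capacity === -1) {
      capacity = field.values.length;
    } else if (field.values.length !== capacity) {
      problems.push(`${field.name}: length ${field.values.length}, expected ${capacity}`);
    }
  }

  if (capacity === -1) {
    problems.push("no field arrays were provided");
    return problems;
  }

  // The live range 0..count must fit inside the shared capacity.
  if (!Number.isInteger(count) || count < 0 || count > capacity) {
    problems.push(`count ${count} is outside 0..${capacity}`);
  }

  return problems;
}

const healthy = checkSoaInvariant(
  [
    { name: "positionsX", values: new Float32Array(4) },
    { name: "positionsY", values: new Float32Array(4) },
  ],
  2,
);

const drifted = checkSoaInvariant(
  [
    { name: "positionsX", values: new Float32Array(4) },
    { name: "velocitiesX", values: new Float32Array(3) },
  ],
  5,
);
```

Here `healthy` is empty, while `drifted` reports both the mismatched length and the out-of-range count.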

6. Bad examples

TypeScript
type Particle = {
  x: number;
  y: number;
  vx: number;
  vy: number;
  color: string;
  debugName: string;
  renderHandle: object;
  active: boolean;
};

function updateParticlesBad(
  particles: Particle[],
  deltaTime: number,
): void
{
  for (const particle of particles) {
    if (!particle.active) {
      continue;
    }

    particle.x += particle.vx * deltaTime;
    particle.y += particle.vy * deltaTime;
  }
}

Why it's bad:

Text
- `color`, `debugName`, and `renderHandle`, which the update pass never reads or writes, are mixed into the same object.
- The loop only needs `x`/`y`/`vx`/`vy`, yet it accesses data object by object.
- With large datasets, the cost of per-object access and reference tracking can grow significantly.
- The active branch remains inside the hot loop.
- Because the data is interleaved, the layout is not SIMD-friendly for processing only `x` values in a contiguous sequence.
- Data with different lifetimes, such as render handles and debug names, are physically bundled together.

As with any code, the right answer depends on the scale of the program. The code above is perfectly good when the particle count is small and individual edits are frequent. The real problem arises on a specific hot path when data that is never read or written in that path (data with a different lifetime and purpose) is physically bundled together with the main computation data, pointlessly burning CPU cache bandwidth.
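
One way to keep the cold fields without bundling them into the hot path is a hot/cold split: hot fields go into typed arrays, and cold fields stay in a parallel array that shares the same index. The sketch below is an assumption-laden illustration; `ColdParticleData`, `SplitParticles`, and the helper functions are hypothetical names.

```typescript
// Hypothetical types: cold data is anything the update pass never touches.
type ColdParticleData = Readonly<{ color: string; debugName: string }>;

type SplitParticles = {
  x: Float32Array;
  y: Float32Array;
  vx: Float32Array;
  vy: Float32Array;
  cold: ColdParticleData[]; // same index as the hot arrays
  count: number;
};

function createSplitParticles(capacity: number): SplitParticles {
  if (!Number.isInteger(capacity) || capacity <= 0) {
    throw new RangeError("`capacity` must be a positive integer.");
  }
  return {
    x: new Float32Array(capacity),
    y: new Float32Array(capacity),
    vx: new Float32Array(capacity),
    vy: new Float32Array(capacity),
    cold: [],
    count: 0,
  };
}

function addSplitParticle(
  store: SplitParticles,
  x: number,
  y: number,
  vx: number,
  vy: number,
  cold: ColdParticleData,
): number {
  if (store.count >= store.x.length) {
    throw new RangeError("Insufficient particle storage capacity.");
  }
  const index = store.count;
  store.x[index] = x;
  store.y[index] = y;
  store.vx[index] = vx;
  store.vy[index] = vy;
  store.cold[index] = cold;
  store.count += 1;
  return index;
}

// The update pass touches only the hot arrays.
function updateSplitPositions(store: SplitParticles, dt: number): void {
  for (let i = 0; i < store.count; i += 1) {
    store.x[i] += store.vx[i] * dt;
    store.y[i] += store.vy[i] * dt;
  }
}

const splitStore = createSplitParticles(8);
addSplitParticle(splitStore, 0, 0, 10, 0, { color: "red", debugName: "spark-0" });
const sparkIndex = addSplitParticle(splitStore, 5, 5, 0, -2, { color: "blue", debugName: "spark-1" });
updateSplitPositions(splitStore, 0.5);
```

The debug name is still reachable through `splitStore.cold[sparkIndex]`, but the update loop never touches it.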


7. Production scaling

7.1 Active index pattern

When many entries have active=false, scanning the entire array every time becomes wasteful. In that case, you add an active index array to the SoA and iterate only over the indices that actually need updating.

TypeScript
export class ActiveIndexParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  readonly #activeIndexes: Uint32Array;
  #count: number;
  #activeCount: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be an integer greater than or equal to 1.");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#activeIndexes = new Uint32Array(capacity);
    this.#count = 0;
    this.#activeCount = 0;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("Insufficient particle storage capacity.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#activeIndexes[this.#activeCount] = index;
    this.#count += 1;
    this.#activeCount += 1;

    return index;
  }

  public updateActive(deltaTime: number): void
  {
    for (let cursor = 0; cursor < this.#activeCount; cursor += 1) {
      const index = this.#activeIndexes[cursor];

      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getActiveCount(): number
  {
    return this.#activeCount;
  }
}

This approach is efficient for skipping dead particles, but it still involves indirect access through activeIndexes. It can be advantageous when the active ratio is low, but when the active ratio is high it may actually be slower than a simple dense array scan.
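
The class above only ever appends to `activeIndexes`. A deactivation step could be sketched as follows, assuming a hypothetical standalone `ActiveList` shape rather than the private fields of `ActiveIndexParticleSoa`. Reusing the freed particle slot (for example through a free list) is deliberately left out.

```typescript
// Hypothetical shape: the first `count` entries of `indexes` are the live particle slots.
type ActiveList = { indexes: Uint32Array; count: number };

// Removes the entry at `cursor` by moving the last active entry into its place.
// The order of the active list changes; the particle slots themselves do not move.
function deactivateAt(list: ActiveList, cursor: number): void {
  if (!Number.isInteger(cursor) || cursor < 0 || cursor >= list.count) {
    throw new RangeError("`cursor` is not a valid active-list position.");
  }
  const last = list.count - 1;
  list.indexes[cursor] = list.indexes[last];
  list.count = last;
}

// Same indirect loop as updateActive above, reduced to one field for brevity.
function updateActiveSlots(
  list: ActiveList,
  positions: Float32Array,
  velocities: Float32Array,
  dt: number,
): void {
  for (let cursor = 0; cursor < list.count; cursor += 1) {
    const index = list.indexes[cursor];
    positions[index] += velocities[index] * dt;
  }
}

const activeList: ActiveList = { indexes: new Uint32Array([0, 1, 2, 3]), count: 4 };
const slotPositions = new Float32Array([0, 0, 0, 0]);
const slotVelocities = new Float32Array([1, 2, 3, 4]);

deactivateAt(activeList, 1); // slot 1 is no longer visited
updateActiveSlots(activeList, slotPositions, slotVelocities, 1);
```

After the call, slot 1 keeps its old position while slots 0, 2, and 3 advance.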


7.2 When order doesn't matter: swap-remove dense array

When particle order does not matter in a particle system, a stronger optimization is possible. Instead of marking dead particles as inactive, move the last live element in the array to the deleted position and then decrement count. This keeps the 0..count range always fully packed with live particles.

TypeScript
export class DenseParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  #count: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("`capacity` must be a positive integer greater than or equal to 1.");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#count = 0;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("Insufficient capacity in the particle storage.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#count += 1;

    return index;
  }

  public removeSwap(index: number): void
  {
    this.#ensureValidIndex(index);

    const last = this.#count - 1;

    this.#positionsX[index] = this.#positionsX[last];
    this.#positionsY[index] = this.#positionsY[last];
    this.#velocitiesX[index] = this.#velocitiesX[last];
    this.#velocitiesY[index] = this.#velocitiesY[last];

    this.#count -= 1;
  }

  public update(deltaTime: number): void
  {
    for (let index = 0; index < this.#count; index += 1) {
      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getCount(): number
  {
    return this.#count;
  }

  #ensureValidIndex(index: number): void
  {
    if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
      throw new RangeError("`index` is invalid.");
    }
  }
}

The advantages of this approach are as follows.

Text
- No `isActive` array needed.
- No indirect access through `activeIndexes` needed.
- The `0..count` range is always dense.
- No active branch inside the update loop.
- The layout becomes more amenable to SIMD and branch prediction.

There are trade-offs as well.

Text
The order changes on deletion.
If external code holds an index as a handle, it can become invalid.
If there is an `entityId` β†’ `index` map, the map must also be updated when swapping.
This approach may be unsuitable for UI lists, replay logs, or deterministic serialization where stable ordering is required.
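
The handle problem can be addressed by giving callers a stable id and keeping an id-to-index map in step with every swap. The class below is a sketch under stated assumptions: `IdMappedDenseStore` is a hypothetical name, it stores a single field for brevity, and ids are simple increasing integers that are never reused.

```typescript
// Hypothetical single-field store; a real store would swap every field array the same way.
class IdMappedDenseStore {
  private readonly positionsX: Float32Array;
  private readonly indexToId: Uint32Array;
  private readonly idToIndex = new Map<number, number>();
  private count = 0;
  private nextId = 1;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("`capacity` must be a positive integer.");
    }
    this.positionsX = new Float32Array(capacity);
    this.indexToId = new Uint32Array(capacity);
  }

  // Returns a stable id instead of a raw index.
  add(x: number): number {
    if (this.count >= this.positionsX.length) {
      throw new RangeError("Insufficient capacity.");
    }
    const id = this.nextId;
    this.nextId += 1;
    const index = this.count;
    this.positionsX[index] = x;
    this.indexToId[index] = id;
    this.idToIndex.set(id, index);
    this.count += 1;
    return id;
  }

  remove(id: number): boolean {
    const index = this.idToIndex.get(id);
    if (index === undefined) {
      return false;
    }
    const last = this.count - 1;
    if (index !== last) {
      // Move the last element into the hole and repoint its id.
      const movedId = this.indexToId[last];
      this.positionsX[index] = this.positionsX[last];
      this.indexToId[index] = movedId;
      this.idToIndex.set(movedId, index);
    }
    this.idToIndex.delete(id);
    this.count = last;
    return true;
  }

  getX(id: number): number | undefined {
    const index = this.idToIndex.get(id);
    return index === undefined ? undefined : this.positionsX[index];
  }

  getCount(): number {
    return this.count;
  }
}

const idStore = new IdMappedDenseStore(4);
const idA = idStore.add(10);
const idB = idStore.add(20);
const idC = idStore.add(30);
idStore.remove(idA); // the element behind idC moves into index 0
```

After the removal, `idC` still resolves to its value even though the element behind it changed index. The cost is one extra map update per removal.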

Operational guidelines are typically as follows.

Text
Order doesn't matter → swap-remove dense array
Order matters → stable compaction or activeIndexes
Deletions are rare → a simple isActive branch may suffice
Deletions are frequent and active entries are few → consider activeIndexes or dense compaction
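
For the "order matters" case, stable compaction can be sketched as a single read/write-cursor sweep over all field arrays. The `alive` flag array and the `compactStable` function are hypothetical; the sketch assumes every field array and `alive` cover at least `count` elements.

```typescript
// Hypothetical helper: keeps surviving elements in their original relative order.
// `alive[i]` is 1 for elements to keep and 0 for elements to drop.
function compactStable(
  fields: readonly Float32Array[],
  alive: Uint8Array,
  count: number,
): number {
  let write = 0;

  for (let read = 0; read < count; read += 1) {
    if (alive[read] === 0) {
      continue; // dropped element: the write cursor stays put
    }
    if (write !== read) {
      // Shift the survivor down in every field array so indexes stay aligned.
      for (const field of fields) {
        field[write] = field[read];
      }
      alive[write] = 1;
    }
    write += 1;
  }

  // Clear flags in the vacated tail so stale entries are not mistaken for live ones.
  alive.fill(0, write, count);

  return write; // the new count
}

const compactXs = new Float32Array([10, 20, 30, 40]);
const compactYs = new Float32Array([1, 2, 3, 4]);
const aliveFlags = new Uint8Array([1, 0, 1, 1]);
const compactedCount = compactStable([compactXs, compactYs], aliveFlags, 4);
```

Unlike swap-remove, this costs O(count) per sweep, so it is usually run once per frame after all deletions have been flagged rather than once per deletion.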

Production metric setup:

Text
particle.count
particle.capacity
particle.update.duration_ms
particle.items_per_ms
particle.remove.count
particle.swap_remove.count
particle.capacity_utilization

SoA should not be applied by gut feeling. Programming by intuition is comfortable, but measuring as you go is the right approach. Voluntarily adding complexity only to struggle with maintenance later is ultimately a cost you pay yourself. You need to compare AoS, SoA, active index, and swap-remove under the same input size, the same active ratio, and the same runtime conditions.
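
A small measurement helper makes that comparison repeatable. The sketch below is one possible shape, assuming a hypothetical `measurePass` function that produces values matching the `particle.update.duration_ms` and `particle.items_per_ms` metrics. The clock is injectable; the default `Date.now()` only has millisecond resolution, so a higher-resolution timer such as `performance.now()` can be passed in where it is available.

```typescript
type PassMeasurement = Readonly<{ durationMs: number; itemsPerMs: number }>;

// Hypothetical helper: runs `pass` several times and reports the average cost.
// `now` is injectable so the helper can be driven by a fake clock in tests.
function measurePass(
  itemCount: number,
  repeats: number,
  pass: () => void,
  now: () => number = () => Date.now(),
): PassMeasurement {
  if (!Number.isInteger(repeats) || repeats <= 0) {
    throw new RangeError("`repeats` must be a positive integer.");
  }

  const start = now();
  for (let r = 0; r < repeats; r += 1) {
    pass();
  }
  const durationMs = (now() - start) / repeats;

  // Guard against a zero reading from a coarse timer.
  const itemsPerMs = durationMs > 0 ? itemCount / durationMs : Number.POSITIVE_INFINITY;

  return { durationMs, itemsPerMs };
}

// Example: measure a dense SoA pass over 100,000 elements.
const benchX = new Float32Array(100_000);
const benchVx = new Float32Array(100_000).fill(1);

const denseMeasurement = measurePass(benchX.length, 20, () => {
  for (let i = 0; i < benchX.length; i += 1) {
    benchX[i] += benchVx[i] * 0.016;
  }
});
```

Running the same helper over the AoS, active-index, and swap-remove variants with identical `itemCount` and `repeats` keeps the comparison on equal terms. The numbers it yields depend on the runtime and machine.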


8. Comparison notes: C# / TypeScript / Python / Rust

| Language | Idiomatic expression | Caution |
| --- | --- | --- |
| C# | float[], Span<T>, Vector<T>, ArrayPool<T> | Do not expose internal arrays directly to the outside |
| TypeScript | Float32Array, Uint8Array, Uint32Array | Do not assume SIMD is automatically guaranteed |
| Python | array, memoryview, NumPy | Pure Python loops carry significant interpreter overhead |
| Rust | Vec<f32>, slice, iterator zip | In hot loops, check whether multi-index access on multiple arrays may fail to eliminate bounds checks |

In Rust, pay particular attention to code that indexes multiple arrays simultaneously with for index in 0..count. While safe, this pattern can leave bounds checks in place on hot loops or interfere with vectorization. Slicing each array to [..count] to align their lengths and then iterating with iter_mut().zip(...) makes it easier for the compiler to understand the length relationships.

In C#, you can make SIMD intent explicit with Vector<T>, but actual performance depends on the CPU, JIT, data alignment, and loop shape. In TypeScript, the key is stabilizing the memory layout with Float32Array. In Python, if you truly need numerical performance, moving to a vectorization library like NumPy is generally the more pragmatic choice.

Learning a language in programming ultimately means internalizing that language's philosophy, as Alan Perlis put it. If you mimic a Rust-style ownership model in TypeScript, expect C-level performance from pure Python loops, or wrap everything in objects in C# and call it SoA, that is just schematic pattern-matching, so I'd encourage you to go back and study again.

That said, framing it this way makes it sound harder than it is. The only thing you really need to understand is that the domain model (Object) and the execution model (Pass) should be distinct. The Gather/Scatter pattern is commonly used to bridge these two models, but I won't cover it here. I'll write about it when I have time.

Personal note: In the end, the essence of SoA is severing the forced union of 'data for machines (execution model)' and 'data for humans (domain model)'. The impulse to cram everything into a single Object is an act of violence that imposes the structure of the human brain onto the CPU.


9. Further questions to consider

  • Does this pass truly need the entire object, or does it only need a few specific field arrays?

  • Is the benefit of SoA due to cache locality, the possibility of SIMD, or branch elimination?

  • As the active ratio drops, which is most appropriate: isActive[], activeIndexes[], or swap-remove?

  • Is this data order-insensitive, or is stable ordering a domain requirement?

  • Have you actually benchmarked the index-loop version against the iterator zip version in Rust?

  • What API allows adequate testing and debugging without exposing SoA's internal arrays to the outside?

  • Does this optimization actually reduce the real bottleneck, or does it only increase code complexity?

10. Summary

  • SoA stores data in contiguous per-field arrays rather than per-object storage.

  • The benefits of SoA lie not only in cache locality but also in the potential for SIMD and auto-vectorization.

  • In hot loops, you should minimize branches, bounds checks, indirect access, and allocation.

  • In Rust, using slice + iterator zip tends to be more optimization-friendly than accessing multiple arrays by the same index.

  • For particle systems where order does not matter, maintaining an always-dense array with swap-remove is a powerful option.

  • SoA is an internal execution layout; the external API should still be encapsulated.

Quick reference:

Text
Arrange data not by object, but in the order that each pass reads its fields, and where possible, push it through dense loops with no branches.