SoA Data Layout
Practice memo · 24 min read · Hard
SoA (Structure of Arrays) is a data layout pattern that separates fields of the same kind into individual contiguous arrays rather than bundling all fields into a single object. Where AoS (Array of Structures) creates an "array of particle objects" like Particle[], SoA creates a "bundle of per-field arrays" like x[], y[], vx[], vy[].
The intent of this pattern is to separate the domain model from the execution model. The domain model should be easy for humans to read, but the execution model needs to be something the CPU can scan quickly. In an object-oriented model, having x, y, vx, vy, color, debugName, and renderHandle all packed into a single Particle object is convenient to read, but a position-update pass actually needs only x, y, vx, and vy.
SoA can be fast not merely because it "reduces cache misses." When fields of the same kind are gathered into contiguous arrays, it becomes easier for the CPU and compiler to apply SIMD, that is, vectorization that processes multiple data elements at once. In AoS, data is interleaved as x1, y1, vx1, vy1, x2, y2, vx2, vy2. In SoA, by contrast, x1, x2, x3, x4 are contiguous. This difference affects not only cache locality but also instruction throughput.
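To make the interleaving difference concrete, here is a minimal TypeScript sketch that splits an interleaved buffer into per-field arrays. The `deinterleave` helper, the `ParticleArrays` type, and the four-floats-per-particle record layout are assumptions made for illustration; they are not part of the code used later in this memo.

```typescript
// Assumed record layout (illustrative only): [x, y, vx, vy] per particle.
const FIELDS_PER_PARTICLE = 4;

type ParticleArrays = {
    x: Float32Array;
    y: Float32Array;
    vx: Float32Array;
    vy: Float32Array;
};

// Hypothetical helper: copies an interleaved AoS buffer into SoA arrays.
function deinterleave(aos: Float32Array): ParticleArrays
{
    if (aos.length % FIELDS_PER_PARTICLE !== 0) {
        throw new RangeError("Buffer length must be a multiple of 4.");
    }
    const count = aos.length / FIELDS_PER_PARTICLE;
    const soa: ParticleArrays = {
        x: new Float32Array(count),
        y: new Float32Array(count),
        vx: new Float32Array(count),
        vy: new Float32Array(count),
    };
    for (let i = 0; i < count; i += 1) {
        const base = i * FIELDS_PER_PARTICLE;
        soa.x[i] = aos[base];
        soa.y[i] = aos[base + 1];
        soa.vx[i] = aos[base + 2];
        soa.vy[i] = aos[base + 3];
    }
    return soa;
}

// Two particles, interleaved: x1, y1, vx1, vy1, x2, y2, vx2, vy2.
const interleaved = new Float32Array([1, 2, 10, 20, 3, 4, 30, 40]);
const split = deinterleave(interleaved);
// split.x now holds x1, x2 contiguously; split.vx holds vx1, vx2.
```

After the split, a pass that only needs `x` and `vx` can walk two dense arrays instead of stepping over `y` and `vy` in every record.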
While SoA is a powerful technique, if you frequently edit individual objects, work with small data sets, have a bottleneck in I/O, or read nearly all fields of an object in a single pass, traditional AoS can be simpler and fast enough. Many people interpret SoA as "abandon object-oriented design," but more precisely it means "on the hot path, lay out data in the order that each pass actually reads the fields."
Core formula:
SoA = place fields of the same kind together in contiguous arrays
Cache Locality = load only the data you need into the cache
SIMD Friendliness = make it easy to apply the same operation to contiguous data
Dense Pass = traverse only the 0..count range linearly, with no branches
Swap-remove = bring deletion cost close to O(1) when order doesn't matter
1. When should you use it?
Suppose you update the positions of 100,000 particles every frame.
x = x + vx * dt
y = y + vy * dt
This operation does not need the particle's name, color, rendering handle, or debug tag. All it needs is x, y, vx, and vy.
A typical AoS looks like this.
particles = [
  { x, y, vx, vy, color, debugName, renderHandle },
  { x, y, vx, vy, color, debugName, renderHandle },
  ...
]
A typical SoA looks like this.
positionsX[]
positionsY[]
velocitiesX[]
velocitiesY[]
In SoA, the position-update pass scans only the required arrays linearly.
positionsX[i] += velocitiesX[i] * dt
positionsY[i] += velocitiesY[i] * dt
Shifting focus from "what an object is" to "what a pass reads" is more than learning a memory optimization technique; it is a philosophical turning point that moves a programmer's mental model from human-centric to hardware-centric.
Object-oriented programming (OOP) views the world through 'nouns': it groups the world into conceptual units such as 'Particle', 'Player', and 'Enemy'. Because of this, bundling those concepts inside a single class or struct in memory feels natural, and this is typically referred to as having semantic cohesion. But the CPU doesn't think that way. It just runs.
- AoS (object-centric): When the position-update pass reads `particle1.x`, the cache line also pulls in data from `particle1` that the pass never uses (color, rendering handle, etc.). In pointer-based or object-array layouts, the next particle's data is scattered across the heap, making it hard for the hardware prefetcher to follow; even in a contiguous struct array, the high ratio of cold fields relative to hot fields wastes cache bandwidth.
- SoA (pass-centric): When a pass reads only the `x[]` and `vx[]` arrays sequentially, the hardware prefetcher can keep up far more easily. While the CPU computes, upcoming elements are often already on their way into the L1 cache, so much of the memory latency is hidden rather than paid on every access.
Note that even within AoS, you need to distinguish two cases. A contiguous AoS where the objects themselves are stored inline, as in a C/Rust/C# struct array, differs from a reference-following AoS where you chase pointers, as in a JavaScript object array or C# class array. The former has a regular stride, so the prefetcher can keep up to some extent, but the problem of hot and cold fields sharing the same cache line remains. The latter tends to scatter objects across the heap, which puts it at a disadvantage on both locality and prefetching. In practice, though, you rarely need to know the layout to that level of detail. Knowing when to compromise is also a skill. Most people who set out to learn SoA do so through game-oriented examples, such as Rust's Bevy engine or Unity's ECS, so the examples here are also written with games in mind.
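To put a rough number on the hot/cold point for a contiguous struct array, the sketch below computes what fraction of each fetched record a pass actually uses. The helper name `hotByteFraction` and the byte sizes are illustrative assumptions, and the model deliberately ignores padding, alignment, and prefetcher behavior.

```typescript
// Fraction of each record's bytes that a given pass actually reads.
function hotByteFraction(hotBytes: number, recordBytes: number): number
{
    if (recordBytes <= 0 || hotBytes < 0 || hotBytes > recordBytes) {
        throw new RangeError("Expected recordBytes > 0 and 0 <= hotBytes <= recordBytes.");
    }
    return hotBytes / recordBytes;
}

// Assumed contiguous AoS record: x, y, vx, vy as float32 (16 hot bytes)
// followed by 16 bytes of cold data such as color and handles.
const aosFraction = hotByteFraction(16, 32);

// In SoA, the position pass streams arrays that hold only hot fields.
const soaFraction = hotByteFraction(16, 16);

console.log(aosFraction, soaFraction); // 0.5 1
```

Under these assumed sizes, half of every cache line the AoS pass pulls in is data it never looks at, even though the access pattern itself is perfectly regular.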
2. Core expressions
The common example is a structure that stores particle positions and velocities in SoA and performs a single-frame position update.
C#
using System;
using System.Numerics;

public sealed class ParticleSoa
{
    // One contiguous array per field; the same index refers to the same particle in each.
    private readonly float[] positionsX;
    private readonly float[] positionsY;
    private readonly float[] velocitiesX;
    private readonly float[] velocitiesY;
    private int count;

    public ParticleSoa(int capacity)
    {
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity));
        }

        this.positionsX = new float[capacity];
        this.positionsY = new float[capacity];
        this.velocitiesX = new float[capacity];
        this.velocitiesY = new float[capacity];
        this.count = 0;
    }

    public int GetCount()
    {
        return this.count;
    }

    public int Add(float x, float y, float vx, float vy)
    {
        if (this.count >= this.positionsX.Length)
        {
            throw new InvalidOperationException("Insufficient particle storage capacity.");
        }

        int index = this.count;
        this.positionsX[index] = x;
        this.positionsY[index] = y;
        this.velocitiesX[index] = vx;
        this.velocitiesY[index] = vy;
        this.count += 1;
        return index;
    }

    public void Update(float deltaTime)
    {
        Span<float> x = this.positionsX.AsSpan(0, this.count);
        Span<float> y = this.positionsY.AsSpan(0, this.count);
        ReadOnlySpan<float> vx = this.velocitiesX.AsSpan(0, this.count);
        ReadOnlySpan<float> vy = this.velocitiesY.AsSpan(0, this.count);
        int width = Vector<float>.Count;
        Vector<float> dt = new(deltaTime);
        int index = 0;

        // Vectorized main loop: processes `width` particles per iteration.
        for (; index <= this.count - width; index += width)
        {
            Vector<float> xVector = new(x.Slice(index, width));
            Vector<float> yVector = new(y.Slice(index, width));
            Vector<float> vxVector = new(vx.Slice(index, width));
            Vector<float> vyVector = new(vy.Slice(index, width));
            (xVector + vxVector * dt).CopyTo(x.Slice(index, width));
            (yVector + vyVector * dt).CopyTo(y.Slice(index, width));
        }

        // Scalar tail: handles the particles that do not fill a whole vector.
        for (; index < this.count; index += 1)
        {
            x[index] += vx[index] * deltaTime;
            y[index] += vy[index] * deltaTime;
        }
    }

    public (float X, float Y) GetPosition(int index)
    {
        this.EnsureValidIndex(index);
        return (this.positionsX[index], this.positionsY[index]);
    }

    private void EnsureValidIndex(int index)
    {
        // The unsigned comparison rejects negative indexes and indexes >= count in one check.
        if ((uint)index >= (uint)this.count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}
In C#, you can make SIMD intent explicit with Vector<float>. Note that the actual vector width varies depending on the runtime and CPU. For numerical hot paths, consider System.Numerics, Span<T>, ArrayPool<T>, or lower-level intrinsics.
TypeScript
export class ParticleSoa
{
    readonly #positionsX: Float32Array;
    readonly #positionsY: Float32Array;
    readonly #velocitiesX: Float32Array;
    readonly #velocitiesY: Float32Array;
    #count: number;

    public constructor(capacity: number)
    {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError("`capacity` must be a positive integer (≥ 1).");
        }
        this.#positionsX = new Float32Array(capacity);
        this.#positionsY = new Float32Array(capacity);
        this.#velocitiesX = new Float32Array(capacity);
        this.#velocitiesY = new Float32Array(capacity);
        this.#count = 0;
    }

    public getCount(): number
    {
        return this.#count;
    }

    public add(x: number, y: number, vx: number, vy: number): number
    {
        if (this.#count >= this.#positionsX.length) {
            throw new RangeError("Insufficient particle storage capacity.");
        }
        const index = this.#count;
        this.#positionsX[index] = x;
        this.#positionsY[index] = y;
        this.#velocitiesX[index] = vx;
        this.#velocitiesY[index] = vy;
        this.#count += 1;
        return index;
    }

    public update(deltaTime: number): void
    {
        // Linear pass over 0..count; only the four hot arrays are touched.
        for (let index = 0; index < this.#count; index += 1) {
            this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
            this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
        }
    }

    public getPosition(index: number): readonly [number, number]
    {
        this.#ensureValidIndex(index);
        return [this.#positionsX[index], this.#positionsY[index]];
    }

    #ensureValidIndex(index: number): void
    {
        if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
            throw new RangeError("`index` is not valid.");
        }
    }
}
In TypeScript, Float32Array expresses SoA intent clearly. Because the optimizations a JavaScript engine applies internally vary by runtime, the primary goal is to establish a predictable layout using typed arrays rather than expecting direct SIMD.
Python
from array import array


class ParticleSoa:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be at least 1.")
        # One typed array ("f" = C float) per field, pre-sized to capacity.
        self._positions_x = array("f", [0.0]) * capacity
        self._positions_y = array("f", [0.0]) * capacity
        self._velocities_x = array("f", [0.0]) * capacity
        self._velocities_y = array("f", [0.0]) * capacity
        self._count = 0

    def get_count(self) -> int:
        return self._count

    def add(self, x: float, y: float, vx: float, vy: float) -> int:
        if self._count >= len(self._positions_x):
            raise OverflowError("The particle storage capacity is insufficient.")
        index = self._count
        self._positions_x[index] = x
        self._positions_y[index] = y
        self._velocities_x[index] = vx
        self._velocities_y[index] = vy
        self._count += 1
        return index

    def update(self, delta_time: float) -> None:
        # Linear pass over 0..count; only the four hot arrays are touched.
        for index in range(self._count):
            self._positions_x[index] += self._velocities_x[index] * delta_time
            self._positions_y[index] += self._velocities_y[index] * delta_time

    def get_position(self, index: int) -> tuple[float, float]:
        self._ensure_valid_index(index)
        return (self._positions_x[index], self._positions_y[index])

    def _ensure_valid_index(self, index: int) -> None:
        if index < 0 or index >= self._count:
            raise IndexError("index is invalid.")
Using only the Python standard library, you can create a SoA shape with array. However, for genuinely fast large-scale numerical computation, you would typically consider tools such as NumPy, Numba, Cython, or Rust/C extensions. Even with a better data layout, pure Python loops can be bottlenecked by interpreter overhead.
Rust
#[derive(Debug)]
pub struct ParticleSoa {
    positions_x: Vec<f32>,
    positions_y: Vec<f32>,
    velocities_x: Vec<f32>,
    velocities_y: Vec<f32>,
    count: usize,
}

impl ParticleSoa {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be 1 or greater.");
        Self {
            positions_x: vec![0.0; capacity],
            positions_y: vec![0.0; capacity],
            velocities_x: vec![0.0; capacity],
            velocities_y: vec![0.0; capacity],
            count: 0,
        }
    }

    pub fn get_count(&self) -> usize {
        self.count
    }

    pub fn add(&mut self, x: f32, y: f32, vx: f32, vy: f32) -> usize {
        assert!(
            self.count < self.positions_x.len(),
            "Particle storage capacity is insufficient."
        );
        let index = self.count;
        self.positions_x[index] = x;
        self.positions_y[index] = y;
        self.velocities_x[index] = vx;
        self.velocities_y[index] = vy;
        self.count += 1;
        index
    }

    pub fn update(&mut self, delta_time: f32) {
        // Slice every array to the same length so the length relationship is explicit.
        let positions_x = &mut self.positions_x[..self.count];
        let positions_y = &mut self.positions_y[..self.count];
        let velocities_x = &self.velocities_x[..self.count];
        let velocities_y = &self.velocities_y[..self.count];

        for (((x, y), vx), vy) in positions_x
            .iter_mut()
            .zip(positions_y.iter_mut())
            .zip(velocities_x.iter())
            .zip(velocities_y.iter())
        {
            *x += *vx * delta_time;
            *y += *vy * delta_time;
        }
    }

    pub fn get_position(&self, index: usize) -> Option<(f32, f32)> {
        if index >= self.count {
            return None;
        }
        Some((self.positions_x[index], self.positions_y[index]))
    }
}
In Rust, directly accessing multiple arrays by the same index is not always bad; LLVM may eliminate the bounds checks. However, in hot loops, using iterators with zip makes it easier for the compiler to understand the length relationships and increases the likelihood of bounds-check elimination and vectorization. This difference can matter especially in loops that simultaneously access multiple slices by the same index.
3. Call site
The call site does not touch the SoA's internal arrays directly, in keeping with separation of concerns and information hiding. Responsibilities for creation policy, adding data, running the update pass, and querying results are kept separate.
type SpawnParticle = Readonly<{
    x: number;
    y: number;
    vx: number;
    vy: number;
}>;

type FramePolicy = Readonly<{
    deltaTime: number;
    maxParticles: number;
}>;

function createParticleStore(
    particles: readonly SpawnParticle[],
    policy: FramePolicy,
): ParticleSoa
{
    if (particles.length > policy.maxParticles) {
        throw new RangeError("The initial particle count exceeded `maxParticles`.");
    }
    const store = new ParticleSoa(policy.maxParticles);
    for (const particle of particles) {
        store.add(particle.x, particle.y, particle.vx, particle.vy);
    }
    return store;
}

function simulateOneFrame(
    store: ParticleSoa,
    policy: FramePolicy,
): void
{
    store.update(policy.deltaTime);
}

const particles = createParticleStore(
    [
        { x: 0, y: 0, vx: 10, vy: 0 },
        { x: 5, y: 5, vx: 0, vy: -2 },
    ],
    { deltaTime: 0.016, maxParticles: 1024 },
);
simulateOneFrame(particles, { deltaTime: 0.016, maxParticles: 1024 });
console.log(particles.getPosition(0));
Separation of responsibilities:
SpawnParticle = External input DTO
FramePolicy = Per-frame execution policy
createParticleStore = Input validation and SoA initialization
ParticleSoa = Memory layout and update pass ownership
simulateOneFrame = Frame flow assembly
4. Reading order
What fields does this pass actually read?
→ Are the same fields grouped together in contiguous arrays?
→ Does the loop traverse the 0..count range linearly?
→ Are there branches remaining inside the hot loop?
→ Is the shape amenable to SIMD or auto-vectorization?
→ Does the external interface expose only safe methods instead of raw arrays?
→ Do layout changes leak out to call sites?
SoA code should be read not around a single object, but around which arrays a single pass consumes.
5. Boundaries and misconceptions
SoA is a layout pattern suited to bulk iterative computation. If you frequently edit individual particles, if domain logic strongly requires per-object invariants, or if the data count is small, AoS is worth considering. SoA is not a pattern that sacrifices encapsulation for performance; it simply practices a different kind of encapsulation. The baseline is to change only the internal layout while keeping the external API safe.
The benefits of SoA should be understood from two angles. First, reading only the required fields contiguously can improve cache locality. Second, applying the same operation to contiguous arrays of the same type makes SIMD and auto-vectorization easier. Leaving out the second point means only half the story has been told.
Auto-vectorization here is not guaranteed. Branches, aliasing, bounds checks, function calls, unclear length relationships, the possibility of exceptions, and runtime type uncertainty can all prevent vectorization. Therefore, SoA is a "shape that makes vectorization possible," not a spell that guarantees SIMD across every compiler and runtime.
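As one illustration of the "branches" item, the sketch below contrasts a loop that keeps an `if` in its body with a branch-free variant that multiplies by a 0/1 flag. The function names and the `active` flag array are hypothetical. Whether a JavaScript engine vectorizes either form is not guaranteed; the point is only the loop shape. The masked version also assumes the flags are strictly 0 or 1 and the velocities are finite, since `Infinity * 0` yields `NaN`.

```typescript
// Branchy version: the `if` stays inside the hot loop.
function updateWithBranch(
    x: Float32Array,
    vx: Float32Array,
    active: Uint8Array,
    count: number,
    dt: number,
): void
{
    for (let i = 0; i < count; i += 1) {
        if (active[i] === 1) {
            x[i] += vx[i] * dt;
        }
    }
}

// Branch-free version: inactive entries add vx * dt * 0, which is 0.
function updateWithMask(
    x: Float32Array,
    vx: Float32Array,
    active: Uint8Array,
    count: number,
    dt: number,
): void
{
    for (let i = 0; i < count; i += 1) {
        x[i] += vx[i] * dt * active[i];
    }
}

const activeFlags = new Uint8Array([1, 0, 1, 0]);
const speed = new Float32Array([2, 2, 2, 2]);
const branchX = new Float32Array([0, 1, 2, 3]);
const maskX = new Float32Array([0, 1, 2, 3]);
updateWithBranch(branchX, speed, activeFlags, 4, 0.5);
updateWithMask(maskX, speed, activeFlags, 4, 0.5);
// Both arrays now hold 1, 1, 3, 3.
```

The masked form does arithmetic on every element, including inactive ones, so it only pays off when that extra work is cheaper than the mispredictions it replaces; that is something to measure rather than assume.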
Domain-level predictable cases and system-level concerns must also be separated. Exceeding capacity is a domain/policy issue, while a large typed-array allocation failure, out-of-memory condition, or runtime abort is an infrastructure problem. Because SoA code typically lives on the hot path, it is better to finish validation at construction time and at boundaries rather than inserting expensive exception handling or dynamic type checks in every iteration.
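A minimal sketch of "validate at the boundary, not in the loop" might look like the following. The `ParticleColumns` type and the `validateFrame`, `integrate`, and `stepFrame` functions are hypothetical names introduced for this example. The boundary check also covers the index invariant that all columns share the same length.

```typescript
type ParticleColumns = {
    x: Float32Array;
    y: Float32Array;
    vx: Float32Array;
    vy: Float32Array;
    count: number;
};

// Boundary check: runs once per frame, outside the hot loop.
function validateFrame(columns: ParticleColumns, deltaTime: number): void
{
    const capacity = columns.x.length;
    if (
        columns.y.length !== capacity ||
        columns.vx.length !== capacity ||
        columns.vy.length !== capacity
    ) {
        throw new Error("All SoA columns must have the same length.");
    }
    if (!Number.isInteger(columns.count) || columns.count < 0 || columns.count > capacity) {
        throw new RangeError("`count` must be an integer within 0..capacity.");
    }
    if (!Number.isFinite(deltaTime) || deltaTime < 0) {
        throw new RangeError("`deltaTime` must be a finite, non-negative number.");
    }
}

// Hot loop: trusts the boundary check and does no per-element validation.
function integrate(columns: ParticleColumns, deltaTime: number): void
{
    const { x, y, vx, vy, count } = columns;
    for (let i = 0; i < count; i += 1) {
        x[i] += vx[i] * deltaTime;
        y[i] += vy[i] * deltaTime;
    }
}

function stepFrame(columns: ParticleColumns, deltaTime: number): void
{
    validateFrame(columns, deltaTime);
    integrate(columns, deltaTime);
}

const columns: ParticleColumns = {
    x: new Float32Array([0, 1]),
    y: new Float32Array([0, 1]),
    vx: new Float32Array([2, 4]),
    vy: new Float32Array([-2, 0]),
    count: 2,
};
stepFrame(columns, 0.5);
// columns.x is now 1, 3 and columns.y is now -1, 1.
```

The cost of validation here is a handful of comparisons per frame, independent of the particle count.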
Production failure cases:
SoA internal arrays drift to different lengths, breaking the index invariant
Exposing raw arrays externally allows arbitrary modifications
During add/remove, only some arrays get updated, causing data misalignment
Even when most entries have `active=false`, the full capacity is iterated every frame
Too many branches inside the hot loop interfere with branch prediction and vectorization
In Rust, accessing multiple arrays by index may prevent bounds check elimination
Calling `getParticleObject()` in bulk inside a loop for debugging convenience
Replacing all data structures with SoA without any performance measurement
6. Bad examples
type Particle = {
    x: number;
    y: number;
    vx: number;
    vy: number;
    color: string;
    debugName: string;
    renderHandle: object;
    active: boolean;
};

function updateParticlesBad(
    particles: Particle[],
    deltaTime: number,
): void
{
    for (const particle of particles) {
        if (!particle.active) {
            continue;
        }
        particle.x += particle.vx * deltaTime;
        particle.y += particle.vy * deltaTime;
    }
}
Why it's bad:
- `color`, `debugName`, and `renderHandle`, which the update pass neither reads nor writes, are mixed into the same object.
- The loop only needs `x`/`y`/`vx`/`vy`, yet it accesses data object by object.
- With large datasets, the cost of per-object access and reference chasing can grow significantly.
- The active branch remains inside the hot loop.
- Because the data is interleaved, the layout is not SIMD-friendly for processing only `x` values in a contiguous sequence.
- Data with different lifetimes, such as render handles and debug names, are physically bundled together.
As with any code, the right answer depends on the scale of the program. The code above is perfectly good when the particle count is small and individual edits are frequent. The real problem arises on a specific hot path when data that is never read or written in that path (data with a different lifetime and purpose) is physically bundled together with the main computation data, pointlessly burning CPU cache bandwidth.
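One way to unbundle the data, sketched below under stated assumptions, is to keep the hot fields in typed arrays and move the cold fields into a separate array that shares the same index. The `SplitParticleStore` class, the `ParticleMeta` type, and the `describe` method are hypothetical and only illustrate the split; `renderHandle` is left out to keep the example short.

```typescript
// Cold, per-particle data that the update pass never touches.
type ParticleMeta = Readonly<{ color: string; debugName: string }>;

class SplitParticleStore
{
    // Hot fields: dense typed arrays scanned every frame.
    private readonly x: Float32Array;
    private readonly y: Float32Array;
    private readonly vx: Float32Array;
    private readonly vy: Float32Array;
    // Cold fields: a separate array that shares the same index.
    private readonly meta: ParticleMeta[] = [];
    private count = 0;

    public constructor(capacity: number)
    {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError("`capacity` must be a positive integer.");
        }
        this.x = new Float32Array(capacity);
        this.y = new Float32Array(capacity);
        this.vx = new Float32Array(capacity);
        this.vy = new Float32Array(capacity);
    }

    public add(x: number, y: number, vx: number, vy: number, meta: ParticleMeta): number
    {
        if (this.count >= this.x.length) {
            throw new RangeError("Insufficient particle storage capacity.");
        }
        const index = this.count;
        this.x[index] = x;
        this.y[index] = y;
        this.vx[index] = vx;
        this.vy[index] = vy;
        this.meta.push(meta); // lands at position `index`
        this.count += 1;
        return index;
    }

    // Hot path: reads and writes only the four hot arrays.
    public update(deltaTime: number): void
    {
        for (let index = 0; index < this.count; index += 1) {
            this.x[index] += this.vx[index] * deltaTime;
            this.y[index] += this.vy[index] * deltaTime;
        }
    }

    // Cold path: joins hot and cold data on demand, e.g. for debugging.
    public describe(index: number): string
    {
        if (!Number.isInteger(index) || index < 0 || index >= this.count) {
            throw new RangeError("`index` is invalid.");
        }
        const meta = this.meta[index];
        return `${meta.debugName} (${meta.color}) at ${this.x[index]}, ${this.y[index]}`;
    }
}

const store = new SplitParticleStore(8);
const spark = store.add(0, 0, 2, 4, { color: "red", debugName: "spark" });
store.update(0.5);
console.log(store.describe(spark)); // spark (red) at 1, 2
```

The update pass no longer drags `color` or `debugName` through the cache, while debugging code can still ask for a readable description of a single particle.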
7. Production scaling
7.1 Active index pattern
When many entries have active=false, scanning the entire array every time becomes wasteful. In that case, you add an active index array to the SoA and iterate only over the indices that actually need updating.
export class ActiveIndexParticleSoa
{
    readonly #positionsX: Float32Array;
    readonly #positionsY: Float32Array;
    readonly #velocitiesX: Float32Array;
    readonly #velocitiesY: Float32Array;
    readonly #activeIndexes: Uint32Array;
    #count: number;
    #activeCount: number;

    public constructor(capacity: number)
    {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError("capacity must be an integer greater than or equal to 1.");
        }
        this.#positionsX = new Float32Array(capacity);
        this.#positionsY = new Float32Array(capacity);
        this.#velocitiesX = new Float32Array(capacity);
        this.#velocitiesY = new Float32Array(capacity);
        this.#activeIndexes = new Uint32Array(capacity);
        this.#count = 0;
        this.#activeCount = 0;
    }

    public add(x: number, y: number, vx: number, vy: number): number
    {
        if (this.#count >= this.#positionsX.length) {
            throw new RangeError("Insufficient particle storage capacity.");
        }
        const index = this.#count;
        this.#positionsX[index] = x;
        this.#positionsY[index] = y;
        this.#velocitiesX[index] = vx;
        this.#velocitiesY[index] = vy;
        // New particles start out active.
        this.#activeIndexes[this.#activeCount] = index;
        this.#count += 1;
        this.#activeCount += 1;
        return index;
    }

    public updateActive(deltaTime: number): void
    {
        // Indirect pass: visits only the indexes listed as active.
        for (let cursor = 0; cursor < this.#activeCount; cursor += 1) {
            const index = this.#activeIndexes[cursor];
            this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
            this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
        }
    }

    public getActiveCount(): number
    {
        return this.#activeCount;
    }
}
This approach is efficient for skipping dead particles, but it still involves indirect access through activeIndexes. It can be advantageous when the active ratio is low, but when the active ratio is high it may actually be slower than a simple dense array scan.
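The class above never takes a particle out of the active list. One possible way to add that, sketched here as a standalone `ActiveList` with a hypothetical `slotOf` reverse-lookup array, is to swap the last active entry into the vacated slot so that deactivation does not need a linear search. This is an assumption about how such a list could be maintained, not part of the class shown above.

```typescript
class ActiveList
{
    // Dense list of active particle indexes, valid in 0..activeCount.
    private readonly activeIndexes: Uint32Array;
    // Particle index -> its position in activeIndexes, or -1 if inactive.
    private readonly slotOf: Int32Array;
    private activeCount = 0;

    public constructor(capacity: number)
    {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError("`capacity` must be a positive integer.");
        }
        this.activeIndexes = new Uint32Array(capacity);
        this.slotOf = new Int32Array(capacity).fill(-1);
    }

    public activate(index: number): void
    {
        this.ensureInRange(index);
        if (this.slotOf[index] !== -1) {
            return; // already active
        }
        this.slotOf[index] = this.activeCount;
        this.activeIndexes[this.activeCount] = index;
        this.activeCount += 1;
    }

    public deactivate(index: number): void
    {
        this.ensureInRange(index);
        const slot = this.slotOf[index];
        if (slot === -1) {
            return; // already inactive
        }
        // Move the last active entry into the vacated slot.
        const lastSlot = this.activeCount - 1;
        const moved = this.activeIndexes[lastSlot];
        this.activeIndexes[slot] = moved;
        this.slotOf[moved] = slot;
        this.slotOf[index] = -1;
        this.activeCount = lastSlot;
    }

    // Copy of the active indexes, intended for tests and debugging only.
    public snapshot(): number[]
    {
        const result: number[] = [];
        for (let cursor = 0; cursor < this.activeCount; cursor += 1) {
            result.push(this.activeIndexes[cursor]);
        }
        return result;
    }

    private ensureInRange(index: number): void
    {
        if (!Number.isInteger(index) || index < 0 || index >= this.slotOf.length) {
            throw new RangeError("`index` is invalid.");
        }
    }
}

const list = new ActiveList(4);
list.activate(0);
list.activate(1);
list.activate(2);
list.deactivate(0);
console.log(list.snapshot()); // [ 2, 1 ]
```

The reverse-lookup array costs one extra integer per particle, and the order of the active list changes on every deactivation, so the update pass must not depend on visiting particles in index order.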
7.2 When order doesn't matter: swap-remove dense array
When particle order does not matter in a particle system, a stronger optimization is possible. Instead of marking dead particles as inactive, move the last live element in the array to the deleted position and then decrement count. This keeps the 0..count range always fully packed with live particles.
export class DenseParticleSoa
{
    readonly #positionsX: Float32Array;
    readonly #positionsY: Float32Array;
    readonly #velocitiesX: Float32Array;
    readonly #velocitiesY: Float32Array;
    #count: number;

    public constructor(capacity: number)
    {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError("`capacity` must be a positive integer greater than or equal to 1.");
        }
        this.#positionsX = new Float32Array(capacity);
        this.#positionsY = new Float32Array(capacity);
        this.#velocitiesX = new Float32Array(capacity);
        this.#velocitiesY = new Float32Array(capacity);
        this.#count = 0;
    }

    public add(x: number, y: number, vx: number, vy: number): number
    {
        if (this.#count >= this.#positionsX.length) {
            throw new RangeError("Insufficient capacity in the particle storage.");
        }
        const index = this.#count;
        this.#positionsX[index] = x;
        this.#positionsY[index] = y;
        this.#velocitiesX[index] = vx;
        this.#velocitiesY[index] = vy;
        this.#count += 1;
        return index;
    }

    public removeSwap(index: number): void
    {
        this.#ensureValidIndex(index);
        // Overwrite the removed slot with the last live particle, then shrink the range.
        const last = this.#count - 1;
        this.#positionsX[index] = this.#positionsX[last];
        this.#positionsY[index] = this.#positionsY[last];
        this.#velocitiesX[index] = this.#velocitiesX[last];
        this.#velocitiesY[index] = this.#velocitiesY[last];
        this.#count -= 1;
    }

    public update(deltaTime: number): void
    {
        // Every slot in 0..count is live, so the loop needs no active check.
        for (let index = 0; index < this.#count; index += 1) {
            this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
            this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
        }
    }

    public getCount(): number
    {
        return this.#count;
    }

    #ensureValidIndex(index: number): void
    {
        if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
            throw new RangeError("`index` is invalid.");
        }
    }
}
The advantages of this approach are as follows.
- No `isActive` array needed.
- No indirect access through `activeIndexes` needed.
- The `0..count` range is always dense.
- No active branch inside the update loop.
- The layout becomes more amenable to SIMD and branch prediction.
There are trade-offs as well.
- The order changes on deletion.
- If external code holds an index as a handle, it can become invalid.
- If there is an `entityId` → `index` map, the map must also be updated when swapping.
- This approach may be unsuitable for UI lists, replay logs, or deterministic serialization where stable ordering is required.
Operational guidelines are typically as follows.
Order doesn't matter → swap-remove dense array
Order matters → stable compaction or activeIndexes
Deletions are rare → a simple isActive branch may suffice
Deletions are frequent and active entries are few → consider activeIndexes or dense compaction
Production metric setup:
particle.count
particle.capacity
particle.update.duration_ms
particle.items_per_ms
particle.remove.count
particle.swap_remove.count
particle.capacity_utilization
SoA should not be applied by gut feeling. Programming by intuition is comfortable, but measuring as you go is the right approach. Voluntarily adding complexity only to struggle with maintenance later is ultimately a cost you pay yourself. You need to compare AoS, SoA, active index, and swap-remove under the same input size, the same active ratio, and the same runtime conditions.
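A very small comparison harness along those lines might look like the sketch below. It times a dense pass and an active-index pass over the same input size at a 100% active ratio and derives an items-per-millisecond figure similar to `particle.items_per_ms`. The function names, sizes, and the use of `Date.now()` are assumptions for illustration; a single unwarmed run with a millisecond clock is not a rigorous benchmark.

```typescript
// Dense pass: walks 0..count directly.
function updateDense(x: Float32Array, vx: Float32Array, count: number, dt: number): void
{
    for (let i = 0; i < count; i += 1) {
        x[i] += vx[i] * dt;
    }
}

// Indexed pass: reaches each particle through an active-index list.
function updateIndexed(
    x: Float32Array,
    vx: Float32Array,
    active: Uint32Array,
    activeCount: number,
    dt: number,
): void
{
    for (let cursor = 0; cursor < activeCount; cursor += 1) {
        const i = active[cursor];
        x[i] += vx[i] * dt;
    }
}

// Runs `frames` iterations and returns elapsed wall-clock milliseconds.
function timeMs(frames: number, runFrame: () => void): number
{
    const start = Date.now();
    for (let frame = 0; frame < frames; frame += 1) {
        runFrame();
    }
    return Date.now() - start;
}

const COUNT = 100000; // same input size for both variants
const FRAMES = 20;
const DT = 0.5;

const denseX = new Float32Array(COUNT);
const indexedX = new Float32Array(COUNT);
const velocity = new Float32Array(COUNT).fill(1);
const activeList = new Uint32Array(COUNT); // 100% active ratio: identity mapping
for (let i = 0; i < COUNT; i += 1) {
    activeList[i] = i;
}

const denseMs = timeMs(FRAMES, () => updateDense(denseX, velocity, COUNT, DT));
const indexedMs = timeMs(FRAMES, () => updateIndexed(indexedX, velocity, activeList, COUNT, DT));

// Guard against a 0 ms reading from the coarse clock.
const itemsPerMs = (ms: number): number => (COUNT * FRAMES) / Math.max(ms, 1);
console.log("dense items/ms:", itemsPerMs(denseMs));
console.log("indexed items/ms:", itemsPerMs(indexedMs));
```

Both variants should end with identical positions, which is worth asserting before trusting any timing difference. To compare other active ratios, fill `activeList` with fewer indexes and pass the smaller count to `updateIndexed`.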
8. Comparison notes: C# / TypeScript / Python / Rust
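| Language | Idiomatic expression | Caution |
|---|---|---|
| C# | `float[]` fields with `Span<T>` and `Vector<T>` | Do not expose internal arrays directly to the outside |
| TypeScript | `Float32Array` | Do not assume SIMD is automatically guaranteed |
| Python | `array("f")` from the standard library, or NumPy for real workloads | Pure Python loops carry significant interpreter overhead |
| Rust | `Vec<f32>` sliced to `[..count]`, iterated with `iter_mut().zip(...)` | In hot loops, check whether indexing multiple arrays by the same index leaves bounds checks in place |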
In Rust, pay particular attention to code that indexes multiple arrays simultaneously with for index in 0..count. While safe, this pattern can leave bounds checks in place on hot loops or interfere with vectorization. Slicing each array to [..count] to align their lengths and then iterating with iter_mut().zip(...) makes it easier for the compiler to understand the length relationships.
In C#, you can make SIMD intent explicit with Vector<T>, but actual performance depends on the CPU, JIT, data alignment, and loop shape. In TypeScript, the key is stabilizing the memory layout with Float32Array. In Python, if you truly need numerical performance, moving to a vectorization library like NumPy is generally the more pragmatic choice.
Learning a programming language ultimately means internalizing that language's philosophy; as Alan Perlis put it, a language that doesn't affect the way you think about programming is not worth knowing. If you mimic a Rust-style ownership model in TypeScript, expect C-level performance from pure Python loops, or wrap everything in objects in C# and call it SoA, that is just surface-level pattern-matching, and it is worth going back to study the language itself.
That said, framing it this way makes it sound harder than it is. The only thing you really need to understand is that the domain model (Object) and the execution model (Pass) should be distinct. The Gather/Scatter pattern is commonly used to bridge these two models, but I won't cover it here. I'll write about it when I have time.
Personal note: In the end, the essence of SoA is severing the forced union of 'data for machines (execution model)' and 'data for humans (domain model)'. The impulse to cram everything into a single Object is an act of violence that imposes the structure of the human brain onto the CPU.
9. Further questions to consider
Does this pass truly need the entire object, or does it only need a few specific field arrays?
Is the benefit of SoA due to cache locality, the possibility of SIMD, or branch elimination?
As the active ratio drops, which is most appropriate: isActive[], activeIndexes[], or swap-remove?
Is this data order-insensitive, or is stable ordering a domain requirement?
Have you actually benchmarked the index-loop version against the iterator zip version in Rust?
What API allows adequate testing and debugging without exposing SoA's internal arrays to the outside?
Does this optimization actually reduce the real bottleneck, or does it only increase code complexity?
10. Summary
SoA stores data in contiguous per-field arrays rather than per-object storage.
The benefits of SoA lie not only in cache locality but also in the potential for SIMD and auto-vectorization.
In hot loops, you should minimize branches, bounds checks, indirect access, and allocation.
In Rust, using slice + iterator zip tends to be more optimization-friendly than accessing multiple arrays by the same index.
For particle systems where order does not matter, maintaining an always-dense array with swap-remove is a powerful option.
SoA is an internal execution layout; the external API should still be encapsulated.
Quick reference:
Arrange data not by object, but in the order that each pass reads its fields, and where possible, push it through dense loops with no branches.