SoA Data Layout

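SoA (Structure of Arrays) is a data layout that, instead of bundling every field into one object, splits each kind of field into its own contiguous array. Where AoS (Array of Structures) builds an "array of particle objects" such as Particle[], SoA builds a "bundle of per-attribute arrays" such as x[], y[], vx[], vy[].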

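The intent of this pattern is to separate the domain model from the execution model.
The domain model should be easy for people to read.
The execution model, however, should be fast for the CPU to scan. In an object-oriented model, putting x, y, vx, vy, color, debugName, and renderHandle into a single Particle object is convenient to read. But the position update pass actually needs only x, y, vx, and vy.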

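SoA can be fast for more than the simple reason that it "reduces cache misses."
When the same field is gathered into a contiguous array, it becomes easier for the CPU and the compiler to apply SIMD, that is, vector operations that process several data elements at once. In AoS the data is interleaved as x1, y1, vx1, vy1, x2, y2, vx2, vy2. In SoA, x1, x2, x3, x4 sit next to each other. This difference affects instruction throughput as well as cache locality.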

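SoA is a powerful technique, but when individual objects are edited frequently, the data set is small, the bottleneck is I/O, or a single pass reads almost every field of each object, traditional AoS can be simpler and fast enough.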

Many people read SoA as "abandon object orientation," but it is closer to "on the hot path, lay out data according to the fields the pass actually reads."

Key formulas:

Text

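SoA = place the same field in one contiguous array
Cache Locality = load only the data you need into cache
SIMD Friendliness = make it easy to apply the same operation to contiguous data
Dense Pass = scan only the 0..count range linearly, with no branch
Swap-remove = when order does not matter, make each removal O(1)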

1. When should you use it?

Suppose you update the positions of 100,000 particles every frame.

Text

x = x + vx * dt
y = y + vy * dt

This computation does not need the particle's name, color, render handle, or debug tag.
It needs only x, y, vx, and vy.

AoS usually looks like this.

Text

particles = [
  { x, y, vx, vy, color, debugName, renderHandle },
  { x, y, vx, vy, color, debugName, renderHandle },
  ...
]

SoA usually looks like this.

Text

positionsX[]
positionsY[]
velocitiesX[]
velocitiesY[]

In SoA, the position update pass scans only the arrays it needs, linearly.

Text

positionsX[i] += velocitiesX[i] * dt
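positionsY[i] += velocitiesY[i] * dt

Shifting the focus from "what is the object" to "what does the pass read" is
more than learning a memory optimization technique: it is a philosophical turning point that reformats the programmer's brain from human-centered to hardware-centered.

Object-oriented programming (OOP) sees the world as nouns. It groups the world into conceptual units such as Particle, Player, and Enemy. It therefore feels natural to keep each of them together in memory inside a single class or struct, and this is usually described as having semantic cohesion.
But the CPU does not think that way. It just runs.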

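AoS (object-centered): when the position update pass reads particle1.x, the cache line also drags in particle 1's junk data (color, render handle, and so on). In object arrays or pointer-based layouts, the next particle's data may be scattered across the heap, which makes it hard for the prefetcher to follow; even when the data is contiguous, as in a struct array, a high ratio of cold fields to hot fields wastes cache bandwidth.
SoA (pass-centered): once the pass reads only the x[], y[], vx[], and vy[] arrays sequentially, the hardware prefetcher can follow the access pattern well. While the CPU computes, the next stretch of data is already being pulled into cache in the background, so much of the memory latency is hidden.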
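The cold-to-hot ratio argument can be made concrete with a rough estimate. The sketch below is only back-of-envelope arithmetic: the 64-byte cache line, the 48-byte AoS record, and the helper names `usefulFraction` and `cacheLinesTouched` are all assumptions chosen for illustration, not measurements of any real engine.

```typescript
// Back-of-envelope estimate. Every size below is an illustrative assumption.
const CACHE_LINE_BYTES = 64; // common on x86-64, but not universal

// Fraction of each fetched record that the pass actually uses.
function usefulFraction(hotBytes: number, recordBytes: number): number {
  if (hotBytes <= 0 || recordBytes < hotBytes) {
    throw new RangeError("expected 0 < hotBytes <= recordBytes");
  }
  return hotBytes / recordBytes;
}

// Cache lines touched when streaming `count` contiguous records.
function cacheLinesTouched(count: number, recordBytes: number): number {
  return Math.ceil((count * recordBytes) / CACHE_LINE_BYTES);
}

const particleCount = 100000;
const hotBytes = 16;       // x, y, vx, vy as four 4-byte floats
const aosRecordBytes = 48; // hypothetical: 16 hot bytes + 32 cold bytes

const aosLines = cacheLinesTouched(particleCount, aosRecordBytes);
const soaLines = cacheLinesTouched(particleCount, hotBytes);

console.log(usefulFraction(hotBytes, aosRecordBytes)); // one third of each record is hot
console.log(aosLines, soaLines); // 75000 25000
```

Under these assumed sizes the AoS pass streams three times as many cache lines for the same work. Real numbers depend on padding, alignment, and the actual field sizes, so treat this as a way to reason about the ratio, not as a prediction.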

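Note that even within AoS, two cases need to be distinguished.
A contiguous AoS such as a C/Rust/C# struct array differs from a reference-chasing AoS such as a JavaScript object array or a C# class array.
In the former, the stride is regular, so the prefetcher can keep up to some extent, but the problem of hot and cold fields sharing a cache line remains.
In the latter, objects tend to be scattered across the heap, so it is worse for both locality and prefetching.
In practice, though, you rarely need to know the layout at that level of detail. Knowing where to compromise is also a skill.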
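To see the contiguous-AoS case and the SoA case side by side in TypeScript, the following sketch approximates a struct array with one interleaved `Float32Array`. The `STRIDE` constant, the field order, and the function names `updateInterleaved` and `updateSoa` are assumptions made for this illustration.

```typescript
// Contiguous AoS approximated with one interleaved Float32Array:
// [x0, y0, vx0, vy0, x1, y1, vx1, vy1, ...]
const STRIDE = 4; // floats per particle (assumed layout)

function updateInterleaved(data: Float32Array, count: number, dt: number): void {
  for (let i = 0; i < count; i += 1) {
    const base = i * STRIDE;
    data[base] += data[base + 2] * dt;     // x += vx * dt
    data[base + 1] += data[base + 3] * dt; // y += vy * dt
  }
}

// SoA: each field lives in its own contiguous array.
function updateSoa(
  x: Float32Array,
  y: Float32Array,
  vx: Float32Array,
  vy: Float32Array,
  count: number,
  dt: number,
): void {
  for (let i = 0; i < count; i += 1) {
    x[i] += vx[i] * dt;
    y[i] += vy[i] * dt;
  }
}
```

Both loops are linear and regular. The difference is that in the interleaved version consecutive x values are `STRIDE` elements apart, while in the SoA version they are adjacent, which is the shape vector instructions prefer. Once cold fields are added to the interleaved record, the stride grows and the gap widens.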

Most people come to SoA through game-oriented examples such as the Bevy engine in Rust or Unity's ECS, so the examples here are game-centered as well.

2. Core representation

The shared example stores particle positions and velocities in SoA form and performs one frame of position updates.

C#

using System;
using System.Numerics;

public sealed class ParticleSoa
{
    private readonly float[] positionsX;
    private readonly float[] positionsY;
    private readonly float[] velocitiesX;
    private readonly float[] velocitiesY;
    private int count;

    public ParticleSoa(int capacity)
    {
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity));
        }

        this.positionsX = new float[capacity];
        this.positionsY = new float[capacity];
        this.velocitiesX = new float[capacity];
        this.velocitiesY = new float[capacity];
        this.count = 0;
    }

    public int GetCount()
    {
        return this.count;
    }

    public int Add(float x, float y, float vx, float vy)
    {
        if (this.count >= this.positionsX.Length)
        {
            throw new InvalidOperationException("The particle store is out of capacity.");
        }

        int index = this.count;
        this.positionsX[index] = x;
        this.positionsY[index] = y;
        this.velocitiesX[index] = vx;
        this.velocitiesY[index] = vy;
        this.count += 1;

        return index;
    }

    public void Update(float deltaTime)
    {
        Span<float> x = this.positionsX.AsSpan(0, this.count);
        Span<float> y = this.positionsY.AsSpan(0, this.count);
        ReadOnlySpan<float> vx = this.velocitiesX.AsSpan(0, this.count);
        ReadOnlySpan<float> vy = this.velocitiesY.AsSpan(0, this.count);

        int width = Vector<float>.Count;
        Vector<float> dt = new(deltaTime);

        int index = 0;

        for (; index <= this.count - width; index += width)
        {
            Vector<float> xVector = new(x.Slice(index, width));
            Vector<float> yVector = new(y.Slice(index, width));
            Vector<float> vxVector = new(vx.Slice(index, width));
            Vector<float> vyVector = new(vy.Slice(index, width));

            (xVector + vxVector * dt).CopyTo(x.Slice(index, width));
            (yVector + vyVector * dt).CopyTo(y.Slice(index, width));
        }

        for (; index < this.count; index += 1)
        {
            x[index] += vx[index] * deltaTime;
            y[index] += vy[index] * deltaTime;
        }
    }

    public (float X, float Y) GetPosition(int index)
    {
        this.EnsureValidIndex(index);

        return (this.positionsX[index], this.positionsY[index]);
    }

    private void EnsureValidIndex(int index)
    {
        if ((uint)index >= (uint)this.count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}

In C#, Vector<float> lets you state the SIMD intent explicitly.
The actual vector width, however, depends on the runtime and the CPU. For a numeric hot path you can consider System.Numerics, Span<T>, ArrayPool<T>, or lower-level intrinsics.

TypeScript

export class ParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  #count: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be an integer of at least 1.");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#count = 0;
  }

  public getCount(): number
  {
    return this.#count;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("The particle store is out of capacity.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#count += 1;

    return index;
  }

  public update(deltaTime: number): void
  {
    for (let index = 0; index < this.#count; index += 1) {
      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getPosition(index: number): readonly [number, number]
  {
    this.#ensureValidIndex(index);

    return [this.#positionsX[index], this.#positionsY[index]];
  }

  #ensureValidIndex(index: number): void
  {
    if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
      throw new RangeError("index is out of range.");
    }
  }
}

In TypeScript, Float32Array expresses the SoA intent well.
What a JavaScript engine optimizes internally differs from runtime to runtime, so the primary goal is a predictable layout built on typed arrays, not a direct expectation of SIMD.

Python

from array import array


class ParticleSoa:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be at least 1.")

        self._positions_x = array("f", [0.0]) * capacity
        self._positions_y = array("f", [0.0]) * capacity
        self._velocities_x = array("f", [0.0]) * capacity
        self._velocities_y = array("f", [0.0]) * capacity
        self._count = 0

    def get_count(self) -> int:
        return self._count

    def add(self, x: float, y: float, vx: float, vy: float) -> int:
        if self._count >= len(self._positions_x):
            raise OverflowError("The particle store is out of capacity.")

        index = self._count
        self._positions_x[index] = x
        self._positions_y[index] = y
        self._velocities_x[index] = vx
        self._velocities_y[index] = vy
        self._count += 1

        return index

    def update(self, delta_time: float) -> None:
        for index in range(self._count):
            self._positions_x[index] += self._velocities_x[index] * delta_time
            self._positions_y[index] += self._velocities_y[index] * delta_time

    def get_position(self, index: int) -> tuple[float, float]:
        self._ensure_valid_index(index)

        return (self._positions_x[index], self._positions_y[index])

    def _ensure_valid_index(self, index: int) -> None:
        if index < 0 or index >= self._count:
            raise IndexError("index is out of range.")

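With only the Python standard library, array is enough to build the SoA shape.
To make large-scale numeric work genuinely fast, however, you would normally look at tools such as NumPy, Numba, Cython, or Rust/C extensions. In a pure Python loop, interpreter overhead can remain the bottleneck even after the data layout changes.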

Rust

#[derive(Debug)]
pub struct ParticleSoa {
    positions_x: Vec<f32>,
    positions_y: Vec<f32>,
    velocities_x: Vec<f32>,
    velocities_y: Vec<f32>,
    count: usize,
}

impl ParticleSoa {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1.");

        Self {
            positions_x: vec![0.0; capacity],
            positions_y: vec![0.0; capacity],
            velocities_x: vec![0.0; capacity],
            velocities_y: vec![0.0; capacity],
            count: 0,
        }
    }

    pub fn get_count(&self) -> usize {
        self.count
    }

    pub fn add(&mut self, x: f32, y: f32, vx: f32, vy: f32) -> usize {
        assert!(
            self.count < self.positions_x.len(),
            "The particle store is out of capacity."
        );

        let index = self.count;
        self.positions_x[index] = x;
        self.positions_y[index] = y;
        self.velocities_x[index] = vx;
        self.velocities_y[index] = vy;
        self.count += 1;

        index
    }

    pub fn update(&mut self, delta_time: f32) {
        let positions_x = &mut self.positions_x[..self.count];
        let positions_y = &mut self.positions_y[..self.count];
        let velocities_x = &self.velocities_x[..self.count];
        let velocities_y = &self.velocities_y[..self.count];

        for (((x, y), vx), vy) in positions_x
            .iter_mut()
            .zip(positions_y.iter_mut())
            .zip(velocities_x.iter())
            .zip(velocities_y.iter())
        {
            *x += *vx * delta_time;
            *y += *vy * delta_time;
        }
    }

    pub fn get_position(&self, index: usize) -> Option<(f32, f32)> {
        if index >= self.count {
            return None;
        }

        Some((self.positions_x[index], self.positions_y[index]))
    }
}

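In Rust, code that indexes several arrays directly with the same index is not always bad. LLVM may eliminate the bounds checks.
In a hot loop, however, using iterators and zip tends to make the length relationships easier for the compiler to see, which improves the odds of bounds-check elimination and vectorization.
The difference can matter most in loops that access several slices with the same index at once.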

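3. Call site

The call site never touches the SoA's internal arrays directly.
(This is from the standpoint of separation of concerns, encapsulation, and information hiding.)
It separates the responsibilities of creation policy, adding data, the update pass, and reading results.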

TypeScript

type SpawnParticle = Readonly<{
  x: number;
  y: number;
  vx: number;
  vy: number;
}>;

type FramePolicy = Readonly<{
  deltaTime: number;
  maxParticles: number;
}>;

function createParticleStore(
  particles: readonly SpawnParticle[],
  policy: FramePolicy,
): ParticleSoa
{
  if (particles.length > policy.maxParticles) {
    throw new RangeError("The initial particle count exceeds maxParticles.");
  }

  const store = new ParticleSoa(policy.maxParticles);

  for (const particle of particles) {
    store.add(particle.x, particle.y, particle.vx, particle.vy);
  }

  return store;
}

function simulateOneFrame(
  store: ParticleSoa,
  policy: FramePolicy,
): void
{
  store.update(policy.deltaTime);
}

const particles = createParticleStore(
  [
    { x: 0, y: 0, vx: 10, vy: 0 },
    { x: 5, y: 5, vx: 0, vy: -2 },
  ],
  { deltaTime: 0.016, maxParticles: 1024 },
);

simulateOneFrame(particles, { deltaTime: 0.016, maxParticles: 1024 });

console.log(particles.getPosition(0));

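Separation of responsibilities:

Text

SpawnParticle        = external input DTO
FramePolicy          = per-frame execution policy
createParticleStore  = input validation and SoA initialization
ParticleSoa          = owns the memory layout and the update pass
simulateOneFrame     = assembles the frame flow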

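4. Reading order

Text

Which fields does this pass actually read?
→ Is each of those fields gathered in a contiguous array?
→ Does the loop scan the 0..count range linearly?
→ Is there a branch left inside the hot loop?
→ Is the shape amenable to SIMD or auto-vectorization?
→ Are only safe methods exposed externally, not raw arrays?
→ Does a layout change stay out of the call site?

SoA code should be read not around a single object,
but around which arrays a single pass consumes.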

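5. Boundaries and misconceptions

SoA is a layout pattern suited to bulk, repeated computation.
If particles are often edited one at a time, if the domain logic strongly requires per-object invariants, or if the data set is small, AoS is worth considering.
SoA is not a pattern that gives up encapsulation for performance; it encapsulates differently. The baseline is to change only the internal layout while keeping the external API safe.

The benefits of SoA should be viewed in two parts. First, reading only the needed fields contiguously can improve cache locality. Second, applying the same operation to contiguous arrays of the same type can make SIMD and auto-vectorization easier. An explanation of SoA that omits the second part tells only half the story.

Auto-vectorization is not guaranteed here. Branches, aliasing, bounds checks, function calls, unclear length relationships, possible exceptions, and runtime type uncertainty can all get in its way. SoA is therefore a shape that makes vectorization possible, not a spell that guarantees SIMD on every compiler and runtime.

Predictable domain-level cases should also be separated from system-level problems. Exceeding capacity is a domain/policy problem. Failure to allocate a large typed array, running out of memory, and runtime termination are infrastructure problems. Because SoA code usually sits on the hot path, it is better to finish validation at construction time and at the boundary than to add expensive exception handling or dynamic type checks to every iteration.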

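Production failure cases:

Text

- The internal SoA arrays end up with different lengths, breaking the index invariant
- Raw arrays are exposed externally, allowing arbitrary modification
- add/remove updates only some of the arrays, so the data drifts out of sync
- The full capacity is scanned every frame even though many entries have active=false
- Many branches inside the hot loop hurt branch prediction and vectorization
- In Rust, indexing several arrays by index can prevent bounds-check elimination
- getParticleObject() is called in bulk inside a loop for debugging convenience
- Every data structure is converted to SoA without measuring performance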
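The first and third failure cases (mismatched array lengths, partial updates) can be caught early with a debug-time invariant check. The helper below is a minimal sketch; `assertSoaInvariant` is a hypothetical name, and the idea of calling it only in tests or debug builds, outside the hot loop, is an assumption about how it would be used.

```typescript
// Hypothetical debug helper: verifies that every field array shares one
// capacity and that `count` stays within it. Intended for tests and debug
// builds, not for the per-frame hot loop.
function assertSoaInvariant(
  fields: ReadonlyArray<{ length: number }>,
  count: number,
): void {
  if (fields.length === 0) {
    throw new Error("at least one field array is required");
  }

  const capacity = fields[0].length;

  for (let i = 1; i < fields.length; i += 1) {
    if (fields[i].length !== capacity) {
      throw new Error(
        `field ${i} has length ${fields[i].length}, expected ${capacity}`,
      );
    }
  }

  // Reject non-integers, NaN, negatives, and counts past capacity.
  if (Math.floor(count) !== count || count < 0 || count > capacity) {
    throw new Error(`count ${count} is outside 0..${capacity}`);
  }
}
```

A store could call this at the end of add and remove in test builds, so a partial update fails immediately instead of surfacing later as corrupted positions.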

6. A bad example

TypeScript

type Particle = {
  x: number;
  y: number;
  vx: number;
  vy: number;
  color: string;
  debugName: string;
  renderHandle: object;
  active: boolean;
};

function updateParticlesBad(
  particles: Particle[],
  deltaTime: number,
): void
{
  for (const particle of particles) {
    if (!particle.active) {
      continue;
    }

    particle.x += particle.vx * deltaTime;
    particle.y += particle.vy * deltaTime;
  }
}

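Why it is bad:

Text

- color, debugName, and renderHandle, which the update pass never uses, are mixed into the same object.
- The loop needs only x/y/vx/vy but accesses data object by object.
- With large data sets, per-object access and reference chasing can become costly.
- The active branch stays inside the hot loop.
- The data is interleaved, so it is not a SIMD-friendly shape in which only x values are processed contiguously.
- Data with different lifetimes, such as render handles and debug names, is physically bundled together.

As always with code, the right answer depends on the scale of the program. If the particle count is small and individual edits are frequent, the code above is perfectly good.
The real problem is physically bundling data that a given hot path never reads or writes (data with a different lifetime and purpose)
together with the main computation data, burning CPU cache bandwidth for nothing.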
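One way to stop bundling cold data with hot data, without going all the way to a class hierarchy, is a hot/cold split. The sketch below keeps the four numeric fields in typed arrays and the cold metadata in a parallel array that shares the same index. The names `HotColdStore`, `spawnParticle`, `updatePositions`, and `describeParticle` are hypothetical and exist only for this illustration.

```typescript
// Cold metadata: rarely read, never touched by the per-frame update.
type ParticleMeta = { color: string; debugName: string };

// Hot fields live in typed arrays; cold metadata sits in a parallel array
// that shares the same index.
type HotColdStore = {
  x: Float32Array;
  y: Float32Array;
  vx: Float32Array;
  vy: Float32Array;
  meta: ParticleMeta[];
  count: number;
};

function createHotColdStore(capacity: number): HotColdStore {
  return {
    x: new Float32Array(capacity),
    y: new Float32Array(capacity),
    vx: new Float32Array(capacity),
    vy: new Float32Array(capacity),
    meta: [],
    count: 0,
  };
}

function spawnParticle(
  store: HotColdStore,
  x: number,
  y: number,
  vx: number,
  vy: number,
  meta: ParticleMeta,
): number {
  if (store.count >= store.x.length) {
    throw new RangeError("store is full");
  }
  const index = store.count;
  store.x[index] = x;
  store.y[index] = y;
  store.vx[index] = vx;
  store.vy[index] = vy;
  store.meta[index] = meta;
  store.count += 1;
  return index;
}

// Hot path: reads and writes only the four numeric arrays.
function updatePositions(store: HotColdStore, dt: number): void {
  for (let i = 0; i < store.count; i += 1) {
    store.x[i] += store.vx[i] * dt;
    store.y[i] += store.vy[i] * dt;
  }
}

// Cold path: joins hot and cold data only when a human needs to look at it.
function describeParticle(store: HotColdStore, index: number): string {
  if (index < 0 || index >= store.count) {
    throw new RangeError("index is out of range");
  }
  const meta = store.meta[index];
  return `${meta.debugName} (${meta.color}) at (${store.x[index]}, ${store.y[index]})`;
}
```

The update pass never dereferences `meta`, so debug names and colors no longer travel through the cache with every position update.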

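7. Production extensions

7.1 Active index approach

Once many entries have active=false, scanning the whole array every time becomes wasteful too. In that case, add an active index array to the SoA and iterate only over the indexes that actually need updating.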

TypeScript

export class ActiveIndexParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  readonly #activeIndexes: Uint32Array;
  #count: number;
  #activeCount: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be an integer of at least 1.");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#activeIndexes = new Uint32Array(capacity);
    this.#count = 0;
    this.#activeCount = 0;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("The particle store is out of capacity.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#activeIndexes[this.#activeCount] = index;
    this.#count += 1;
    this.#activeCount += 1;

    return index;
  }

  public updateActive(deltaTime: number): void
  {
    for (let cursor = 0; cursor < this.#activeCount; cursor += 1) {
      const index = this.#activeIndexes[cursor];

      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getActiveCount(): number
  {
    return this.#activeCount;
  }
}

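This approach is good at skipping dead particles. It still goes through activeIndexes, however, which is an indirect access. It can win when the active ratio is low, but when the active ratio is high it can be slower than a plain dense array scan.

7.2 When order does not matter: Swap-remove Dense Array

If order does not matter in the particle system, a stronger optimization is available.
Instead of marking a dead particle as inactive, move the last live element of the array into the removed slot and decrement count. The 0..count range is then always packed with live particles.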
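The class above only ever adds to the active list. One possible way to support deactivation in O(1) is to keep a reverse map from particle index to its slot in the active list, and fill the hole with the last active entry. The `ActiveIndexSet` class, its `slotOf` array, and the -1 sentinel below are assumptions made for this sketch, not part of the original store.

```typescript
// Sketch of an active-index set with O(1) activate and deactivate.
class ActiveIndexSet {
  // Dense list of live particle indexes, valid in 0..activeCount.
  private readonly activeIndexes: Uint32Array;
  // Particle index -> slot in activeIndexes, or -1 when inactive.
  private readonly slotOf: Int32Array;
  private activeCount = 0;

  constructor(capacity: number) {
    this.activeIndexes = new Uint32Array(capacity);
    this.slotOf = new Int32Array(capacity);
    for (let i = 0; i < capacity; i += 1) {
      this.slotOf[i] = -1;
    }
  }

  activate(index: number): void {
    this.ensureValidIndex(index);
    if (this.slotOf[index] !== -1) {
      return; // already active
    }
    this.slotOf[index] = this.activeCount;
    this.activeIndexes[this.activeCount] = index;
    this.activeCount += 1;
  }

  deactivate(index: number): void {
    this.ensureValidIndex(index);
    const slot = this.slotOf[index];
    if (slot === -1) {
      return; // already inactive
    }
    const lastSlot = this.activeCount - 1;
    const moved = this.activeIndexes[lastSlot];
    this.activeIndexes[slot] = moved; // fill the hole with the last live entry
    this.slotOf[moved] = slot;
    this.slotOf[index] = -1;
    this.activeCount = lastSlot;
  }

  forEachActive(visit: (index: number) => void): void {
    for (let slot = 0; slot < this.activeCount; slot += 1) {
      visit(this.activeIndexes[slot]);
    }
  }

  getActiveCount(): number {
    return this.activeCount;
  }

  private ensureValidIndex(index: number): void {
    if (Math.floor(index) !== index || index < 0 || index >= this.slotOf.length) {
      throw new RangeError("index is out of range");
    }
  }
}
```

Note that deactivation reorders the active list, and that calling `deactivate` from inside `forEachActive` would change the list while it is being walked. A real system would likely queue removals and apply them after the pass.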

TypeScript

export class DenseParticleSoa
{
  readonly #positionsX: Float32Array;
  readonly #positionsY: Float32Array;
  readonly #velocitiesX: Float32Array;
  readonly #velocitiesY: Float32Array;
  #count: number;

  public constructor(capacity: number)
  {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be an integer of at least 1.");
    }

    this.#positionsX = new Float32Array(capacity);
    this.#positionsY = new Float32Array(capacity);
    this.#velocitiesX = new Float32Array(capacity);
    this.#velocitiesY = new Float32Array(capacity);
    this.#count = 0;
  }

  public add(x: number, y: number, vx: number, vy: number): number
  {
    if (this.#count >= this.#positionsX.length) {
      throw new RangeError("The particle store is out of capacity.");
    }

    const index = this.#count;
    this.#positionsX[index] = x;
    this.#positionsY[index] = y;
    this.#velocitiesX[index] = vx;
    this.#velocitiesY[index] = vy;
    this.#count += 1;

    return index;
  }

  public removeSwap(index: number): void
  {
    this.#ensureValidIndex(index);

    const last = this.#count - 1;

    this.#positionsX[index] = this.#positionsX[last];
    this.#positionsY[index] = this.#positionsY[last];
    this.#velocitiesX[index] = this.#velocitiesX[last];
    this.#velocitiesY[index] = this.#velocitiesY[last];

    this.#count -= 1;
  }

  public update(deltaTime: number): void
  {
    for (let index = 0; index < this.#count; index += 1) {
      this.#positionsX[index] += this.#velocitiesX[index] * deltaTime;
      this.#positionsY[index] += this.#velocitiesY[index] * deltaTime;
    }
  }

  public getCount(): number
  {
    return this.#count;
  }

  #ensureValidIndex(index: number): void
  {
    if (!Number.isInteger(index) || index < 0 || index >= this.#count) {
      throw new RangeError("index is out of range.");
    }
  }
}

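The advantages of this approach are as follows.

Text

- No isActive array is needed.
- No indirect access through activeIndexes is needed.
- The 0..count range is always dense.
- There is no active branch inside the update loop.
- The shape is friendlier to SIMD and branch prediction.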

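There are trade-offs as well.

Text

- Removal changes the order.
- If external code holds an index as if it were a handle, it can break.
- If there is an entityId → index map, the map must be updated on every swap.
- It can be a poor fit for UI lists, replay logs, and deterministic serialization, which need a stable order.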
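The entityId → index bookkeeping mentioned above can be sketched as follows. This is a deliberately small illustration under stated assumptions: entity ids are small integers in 0..capacity-1, only a single field `x` is stored, and the class name `DenseHandleStore` and its two lookup arrays are hypothetical.

```typescript
// Swap-remove store that keeps entity ids valid across removals.
// Assumption: entity ids are integers in 0..capacity-1.
class DenseHandleStore {
  private readonly x: Float32Array;         // one field only, for brevity
  private readonly idAtIndex: Int32Array;   // dense index -> entityId
  private readonly indexOfId: Int32Array;   // entityId -> dense index, -1 if absent
  private count = 0;

  constructor(capacity: number) {
    this.x = new Float32Array(capacity);
    this.idAtIndex = new Int32Array(capacity);
    this.indexOfId = new Int32Array(capacity);
    for (let i = 0; i < capacity; i += 1) {
      this.indexOfId[i] = -1;
    }
  }

  add(entityId: number, x: number): void {
    this.ensureValidId(entityId);
    if (this.indexOfId[entityId] !== -1) {
      throw new Error("entityId is already present");
    }
    const index = this.count;
    this.x[index] = x;
    this.idAtIndex[index] = entityId;
    this.indexOfId[entityId] = index;
    this.count += 1;
  }

  remove(entityId: number): void {
    const index = this.lookup(entityId);
    const last = this.count - 1;
    const movedId = this.idAtIndex[last];
    this.x[index] = this.x[last];      // move the last live element into the hole
    this.idAtIndex[index] = movedId;
    this.indexOfId[movedId] = index;   // keep the moved entity's handle valid
    this.indexOfId[entityId] = -1;
    this.count = last;
  }

  getX(entityId: number): number {
    return this.x[this.lookup(entityId)];
  }

  getCount(): number {
    return this.count;
  }

  private lookup(entityId: number): number {
    this.ensureValidId(entityId);
    const index = this.indexOfId[entityId];
    if (index === -1) {
      throw new Error("entityId is not present");
    }
    return index;
  }

  private ensureValidId(entityId: number): void {
    if (Math.floor(entityId) !== entityId || entityId < 0 || entityId >= this.indexOfId.length) {
      throw new RangeError("entityId is out of range");
    }
  }
}
```

Callers hold entity ids, never dense indexes, so a swap inside the store does not invalidate anything they keep. The cost is one extra lookup per access from outside; the hot update loop can still run directly over 0..count.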

The operating rules of thumb are usually these.

Text

Order does not matter                 → swap-remove dense array
Order matters                         → stable compaction or activeIndexes
Removals are rare                     → a simple isActive branch may be enough
Removals are frequent, few are active → consider activeIndexes or dense compaction
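For the "order matters" row, stable compaction can be sketched as a single forward copy that keeps the relative order of live entries. The function name `compactStable` and the `alive` flag array (1 for live, 0 for dead) are assumptions for this illustration.

```typescript
// Order-preserving compaction: copies live entries forward so that
// 0..newCount is dense and keeps the original relative order.
// Cost is O(count * fields) per call, versus O(fields) for one swap-remove.
function compactStable(
  fields: Float32Array[],
  alive: Uint8Array,
  count: number,
): number {
  let write = 0;

  for (let read = 0; read < count; read += 1) {
    if (alive[read] === 0) {
      continue; // dead entry: skip it, leaving a gap to be overwritten
    }
    if (write !== read) {
      for (let f = 0; f < fields.length; f += 1) {
        fields[f][write] = fields[f][read];
      }
      alive[write] = 1;
    }
    write += 1;
  }

  // Everything past the new count is no longer live.
  for (let i = write; i < count; i += 1) {
    alive[i] = 0;
  }

  return write; // the new count
}
```

Because the cost is linear in count, a common arrangement is to mark entries dead during the frame and run the compaction once at a frame boundary, not once per removal.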

Production metrics:

Text

particle.count
particle.capacity
particle.update.duration_ms
particle.items_per_ms
particle.remove.count
particle.swap_remove.count
particle.capacity_utilization

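SoA should not be applied by feel. Programming by feel is comfortable, but measuring as you go is the right way. If you raise complexity for no reason, the one who suffers in maintenance is ultimately you.
AoS, SoA, active index, and swap-remove have to be compared at the same input size, the same active ratio, and under the same runtime conditions.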
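A comparison like that does not need a framework to get started. The harness below is a rough sketch: `medianMs`, the particle count `N`, the warm-up and iteration counts, and the data shapes are all assumptions, and it relies on a global `performance.now()` as found in browsers and recent Node.js versions. A serious benchmark would also control for GC, run in separate processes, and vary the active ratio.

```typescript
type Pass = () => void;

// Runs `pass` a few times to warm up the JIT, then returns the median
// duration in milliseconds over `iterations` timed runs.
function medianMs(pass: Pass, iterations: number, warmup: number): number {
  if (iterations <= 0) {
    throw new RangeError("iterations must be positive");
  }
  for (let i = 0; i < warmup; i += 1) {
    pass();
  }
  const samples: number[] = [];
  for (let i = 0; i < iterations; i += 1) {
    const start = performance.now();
    pass();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Same particle count and same inputs for both layouts (sizes are arbitrary).
const N = 100000;
const DT = 0.5;

type AosParticle = { x: number; y: number; vx: number; vy: number; debugName: string };

const aos: AosParticle[] = [];
const soaX = new Float32Array(N);
const soaY = new Float32Array(N);
const soaVx = new Float32Array(N);
const soaVy = new Float32Array(N);

for (let i = 0; i < N; i += 1) {
  aos.push({ x: 0, y: 0, vx: i % 7, vy: 1, debugName: "p" + i });
  soaVx[i] = i % 7;
  soaVy[i] = 1;
}

const aosMs = medianMs(() => {
  for (let i = 0; i < aos.length; i += 1) {
    const p = aos[i];
    p.x += p.vx * DT;
    p.y += p.vy * DT;
  }
}, 20, 3);

const soaMs = medianMs(() => {
  for (let i = 0; i < N; i += 1) {
    soaX[i] += soaVx[i] * DT;
    soaY[i] += soaVy[i] * DT;
  }
}, 20, 3);

console.log({ aosMs, soaMs });
```

The numbers it prints are only meaningful relative to each other on the same machine and runtime, and they can easily go either way at small sizes, which is exactly why the measurement has to come before the refactoring.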

8. C# / TypeScript / Python / Rust comparison notes

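| Language | Idiomatic constructs | Caveats |
|---|---|---|
| C# | `float[]`, `Span<T>`, `Vector<T>`, `ArrayPool<T>` | Do not expose the arrays directly to outside code |
| TypeScript | `Float32Array`, `Uint8Array`, `Uint32Array` | Do not assume SIMD is guaranteed automatically |
| Python | `array`, `memoryview`, NumPy | Pure Python loops carry heavy interpreter overhead |
| Rust | `Vec<f32>`, slice, iterator `zip` | In hot loops, check whether multi-array index access prevents bounds-check elimination |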

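In Rust, pay particular attention to code that indexes several arrays at once with for index in 0..count. It is safe code, but in a hot loop bounds checks may remain or vectorization may be hindered. If you first slice each array with [..count] to align the lengths and then iterate with iter_mut().zip(...), the compiler can understand the length relationships more easily.

In C#, Vector<T> lets you state the SIMD intent, but actual performance depends on the CPU, the JIT, data alignment, and the loop shape. In TypeScript the key is stabilizing the memory layout with Float32Array, and in Python, when real numeric performance is needed, moving to a vectorized library such as NumPy is usually the more realistic route.

Learning a programming language ultimately means, as Alan Perlis suggested, internalizing that language's philosophy. If you imitate Rust's ownership model in TypeScript, expect C-level performance from a standard Python loop, or wrap everything in objects in C# and then call it SoA, that is just a pattern applied by rote, and I recommend going back to study it again.

Put that way it sounds harder than it is; all you need to understand is that the domain model (Object) and the execution model (Pass) should differ. The Gather/Scatter pattern is commonly used to bridge the two models, but I will not cover it here. I will write about it later if I find the time.

Personal note: In the end, the essence of SoA is cutting the forced coupling between data for the machine (the execution model) and data for humans (the domain model). Trying to cram everything into a single object is violence that imposes the structure of the human brain on the CPU.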

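9. Further questions

Does this pass really need the whole object, or only a few specific field arrays?
Does the SoA benefit come from cache locality, from SIMD potential, or from branch removal?
As the active ratio drops, which of isActive[], activeIndexes[], and swap-remove fits best?
Is this data allowed to change order, or is a stable order a domain requirement?
In Rust, have you compared the index loop and the iterator zip version with an actual benchmark?
What API allows sufficient testing and debugging without exposing the SoA's internal arrays?
Does this optimization reduce a real bottleneck, or does it only raise code complexity?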

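10. Summary

SoA stores data as contiguous per-field arrays, not per object.
The benefit of SoA lies in SIMD and auto-vectorization potential as well as in cache locality.
In a hot loop, reduce branches, bounds checks, indirect access, and allocation.
In Rust, slice + iterator zip can be friendlier to optimization than indexing several arrays with the same index.
In a particle system where order does not matter, keeping the array dense at all times with swap-remove is a strong option.
SoA is an internal execution layout; the external API should still be encapsulated.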

Quick mnemonic:

Text

Lay out data by the fields the pass reads, not by object, and whenever possible let it flow through a branch-free dense loop.