Building an Aggregate Report After Grouping
Aggregation is decomposed into five stages: filter, group, sum, sort, and take, each made explicit in the pipeline. Accumulation is handled with a map, and top-N selection is separated into sorting followed by slicing. Monetary amounts are represented as integers or decimals, and group keys are normalized before grouping.
Practice memo · 14 min read · Easy
Building an Aggregate Report After Grouping
Calculate the total amount per customer from an order list and extract only the top customers. The key is to make filter, group, sum, sort, and take read as separate, distinct stages.
In practice, aggregate reports always decompose into the same five stages: filter, group, sum, sort, take. Beginner code crams all five into a single loop, making it hard to modify, while good code exposes each stage as a clear pipeline. When the stages are visible, you can pinpoint exactly where and what went wrong at that specific step.
The goal of this pattern is not to write less code, but to expose the intent of each data transformation as an explicit stage. C# LINQ expresses all five stages as a chain, Rust handles vacant/occupied cases together with a single HashMap::entry call, and Python skips accumulation-key initialization entirely with defaultdict. Here, the entry API bundles the branch logic into one call: if the key already exists, it modifies the existing value; if not, it inserts a new one. The skeleton is the same across languages: use a map for accumulation, a key function for sorting, and slicing for top-N selection.
Personal note: keep the accumulation map as mutable state inside the function but return only an immutable result to the outside; likewise, once a sorted list is produced, allow it to be consumed only by the next stage so it cannot be modified externally. Treat monetary amounts as integer minor units or a decimal type, not floating-point numbers. The code in this document already follows this principle: C# uses
decimal, Python usesint, and Rust usesi64. Only TypeScript allows floating-point throughnumber, soamountMinorhas been changed tobigint(integer minor unit). If you must usenumber, you need to lock in the constraint that "it is an integer minor unit used only within the safe integer range." Also, iflimitis 0 or less, return an empty collection before any computation in all languages, because slice and take behavior for negative limits varies by language (cutting from the end, subtracting one from the total, etc.), producing inconsistent results.
Notation
This card follows each language's idiomatic naming conventions. C# uses PascalCase, TypeScript uses camelCase, and Python and Rust use snake_case. It is perfectly normal for the same concept to appear as
CustomerId,customerId, orcustomer_id, and understanding that naming mapping is itself one of the learning points of this card.
C#
public sealed record OrderRow(int CustomerId, decimal Amount, bool IsPaid);
public sealed record CustomerTotal(int CustomerId, decimal TotalAmount);
public static IReadOnlyList<CustomerTotal> CalculateTopCustomers(
IReadOnlyList<OrderRow> orders,
int limit)
{
if (limit <= 0)
{
return Array.Empty<CustomerTotal>();
}
return orders
.Where(order => order.IsPaid)
.GroupBy(order => order.CustomerId)
.Select(group => new CustomerTotal(
group.Key,
group.Sum(order => order.Amount)))
.OrderByDescending(row => row.TotalAmount)
.ThenBy(row => row.CustomerId)
.Take(limit)
.ToArray();
}TypeScript
type OrderRow = Readonly<{
customerId: number;
amountMinor: bigint;
isPaid: boolean;
}>;
type CustomerTotal = Readonly<{
customerId: number;
totalAmountMinor: bigint;
}>;
function calculateTopCustomers(
orders: readonly OrderRow[],
limit: number,
): readonly CustomerTotal[] {
if (limit <= 0) {
return [];
}
const totals = new Map<number, bigint>();
for (const order of orders) {
if (!order.isPaid) {
continue;
}
const currentTotal = totals.get(order.customerId) ?? 0n;
totals.set(order.customerId, currentTotal + order.amountMinor);
}
return [...totals.entries()]
.map(([customerId, totalAmountMinor]) => ({ customerId, totalAmountMinor }))
.sort((left, right) => {
if (left.totalAmountMinor !== right.totalAmountMinor) {
return left.totalAmountMinor < right.totalAmountMinor ? 1 : -1;
}
return left.customerId - right.customerId;
})
.slice(0, limit);
}Python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class OrderRow:
customer_id: int
amount: int
is_paid: bool
@dataclass(frozen=True, slots=True)
class CustomerTotal:
customer_id: int
total_amount: int
def calculate_top_customers(
orders: Iterable[OrderRow],
limit: int,
) -> tuple[CustomerTotal, ...]:
if limit <= 0:
return ()
totals: defaultdict[int, int] = defaultdict(int)
for order in orders:
if not order.is_paid:
continue
totals[order.customer_id] += order.amount
rows = (
CustomerTotal(customer_id, total_amount)
for customer_id, total_amount in totals.items()
)
ordered = sorted(rows, key=lambda row: (-row.total_amount, row.customer_id))
return tuple(ordered[:limit])
Rust
use std::cmp::Ordering;
use std::collections::HashMap;
use std::error::Error;
use std::fmt;
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct OrderRow {
customer_id: u32,
amount: i64,
is_paid: bool,
}
impl OrderRow {
pub fn new(
customer_id: u32,
amount: i64,
is_paid: bool,
) -> Result<Self, OrderError> {
if amount < 0 {
return Err(OrderError::NegativeAmount { amount });
}
Ok(Self {
customer_id,
amount,
is_paid,
})
}
pub fn customer_id(&self) -> u32 {
self.customer_id
}
pub fn amount(&self) -> i64 {
self.amount
}
pub fn is_paid(&self) -> bool {
self.is_paid
}
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CustomerTotal {
customer_id: u32,
total_amount: i64,
}
impl CustomerTotal {
pub fn new(
customer_id: u32,
total_amount: i64,
) -> Result<Self, OrderError> {
if total_amount < 0 {
return Err(OrderError::NegativeTotalAmount { total_amount });
}
Ok(Self {
customer_id,
total_amount,
})
}
pub fn customer_id(&self) -> u32 {
self.customer_id
}
pub fn total_amount(&self) -> i64 {
self.total_amount
}
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum OrderError {
NegativeAmount { amount: i64 },
NegativeTotalAmount { total_amount: i64 },
AmountOverflow { customer_id: u32 },
}
impl fmt::Display for OrderError {
fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::NegativeAmount { amount } => {
write!(formatter, "order amount must be non-negative: {amount}")
}
Self::NegativeTotalAmount { total_amount } => {
write!(
formatter,
"total amount must be non-negative: {total_amount}"
)
}
Self::AmountOverflow { customer_id } => {
write!(formatter, "amount overflow for customer_id={customer_id}")
}
}
}
}
impl Error for OrderError {}
pub fn calculate_top_customers(
orders: &[OrderRow],
limit: usize,
) -> Result<Vec<CustomerTotal>, OrderError> {
if limit == 0 {
return Ok(Vec::new());
}
let totals = accumulate_paid_totals(orders)?;
let mut rows = to_customer_totals(totals)?;
rows.sort_by(compare_customer_totals);
rows.truncate(limit);
Ok(rows)
}
fn accumulate_paid_totals(
orders: &[OrderRow],
) -> Result<HashMap<u32, i64>, OrderError> {
let mut totals = HashMap::<u32, i64>::new();
for order in orders {
if !order.is_paid() {
continue;
}
let total = totals.entry(order.customer_id()).or_insert(0);
*total = total
.checked_add(order.amount())
.ok_or(OrderError::AmountOverflow {
customer_id: order.customer_id(),
})?;
}
Ok(totals)
}
fn to_customer_totals(
totals: HashMap<u32, i64>,
) -> Result<Vec<CustomerTotal>, OrderError> {
totals
.into_iter()
.map(|(customer_id, total_amount)| {
CustomerTotal::new(customer_id, total_amount)
})
.collect()
}
fn compare_customer_totals(
left: &CustomerTotal,
right: &CustomerTotal,
) -> Ordering {
right
.total_amount()
.cmp(&left.total_amount())
.then_with(|| left.customer_id().cmp(&right.customer_id()))
}Call site
All four languages demonstrate three cases:
normal (aggregation and top N), tie-break (ascending customerId order regardless of input order), and limit guard (empty result when limit is 0 or less).
C#
// 1) Normal case: customer 1 totals 100 + 25 = 125, customer 2 totals 50
var orders = new[]
{
new OrderRow(1, 100m, true),
new OrderRow(2, 50m, true),
new OrderRow(1, 25m, true),
new OrderRow(3, 80m, false), // Unpaid -> excluded by filter
};
CalculateTopCustomers(orders, 2);
// [CustomerTotal(1, 125), CustomerTotal(2, 50)]
// 2) Tie-break: both customers have the same total of 100.
// In the input, customer 2 comes first, but on a tie the sort is ascending by `CustomerId`, so customer 1 appears first.
var tied = new[]
{
new OrderRow(2, 100m, true),
new OrderRow(1, 100m, true),
};
CalculateTopCustomers(tied, 2);
// [CustomerTotal(1, 100), CustomerTotal(2, 100)]
// 3) Limit guard: if the limit is 0 or below, return an empty result without any computation
CalculateTopCustomers(orders, 0); // []
CalculateTopCustomers(orders, -1); // []
TypeScript
// 1) Normal case: customer 1 totals 100 + 25 = 125, customer 2 totals 50
const orders: readonly OrderRow[] = [
{ customerId: 1, amountMinor: 100n, isPaid: true },
{ customerId: 2, amountMinor: 50n, isPaid: true },
{ customerId: 1, amountMinor: 25n, isPaid: true },
{ customerId: 3, amountMinor: 80n, isPaid: false }, // Unpaid -> excluded by filter
];
calculateTopCustomers(orders, 2);
// [{ customerId: 1, totalAmountMinor: 125n }, { customerId: 2, totalAmountMinor: 50n }]
// 2) Tie-break: two customers have the same total of 100n.
// Customer 2 appears first in the input, but on a tie the order is ascending by `customerId`, so customer 1 comes first.
const tied: readonly OrderRow[] = [
{ customerId: 2, amountMinor: 100n, isPaid: true },
{ customerId: 1, amountMinor: 100n, isPaid: true },
];
calculateTopCustomers(tied, 2);
// [{ customerId: 1, totalAmountMinor: 100n }, { customerId: 2, totalAmountMinor: 100n }]
// 3) Limit guard: if the limit is 0 or less, return an empty result without computing
calculateTopCustomers(orders, 0); // []
calculateTopCustomers(orders, -1); // []
Python
# 1) Normal case: customer 1 totals 100 + 25 = 125, customer 2 totals 50
orders = (
OrderRow(1, 100, True),
OrderRow(2, 50, True),
OrderRow(1, 25, True),
OrderRow(3, 80, False), # Unpaid -> excluded by filter
)
calculate_top_customers(orders, 2)
# (CustomerTotal(customer_id=1, total_amount=125),
# CustomerTotal(customer_id=2, total_amount=50))
# 2) Tie-break: both customers have the same total of 100.
# Customer 2 appears first in the input, but when there is a tie, the result is sorted by `customer_id` ascending, so customer 1 comes first.
tied = (
OrderRow(2, 100, True),
OrderRow(1, 100, True),
)
calculate_top_customers(tied, 2)
# (CustomerTotal(customer_id=1, total_amount=100),
# CustomerTotal(customer_id=2, total_amount=100))
# 3) Limit guard: if the limit is 0 or less, return an empty result without any computation
calculate_top_customers(orders, 0) # ()
calculate_top_customers(orders, -1) # ()
Rust
// 1) Normal case: customer 1 totals 100 + 25 = 125, customer 2 totals 50
let orders = vec![
OrderRow::new(1, 100, true).expect("valid order"),
OrderRow::new(2, 50, true).expect("valid order"),
OrderRow::new(1, 25, true).expect("valid order"),
OrderRow::new(3, 80, false).expect("valid order"), // Unpaid -> excluded by filter
];
let result = calculate_top_customers(&orders, 2).expect("valid aggregation");
// [
// CustomerTotal { customer_id: 1, total_amount: 125 },
// CustomerTotal { customer_id: 2, total_amount: 50 },
// ]
// 2) Tie-break: both customers have the same total of 100.
// Customer 2 appears first in the input, but when there is a tie, customers are ordered by `customer_id` ascending, so customer 1 comes first.
let tied = vec![
OrderRow::new(2, 100, true).expect("valid order"),
OrderRow::new(1, 100, true).expect("valid order"),
];
let tied_result = calculate_top_customers(&tied, 2).expect("valid aggregation");
// [
// CustomerTotal { customer_id: 1, total_amount: 100 },
// CustomerTotal { customer_id: 2, total_amount: 100 },
// ]
// 3) Limit guard clause: if 0, return an empty result with no computation (negative values cannot be passed in since the type is `usize`)
let empty_result = calculate_top_customers(&orders, 0).expect("valid aggregation");
// []Reading order
Is the limit valid? (guard clause: return an empty result before any computation if limit is 0 or less)
Which rows should be included in the aggregation? (filter)
What is the grouping criterion? (group key)
What should be summed within each group? (sum)
In what order should results be ranked? (sort, with ties broken by ascending
customerId)How many results should be selected? (take)
The important thing is to remember the order. The implementations of filter, group key, sum, sort, and take each differ, but composing them almost always follows that sequence. Note that placing sort after take means you are sorting an arbitrary N items rather than selecting the top N.
Boundary: map accumulation vs. GroupBy chain
When the data is small and readability matters, LINQ's GroupBy chain best expresses the intent. On a hot path where iterations are very frequent or you need to avoid intermediate group-object allocations, accumulating directly into a map as in TypeScript/Rust reduces allocations. Both approaches cover the same five stages, but one favors declarative expression while the other favors minimal allocation. In practice, the chaining style tends to be more comfortable because it is easy to follow in order.
When to use
Anywhere you group, sum, and rank: aggregate reports, leaderboards, dashboard top-N, category totals, and so on. If input arrives as a stream or is too large to fit in memory, switch to a form that maintains only the accumulation map instead of collecting everything into memory.
Antipattern
// The code below is intentionally bad, so it differs from the correct `OrderRow` type shown above.
// It stores `amount` as a float, does not normalize the group key, and queries the database inside a loop.
type BadOrderRow = Readonly<{
customerId: string;
amount: number; // float money bug
isPaid: boolean;
}>;
const totals: Record<string, number> = {};
for (const order of badOrders) {
if (!order.isPaid) {
continue;
}
const customer = db.findCustomer(order.customerId); // I/O inside a hot loop -> N+1 query problem
const key = customer.email; // unnormalized group key
totals[key] = (totals[key] || 0) + order.amount; // floating-point accumulation
}
Three things are wrong at the same time: a DB query inside the loop creates an N+1 query problem, adding amounts as floating-point numbers accumulates rounding errors, and failing to normalize the group key splits the same customer across different keys.
Missing normalization is not obvious when the key is a number (such as customerId), as in the example above. The problem actually surfaces when the key is a string: the same customer ends up under different keys due to trivial formatting differences, causing the aggregation to fragment.
// Cases where the same customer ends up split under different keys
"1" vs " 1 " // Leading and trailing whitespace
"[email protected]" vs "[email protected]" // Letter case
// Normalize keys to a consistent form before accumulation
const key = rawCustomerKey.trim().toLowerCase();
totals.set(key, (totals.get(key) ?? 0n) + order.amountMinor);
The key is to consolidate normalization into one place at the moment the group key is constructed. Finish whitespace trimming, case normalization, and Unicode normalization (NFC, etc.) immediately before accumulation, and let only the normalized key flow through from that point on. If you scatter normalization across multiple places, omitting even one site revives the same bug.
Further considerations
Breaking the aggregation stages into small functions reduces the unit of testing: you can verify filter, accumulation, sort, and take separately. The sort tie-break rule (ascending customerId when totals are equal) is reflected in all four language implementations above. Without locking the tie-break into code, results will vary depending on input order or sort stability, meaning the output order will not be deterministic for the same input.
Summary
Decompose aggregation into five explicit stages: filter, group, sum, sort, and take. Use a map for accumulation, and separate top-N selection as sort followed by slice. Treat amounts as integers or decimals, and normalize group keys before grouping.
Common misuses: DB queries inside a loop, missing group key normalization, treating amounts as float, incorrect sort direction, and missing limit validation.