Interview on Neil's Space

How RocksDB Works: A Minimal LSM-Tree Primer

Sat, 13 Jun 2026 07:00:00 -0700

While prepping for interviews, I spent some time really digging into how RocksDB works — how its storage engine is designed, how data gets written, and how it gets read back. RocksDB (and the LSM-tree underneath it) is one of those things a lot of people have heard of but can’t quite explain — I couldn’t either, before I sat down with it. Once it clicked, I wrote up the core ideas as these notes, to share with anyone else trying to get it.

I won’t claim this is exhaustive or deeply expert, but I hope it leaves you (and future me) with a clear overall picture of how RocksDB actually turns.

What RocksDB is

In one line: an embeddable, persistent key-value store.

Embeddable: it isn’t a standalone server like MySQL — it’s a library you compile directly into your program, which cuts out inter-process communication overhead.
Persistent: data lives on disk; nothing is lost on a crash.
Forked from Google’s LevelDB in 2012, written in C++, optimized specifically for SSDs and write-heavy workloads. Meta, Microsoft, Netflix, and Uber all use it.
It is not distributed — replication and sharding are your job at a higher layer.

The operations it exposes are humble: put(key, value) to write, get(key) to read, delete(key) to remove, merge(key, value) to combine, and iterator.seek() for range scans.

The core idea: the LSM-tree

Everything in RocksDB is built on the LSM-tree (Log-Structured Merge-Tree).

The core tension it tackles: disks hate random writes and love sequential ones. The LSM-tree’s trick is to buffer writes in memory, keep them sorted, then flush them to disk sequentially all at once. In other words, it batches a flood of random writes into sequential writes — and that’s the fundamental reason it writes so fast.

Structurally, data is split across many levels: the top level lives in memory, and below it sit level after level on disk, numbered L0, L1, L2… The deeper you go, the older and larger the data (each level is typically ~10× the one above it).

memory   ┌──────────────────────────────┐
         │  MemTable (writable, sorted)  │  ← new data lands here first
         └──────────────────────────────┘
- - - - - - - - - - - - - - - - - - - - - -  flush
disk     L0   [SST] [SST] [SST]      ← newest; key ranges may overlap across files
         L1   [SST][SST][SST][SST]   ← no overlap within a level, and bigger
         L2   [SST][SST] ......      ← older and larger the deeper you go (~×10)
         ...

This structure dates back to 1996 and was designed for write-intensive workloads. Besides RocksDB, Bigtable, HBase, Cassandra, and MongoDB’s WiredTiger engine are all LSM-tree based.

Writing: how data gets in

A single write lands in two places at once:

put(key, value)
      │
      ├──► WAL       (appended sequentially to disk, for crash safety)
      │
      └──► MemTable  (kept sorted in memory)
                  │  fills up at ~64MB
                  ▼
            turns read-only; a background thread flushes it to one SST file → L0

MemTable: the in-memory write buffer where every insert, update, and delete goes first. It’s kept sorted by key internally (the default implementation is a skip list), which is what makes the later flush and range queries efficient. One detail: a delete doesn’t actually erase anything — it writes a tombstone record meaning “this key is deleted.” The real cleanup is left to compaction later.

WAL (Write-Ahead Log): the MemTable is in memory, so a power loss would wipe it. So every write also appends a record to a WAL file on disk — key, value, operation type, and a checksum. After a crash, RocksDB replays the WAL to reconstruct the MemTable. Note the WAL is appended in write order, not sorted — it’s optimizing purely for speed.

Flush: once a MemTable fills up, it turns read-only and a fresh one takes over; a background thread then flushes the read-only MemTable into a single SST file on L0. Once that’s done, the corresponding WAL can be discarded. Because the MemTable was already sorted, this flush is one sequential write — which is the whole point of the LSM-tree.

What an SST file looks like

An SST (Static Sorted Table) is the file that actually holds data on disk, and it’s never modified once written. Inside is a pile of sorted key-value pairs, laid out in a carefully designed block format (blocks default to 4KB and can be compressed with Snappy, LZ4, ZSTD, etc.).

An SST is roughly split into a few sections:

Data blocks: the sorted key-value pairs. Since adjacent keys are similar, only the differences need to be stored (delta encoding) to save space.
Index: records, for each data block, “last key → offset in the file,” so a lookup can binary-search straight to the right block instead of scanning the whole file.
Bloom filter (optional): a probabilistic structure that very quickly answers “this key is definitely not in this file.” It may give a false “yes,” but never a false “no” — perfect for skipping, on a read, a whole batch of files you don’t need to touch.

Reading: how data gets found

To read a key, you search newest to oldest, level by level — newer values sit higher, older ones lower, so the first hit is the latest value:

Check the active MemTable;
Then the read-only MemTables not yet flushed;
Then each SST file in L0 (L0 files can overlap in key range, so you have to check them one by one, newest to oldest);
From L1 down, each level has non-overlapping key ranges, so you only need to locate and check one file per level.

And within a single SST file, it’s again three steps: first ask the Bloom filter whether the key is present — if not, skip the file entirely; if so, use the index to binary-search to the right data block; finally read that block and find the key inside it.

So the cost of a read comes down to how many levels and files you have to wade through — which leads straight into the next section.

Compaction: the background cleanup that never stops

As noted, a delete just writes a tombstone, and an update just writes a new value on top of the old one. Over time, the disk fills up with stale old versions and tombstones: they waste space and force reads to wade through more files.

Compaction is the background job that cleans this up: it takes some SST files from one level, merges them with the overlapping files in the next level, throws away the shadowed old values and deleted keys, and writes fresh, clean SSTs into the lower level. Since every file is already sorted, the merge uses a k-way merge — a scaled-up version of the “merge” step in merge sort. It all runs on background threads, so it doesn’t block foreground reads and writes.

RocksDB defaults to leveled compaction:

L0 is special: its files may overlap in key range (since they’re flushed straight from MemTables); compaction triggers once the L0 file count hits a threshold (4 by default).
L1 and below: within each level, all files have non-overlapping key ranges and are globally ordered; when a level’s total size exceeds its target, the excess is merged down into the next level — sometimes cascading down several levels in a chain.

It’s all trade-offs: the three amplifications

The key to understanding RocksDB tuning (really, all LSM engines) is three amplification factors:

Space amplification: disk space actually used ÷ size of the logical data. The more stale versions and tombstones pile up, the higher it gets.
Read amplification: how many I/O operations a single logical read actually performs. The more levels and files to wade through, the higher it gets.
Write amplification: how many times a single logical write is actually written. The same piece of data gets rewritten to lower levels over and over during compaction, so this can get large.

These three are a game of whack-a-mole: the more aggressively you compact, the smaller your space and read amplification, but the larger your write amplification — and vice versa. The right balance depends entirely on your workload, and the knobs are many and interdependent. Even the RocksDB authors admit it’s hard to pin down the exact effect of each parameter, and recommend benchmarking a lot while keeping an eye on those three amplification factors.

An aside: the merge operation

Besides put and delete, RocksDB has merge. When you need to apply lots of incremental updates to a value (say, repeatedly appending to a counter or a list), the traditional approach is read-modify-write: read it out, change it, write it back — clunky. merge lets you write just the increment and hands off the combining to a merge function you define, computing the final value only at read or compaction time. The upside is lower write amplification, plus it’s thread-safe; the cost is that reads get more expensive — until the increments are consolidated, every read has to recompute them.

The bits worth remembering

If I keep just one mental map, it’s this:

RocksDB = an embeddable, persistent KV store, descended from LevelDB, built on the LSM-tree;
Writes: into the in-memory MemTable (sorted) + a sequential WAL (crash safety) → once full, flushed to an SST file on L0 → compaction slowly tidies things downward in the background;
Reads: search newest to oldest, level by level, using a Bloom filter + index to skip and locate so you read as few stray files as possible;
The essence: it trades “write amplification” for the high throughput of “turning random writes into sequential ones” — and between space, read, and write amplification, it’s always a trade-off; there’s no free lunch.

Hold onto those few lines and the overall shape of RocksDB stands up. The finer details — skip lists, delta encoding, the various compaction strategies, how to tune the knobs — you can dive into whenever you actually need them.

A lot of my understanding here comes from Artem Krylysov’s How RocksDB Works, which goes into far more depth — highly recommended if you want to go deeper.

Sorting Algorithms for Coding Interviews: A Python Reference from Bubble Sort to Timsort

Sat, 13 Jun 2026 00:00:00 -0700

I’ve been prepping for coding interviews lately, and I went back through the sorting algorithms from scratch. The process gave me a bit of a scare: a lot of this I genuinely used to know — how quicksort’s partition actually works, why it degrades — and now I had to pause to remember it. By the time I got to the non-comparison sorts — counting, radix, bucket — I realized that whole area had become more or less a blank.

So I decided to write this review down. Partly as a reference for other people getting ready for interviews, and partly as a record for my future self: next time I need to interview, I can come back here, skim through, and quickly figure out “this one I still know, this one I forgot, let me focus there.”

How to use this post:

First look at the cheat sheet below — one glance tells you which algorithms you’ve forgotten;
For anything you want to dig into, use the table of contents (TOC) on the right to jump straight there;
Every algorithm follows the same template: one-line idea → Python implementation → complexity → stability and in-place → interview notes, so they’re easy to compare.

All the code is in Python, because it reads closest to pseudocode and makes the logic easiest to see.

One-page cheat sheet

Conclusions first. The table below covers all 11 sorts in this post. When an interviewer asks about complexity or stability, this is the table that should flash into your head.

Algorithm	Best	Average	Worst	Space	Stable	In-place
Bubble	O(n)	O(n²)	O(n²)	O(1)	✅	✅
Selection	O(n²)	O(n²)	O(n²)	O(1)	❌	✅
Insertion	O(n)	O(n²)	O(n²)	O(1)	✅	✅
Shell	O(n log n)	≈O(n^1.3)	O(n²)	O(1)	❌	✅
Merge	O(n log n)	O(n log n)	O(n log n)	O(n)	✅	❌
Quick	O(n log n)	O(n log n)	O(n²)	O(log n)	❌	✅
Heap	O(n log n)	O(n log n)	O(n log n)	O(1)	❌	✅
Counting	O(n+k)	O(n+k)	O(n+k)	O(n+k)	✅	❌
Radix	O(d·(n+k))	O(d·(n+k))	O(d·(n+k))	O(n+k)	✅	❌
Bucket	O(n+k)	O(n+k)	O(n²)	O(n+k)	✅*	❌
Timsort	O(n)	O(n log n)	O(n log n)	O(n)	✅	❌

A few notes so the table doesn’t mislead you:

Shell sort’s complexity depends on the gap sequence; its best case changes with the sequence you pick, so the numbers here are just typical orders of magnitude.
Quick sort’s listed space is the average recursion-stack depth O(log n); the worst case degrades to O(n). It partitions in place, but the recursion itself uses the stack.
Bucket sort’s stability has an asterisk: it’s only stable if the per-bucket sort (e.g. insertion sort) is stable.
k is the range of values, d is the number of digits — the complexity of the non-comparison sorts is always tied to properties of the data itself, which I’ll get into below.

Before we start: a few unavoidable concepts

Before going through the algorithms one by one, there are four concepts that nearly every sorting interview question relies on. Getting them straight first means I won’t have to keep re-explaining them.

Comparison vs. non-comparison sorts

A comparison sort decides order using only one operation: “which of these two elements is bigger?” Bubble, insertion, merge, quick, and heap are all comparison sorts. What they share: their theoretical lower bound is O(n log n) — nothing can beat it (the reason is below).

A non-comparison sort doesn’t compare; instead it uses the element values themselves to compute where each one belongs. Counting, radix, and bucket are all like this. Because they sidestep comparison, they can hit linear time O(n) — but the price is extra requirements on the data (e.g. it must be integers in a bounded range).

Stability

If two elements have equal sort keys, and their relative order is preserved after sorting, the sort is stable.

A concrete example. Say you have a batch of orders already sorted by time, and now you want to re-sort by amount:

Before (sorted by time):  ($100, 9:00)  ($50, 9:01)  ($100, 9:02)
Stable sort (by amount):  ($50, 9:01)  ($100, 9:00)  ($100, 9:02)   ← the two $100s keep their time order
Unstable sort:            ($50, 9:01)  ($100, 9:02)  ($100, 9:00)   ← the two $100s got scrambled

Why do interviewers love this? Because multi-key sorting depends on it: sort by the secondary key first, then use a stable sort on the primary key, and the secondary order is preserved. Knowing which sorts are stable (bubble, insertion, merge, counting, radix, Timsort) and which aren’t (selection, shell, quick, heap) is almost guaranteed to come up.

In-place sorting

If a sort needs only O(1) or O(log n) extra space, it’s in-place. Merge sort allocates an extra O(n) array, so it isn’t in-place; quick and heap only shuffle the original array, so they are. When an interviewer presses “what if memory is tight?”, this is usually what they’re asking about.

Why complexity splits into best / average / worst

The same algorithm can behave wildly differently on different inputs. Quicksort is the classic case: O(n log n) on random input, but if the input is already sorted and you keep picking the worst pivot, it degrades to O(n²). When you state complexity in an interview, it’s best to say which case you mean — that’s exactly where you show how deeply you understand it.

Why can’t comparison sorts beat O(n log n)? Any comparison sort can be drawn as a decision tree: each internal node is one comparison, each leaf is one possible final arrangement. There are n! possible arrangements of n elements, so the tree must have at least n! leaves. A binary tree of height h has at most 2ʰ leaves, so 2ʰ ≥ n!, i.e. h ≥ log₂(n!). By Stirling’s approximation, log₂(n!) ≈ n log n. The tree’s height is the number of comparisons in the worst case, so the lower bound is Ω(n log n). This also explains why going faster means dropping “comparison” entirely — which is what the non-comparison sorts do.

OK, concepts done. Let’s go through them one by one.

Comparison-based sorts

Bubble Sort

One-line idea: compare adjacent elements pairwise, swap if out of order; each pass “bubbles” the current largest element to the end.

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # each pass bubbles the largest of the unsorted region to the right end
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:          # a whole pass with no swaps means it's sorted; quit early
            break
    return arr

Complexity: worst and average are both O(n²); with the swapped early-exit, it’s O(n) on already-sorted input. Space O(1).
Stability / in-place: stable (only swaps on a strict greater-than), in-place.
Interview notes: basically never used in practice, but it’s the textbook example of “stable + early-exit reaches O(n)”. Watch out for that swapped optimization — it’s a common gotcha.
LeetCode: 912. Sort an Array — there’s no problem dedicated to bubble sort, but this generic sorting problem is a fine sandbox to practice the implementation (pure O(n²) will time out on large inputs, so it’s practice only).

Selection Sort

One-line idea: each pass picks the smallest element from the unsorted region and places it at the end of the sorted region.

def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        # find the index of the minimum in the unsorted region [i+1, n)
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

Complexity: O(n²) no matter what the input looks like — it never speeds up on sorted data. Space O(1).
Stability / in-place: unstable, in-place. For example [5a, 5b, 2]: the first pass swaps 2 with 5a, and the two 5s flip relative order.
Interview notes: its one redeeming trait is the minimum number of swaps (at most n−1), which matters when writes are expensive. It’s also the counterexample to “best case can save you” — it never does — and is often compared against insertion sort.
LeetCode: 912. Sort an Array — practice the implementation on this generic problem; selection sort is a good way to feel “few swaps, but no fewer comparisons.”

Insertion Sort

One-line idea: like sorting a hand of cards — go left to right, inserting each new card into its correct spot among the already-sorted cards on the left.

def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # shift everything bigger than key one slot right to make room
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

Complexity: worst and average O(n²); close to O(n) on nearly-sorted input. Space O(1).
Stability / in-place: stable (the while condition uses >, not >=), in-place.
Interview notes: don’t underestimate it. On small or nearly-sorted data, insertion sort beats quicksort, which is exactly why production-grade sorts like Timsort and Introsort fall back to it on small chunks. Of the three basic sorts, it’s the most practically useful.
LeetCode: 147. Insertion Sort List — a problem built for insertion sort: insert in place on a linked list.

Shell Sort

One-line idea: an upgraded insertion sort. First do insertion sort on elements spaced by a large “gap”, then shrink the gap step by step; the last pass uses gap 1 (plain insertion sort), but by then the array is “mostly sorted”, so it flies.

def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # insertion sort on each subsequence with stride gap
        for i in range(gap, n):
            key = arr[i]
            j = i - gap
            while j >= 0 and arr[j] > key:
                arr[j + gap] = arr[j]
                j -= gap
            arr[j + gap] = key
        gap //= 2
    return arr

Complexity: depends on the gap sequence. The n//2 halving sequence above is O(n²) in the worst case; better sequences (Knuth’s 3k+1, Sedgewick’s) reach O(n^1.5) or better. Space O(1).
Stability / in-place: unstable (gapped swaps scramble the relative order of equal elements), in-place.
Interview notes: it’s the poster child for “making data roughly sorted first lets insertion sort go faster.” Rarely asked directly, but worth knowing as the bridge — it pushes a simple O(n²) sort toward O(n log n).
LeetCode: 912. Sort an Array — use it to practice shell sort and experiment with how different gap sequences affect runtime.

Merge Sort

One-line idea: divide and conquer. Split the array in half until you can’t split further, then merge two already-sorted small arrays into one larger sorted array.

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)


def merge(left, right):
    result = []
    i = j = 0
    # two pointers, take the smaller of the two each time; <= keeps it stable
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # whatever's left just gets appended
    result.extend(right[j:])
    return result

Complexity: best, average, and worst are all O(n log n) — rock solid, never degrades. Space O(n) (the merge needs an extra array).
Stability / in-place: stable, not in-place.
Interview notes: bulletproof complexity and naturally stable — the first choice when you need “stable + guaranteed O(n log n) worst case.”
LeetCode: 148. Sort List — the optimal solution for sorting a linked list is merge sort; for the array version use 912. Sort an Array.

Two high-frequency extensions:

Concept break: linked-list sorting and external sorting

Linked-list sorting: merge sort is especially friendly to linked lists — merging only rewires pointers, no extra array needed, so it can achieve O(1) extra space (not counting the recursion stack). This is why the standard answer to “sort a linked list in O(n log n)” is merge sort, not quicksort.

External sorting: when the data is too big to fit in memory (the classic interview question: “how do you sort a 10 GB file with 1 GB of memory?”), the answer is external merge sort — split the big file into chunks small enough to fit in memory, read each in, sort it, write it back to disk, then use a k-way merge to combine those sorted files into the final result. Merge’s essence — “combining multiple sorted sequences” — is taken to the extreme here.

Quick Sort

This is the section I most needed to pick back up — the partition details got fuzzy after five years untouched. Let’s take it slow.

One-line idea: divide and conquer. Pick a pivot, partition the array into “less than the pivot” and “greater than the pivot”, put the pivot in its final place, then recurse on both sides.

1. Lomuto partition (the easiest to memorize)

def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)    # recurse left half
        quick_sort(arr, p + 1, high)   # recurse right half
    return arr


def partition(arr, low, high):
    pivot = arr[high]            # Lomuto: always take the rightmost element as pivot
    i = low - 1                  # i is the right boundary of the "less than pivot" region
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]   # put the pivot in place
    return i + 1

Lomuto’s advantage is that it advances a single pointer i, so the logic is intuitive and easy to remember. For hand-writing quicksort in an interview, this version is the default.

2. Why it degrades, and how to fix it

Always taking the rightmost element as pivot has a fatal flaw: when the input is already sorted (or reverse-sorted), every partition splits the array into sizes 0 and n−1, the recursion depth becomes n, complexity degrades to O(n²), and it can blow the stack.

The fix is to stop letting the input “predict” the pivot — pick one at random, or take the median of the first, middle, and last elements (median-of-three):

import random

def partition(arr, low, high):
    rand = random.randint(low, high)
    arr[rand], arr[high] = arr[high], arr[rand]   # random pivot, swap it to the right, reuse the logic above
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

Two added lines plug the most common pitfall — degrading on sorted input. When the interviewer presses “what about quicksort’s worst case”, this is the standard answer.

3. Three-way quicksort: handling lots of duplicates

If the array has many duplicate values (say, all 0s and 1s), plain quicksort still does a lot of pointless recursion. Three-way quicksort (based on the “Dutch national flag problem”) splits the array into < pivot, == pivot, and > pivot, and skips the whole equal-to-pivot segment:

def quick_sort_3way(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low >= high:
        return arr
    pivot = arr[low]
    lt, i, gt = low, low, high   # [low,lt)pivot
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            arr[gt], arr[i] = arr[i], arr[gt]
            gt -= 1               # the swapped-in element isn't checked yet, so i stays
        else:
            i += 1
    quick_sort_3way(arr, low, lt - 1)
    quick_sort_3way(arr, gt + 1, high)
    return arr

Complexity: average O(n log n), worst O(n²) (almost never seen once you use a random pivot). Space O(log n), for the recursion stack.
Stability / in-place: unstable (the long-distance swaps in partitioning scramble equal elements), in-place.
Interview notes: default to Lomuto when hand-writing; bring up random / median-of-three when asked about the worst case; bring up three-way quicksort when asked about lots of duplicates.
LeetCode: 912. Sort an Array — remember to use a random pivot on submission, or sorted / heavily-duplicated data will time out or overflow the recursion stack.

One more high-frequency extension:

Concept break: Quickselect

“Find the k-th largest / smallest element” is an interview regular. If you only need the k-th one, there’s no need to fully sort: use quicksort’s partition, and after each partition look at where the pivot landed — then recurse into only the side that contains k. Average O(n), faster than sorting first and then indexing (O(n log n)).
def quickselect(arr, k):
    """return the k-th smallest element, k counting from 1"""
    low, high, target = 0, len(arr) - 1, k - 1
    while low <= high:
        p = partition(arr, low, high)
        if p == target:
            return arr[p]
        elif p < target:
            low = p + 1        # target is on the right
        else:
            high = p - 1       # target is on the left
Practice: 215. Kth Largest Element in an Array — solve it with quickselect at average O(n), a nice contrast to the heap solution.

Heap Sort

One-line idea: first build the array into a max-heap (every parent ≥ its children), so the root is the maximum; swap the root to the end, shrink the heap by one, sift the new root down, and repeat until sorted.

def heap_sort(arr):
    n = len(arr)
    # 1. build the heap: starting from the last non-leaf node, sift each one down
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    # 2. repeatedly swap the root (max) to the end, then fix the remaining heap
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)
    return arr


def sift_down(arr, root, size):
    while True:
        largest = root
        left, right = 2 * root + 1, 2 * root + 2
        if left < size and arr[left] > arr[largest]:
            largest = left
        if right < size and arr[right] > arr[largest]:
            largest = right
        if largest == root:      # the parent is already the largest; stop sinking
            break
        arr[root], arr[largest] = arr[largest], arr[root]
        root = largest

Complexity: best, average, and worst are all O(n log n). Building the heap is O(n) (not O(n log n) — a commonly-tested counterintuitive point), then n sift-downs of O(log n) each. Space O(1).
Stability / in-place: unstable, in-place.
Interview notes: it’s the only sort that both guarantees worst-case O(n log n) and uses only O(1) space — pick it when memory is extremely tight and you can’t afford to degrade. Note that it’s the same machinery as a priority queue / heap: heapq, Top-K problems, the heap inside Dijkstra — all variants of this sift_down. Maintaining a size-k min-heap to find the Top-K is a chained follow-up in this area.
LeetCode: 215. Kth Largest Element in an Array — the classic heap problem (maintain a size-k min-heap); it can also be solved with quickselect, a nice way to contrast the two approaches.

Non-comparison-based sorts

Every algorithm so far relies on “comparison”, which is why they’re stuck at the O(n log n) line. The next three sidestep comparison, using the element values themselves as indices to place items — which lets them hit linear time. The price is requirements on the data. This is also the area where my own memory was blankest, so I’ll go into a bit more detail.

Counting Sort

One-line idea: count how many times each value occurs, then use a prefix sum to compute each value’s position in the result and drop it straight in. Good for integers with a small value range k.

def counting_sort(arr):
    if not arr:
        return arr
    lo, hi = min(arr), max(arr)
    k = hi - lo + 1
    count = [0] * k
    for x in arr:                # 1. count
        count[x - lo] += 1
    for i in range(1, k):        # 2. prefix sum: count[i] becomes "number of elements <= i"
        count[i] += count[i - 1]
    result = [0] * len(arr)
    for x in reversed(arr):      # 3. fill back-to-front to stay stable
        count[x - lo] -= 1
        result[count[x - lo]] = x
    return result

Complexity: O(n + k), where n is the element count and k is the value range. Space O(n + k).
Stability / in-place: stable (the key is iterating back-to-front in step 3), not in-place.
Interview notes: when k is far smaller than n (e.g. sorting a hundred thousand scores in 0–100), it crushes any O(n log n) sort. But once k is large (e.g. sorting arbitrary 32-bit integers), the space blows up — that’s exactly its limit, and the problem radix sort exists to solve.
LeetCode: 75. Sort Colors — only three values (0, 1, 2), so counting sort (or three-way quicksort) handles it in one pass.

Radix Sort

One-line idea: sort digit by digit. Starting from the least significant digit (ones place), run one stable counting sort per digit, all the way up to the most significant digit. Because each pass is stable, the whole thing is sorted once you finish the highest digit.

def radix_sort(arr):
    if not arr:
        return arr
    max_val = max(arr)
    exp = 1                           # current digit: 1=ones, 10=tens, 100=hundreds...
    while max_val // exp > 0:
        arr = counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr


def counting_sort_by_digit(arr, exp):
    count = [0] * 10                  # base 10, each digit is only 0-9
    for x in arr:
        count[(x // exp) % 10] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    result = [0] * len(arr)
    for x in reversed(arr):           # back-to-front to keep this digit's sort stable (key to radix sort's correctness)
        digit = (x // exp) % 10
        count[digit] -= 1
        result[count[digit]] = x
    return result

Complexity: O(d·(n + k)), where d is the number of digits in the largest value and k is the base (here, base 10, so k=10). Space O(n + k).
Stability / in-place: stable, not in-place.
Interview notes: it solves counting sort’s “space blows up when the range is large” problem — by breaking a big integer into a few small digits. The version above only handles non-negative integers; to support negatives, shift everything to be non-negative first, or handle positives and negatives separately. Common interview questions: why must you go from low digit to high digit? Why must each digit’s sort be stable? (Because sorting a higher digit relies on stability to preserve the order already established by the lower digits.)
LeetCode: 164. Maximum Gap — it demands linear time and space, and the standard solution is exactly radix sort or bucket sort.

Bucket Sort

One-line idea: distribute the data evenly into a number of “buckets” by value, sort each bucket internally, then concatenate the buckets in order. Good for uniformly distributed data.

def bucket_sort(arr):
    if not arr:
        return arr
    n = len(arr)
    buckets = [[] for _ in range(n)]
    for x in arr:                     # assume elements are uniformly distributed in [0, 1)
        buckets[int(n * x)].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)        # stable sort within buckets keeps the whole thing stable
        result.extend(bucket)
    return result

Complexity: average O(n + k) when the data is uniformly distributed; the worst case (all elements crammed into one bucket) degrades to O(n²). Space O(n + k).
Stability / in-place: depends on the per-bucket sort — stable if you use insertion sort; not in-place.
Interview notes: its performance rides entirely on “is the data uniformly distributed”, which is the biggest difference from counting and radix. Counting and radix are insensitive to the shape of the data; bucket sort is sensitive to it. Classic use case: sorting a batch of floats uniformly distributed in [0, 1).
LeetCode: 347. Top K Frequent Elements — bucketing by frequency is the slickest solution to this one.

What gets used in the real world: Timsort

Everything above is a “textbook algorithm”. But every time you call sorted(), what Python actually runs underneath is Timsort — a hybrid carefully tuned for real-world data. It’s worth a section of its own, because being able to bring it up in an interview is often a plus.

Core idea: real data is rarely fully random — it’s often partially sorted already. Timsort seizes on this:

First scan the array for naturally-sorted contiguous segments, called runs;
Pad runs that are too short up to a minimum length (minrun, usually 32–64) using insertion sort — as noted earlier, insertion sort is fastest on small arrays;
Then merge these runs pairwise following a set of rules, with a “galloping” mode to speed up the merges.

So Timsort = the skeleton of merge sort + the small-chunk optimization of insertion sort + special-casing for already-sorted data.

Complexity: worst O(n log n), but down to O(n) on nearly-sorted data. Space O(n).
Stability: stable. This is why Python’s sorted() and list.sort() are guaranteed stable.
Trivia: Java’s object sort (Arrays.sort(Object[])) also uses a Timsort variant; while C++’s std::sort uses a different hybrid, Introsort (quicksort as the base, switching to heap sort when recursion gets too deep to avoid degrading, and insertion sort on small chunks). “Quick + heap + insertion” rolled into one — the same spirit as Timsort: there’s no silver bullet; production-grade sorts are all hybrids.
LeetCode: 56. Merge Intervals — sort first, then sweep and merge; in Python that sorted() call is running Timsort, so it’s a good way to feel the speedup from “real data is partially sorted”.

How to actually choose / how to answer in interviews

Here’s everything above boiled down to a “which one should I use” checklist:

No special requirements, just want speed → quicksort (random pivot). The default for most situations.
Need stable, and the worst case must stay O(n log n) → merge sort.
Memory is extremely tight (need O(1) space) and can’t degrade → heap sort.
Very small data (a few dozen) or nearly sorted → insertion sort.
Sorting a linked list → merge sort.
Integers with a small value range → counting sort.
Integers but a huge range (e.g. fixed-length integers / strings) → radix sort.
Data uniformly distributed over an interval → bucket sort.
Only need the k-th largest / the median, not a full sort → quickselect.
Data too big to fit in memory → external merge sort.

A few common chained follow-ups — have the answers ready:

“Which sorts are stable?” → bubble, insertion, merge, counting, radix, bucket (when the per-bucket sort is stable), Timsort.
“Quicksort’s worst case and how to avoid it?” → sorted input + a bad pivot degrades to O(n²); use a random pivot or median-of-three.
“Can you beat O(n log n)?” → comparison sorts can’t (the decision-tree lower bound); but if the data is integers in a bounded range, non-comparison sorts get you to O(n).
“Is there a sort that’s O(n log n), stable, and in-place?” → not in typical implementations; merge is stable but not in-place, heap is in-place but not stable, quick is in-place but not stable. A great prompt for testing whether you understand the trade-offs.

A few common mistakes

Pitfalls I stepped in (or nearly did) while reviewing:

Selection sort doesn’t speed up on sorted input — it has no early-exit, so it’s always O(n²). Don’t confuse it with bubble / insertion.
Building a heap is O(n), not O(n log n). Intuitively it looks like n elements at O(log n) each, but a careful count (most nodes are near the bottom) gives O(n). A common counterintuitive question.
Counting / radix sort fill the result back-to-front — that step is the source of stability; do it the other way and it’s unstable, and an unstable radix sort is simply wrong.
Quicksort’s space isn’t O(1) — it partitions in place, but the recursion stack is O(log n) on average and O(n) at worst.
Bucket sort’s worst case is O(n²) — don’t only remember the O(n) average; it degrades when the distribution is skewed.
Stable ≠ in-place — these are two independent dimensions, often asked together in interviews. Don’t conflate them.

Wrapping up

The framework for this topic is actually pretty clear:

Comparison sorts are stuck at O(n log n); among them quicksort is fastest but degrades, merge is stable but space-hungry, heap is in-place but unstable — there’s no all-rounder, it’s all trade-offs;
Non-comparison sorts trade requirements on the data for linear time, but they make demands on that data;
Production-grade sorts (Timsort, Introsort) are all hybrids, stitching together the strengths of several algorithms.

If, like me, you’re picking this back up after a few years, I’d suggest running through that cheat sheet at the top: skip anything you can implement from memory and explain the complexity and stability of, and go back to the relevant section for whatever you stumble on. Next time you’re prepping for interviews, just come back and skim it again.

Good luck with your interviews.