~25 phútIndexing internals… lượt xem

B-tree internals — page split, fill factor, height log branching

B-tree không phải binary tree. Multi-way balanced (~400 key/page). Height = log_400(N). Page split cost. Fill factor 90 default. UUIDv4 amplification 2-3x write.

B-tree là index mặc định ở mọi database từ PostgreSQL đến MySQL đến SQLite. Tên "B-tree" thường bị nhầm với binary tree — 2 nhánh, height log₂(N). Thực tế hoàn toàn khác: B-tree multi-way balanced — mỗi node chứa hàng trăm key (PostgreSQL: khoảng 400 key per 8 KB page). Hệ quả: B-tree với 1 tỷ row chỉ cao 4–5 cấp, mỗi lookup chỉ cần 4–5 disk I/O.

Bài này mở B-tree ra xem từng layer: page format, page split lifecycle, fill factor, height calculation, và demo write amplification khi insert UUID v4 random — lý do tại sao M04.5 của khoá này khuyến nghị UUID v7 thay UUID v4 cho primary key.

1. Analogy — Mục lục sách nhiều tầng

Hãy hình dung một cuốn sách kỹ thuật 10 triệu trang. Mục lục cuối sách có 100 nghìn entry — quá lớn để lật tìm nhanh. Giải pháp: làm thêm "mục lục của mục lục" (mục lục tầng 2), và nếu cần, "mục lục của mục lục của mục lục" (tầng 3). Mỗi trang mục lục chứa được khoảng 400 entry.

B-tree chính là cấu trúc mục lục nhiều tầng tự cân bằng này. Mỗi page (trang mục lục) chứa khoảng 400 key. Leaf page (tầng cuối) trỏ trực tiếp đến data row. Internal page (tầng trên) trỏ đến page con.

Mục lục sách	B-tree
Trang mục lục	Page (8 KB mặc định trong PG)
Entry trong trang mục lục	Key + pointer trong page
Trang mục lục cuối	Leaf page (trỏ heap row)
Trang mục lục trung gian	Internal page (trỏ child page)
Số tầng mục lục	B-tree height
400 entry/trang	Branching factor ~400 trong PG
Tách trang khi đầy	Page split khi insert vào full leaf

💡 Cách nhớ

B-tree height = số lần tra mục lục để tìm đến trang cần. Branching factor cao (400) → ít tầng → ít tra → nhanh. Đây là lý do B-tree thống trị index cho OLTP trong 50 năm qua.

2. B-tree là gì — multi-way, không phải binary

Binary Search Tree (BST) có branching factor 2 — mỗi node 2 nhánh, height log₂(N). Với 1 triệu row: height = 20. Với 1 tỷ row: height = 30. Mỗi cấp = 1 disk I/O → 30 I/O per lookup → quá chậm.

B-tree PostgreSQL có branching factor khoảng 400:

Binary tree (BST):
         50
        /  \
       25   75
      / \   / \
    12  37 62  87
 Height = log2(N), 1M row ~ 20 cap, 1B row ~ 30 cap

B-tree PG (branching factor ~400):
              [Root: 400 keys, 401 pointers]
            /     |     |   ...   |
        [L1 internal pages, moi page 400 keys]
       /   |   |   ...   |
    [L2 leaf pages: key + CTID pointer -> heap row]
 Height = log_400(N), 1M row ~ 3 cap, 1B row ~ 5 cap

Tại sao branching factor ~400? Mỗi page 8 KB, mỗi entry gồm key (~16 byte trung bình) và pointer (6 byte CTID): 8192 / (16 + 6) ≈ 372. Tính tròn cộng overhead header: ~400 entry per page.

Row count	Height	Page reads/lookup
100	1	1
100 nghìn	2	2
100 triệu	4	4
1 tỷ	4–5	4–5
100 tỷ	5–6	5–6

5 page reads × 10 μs (warm cache SSD) = 50 μs per lookup cho 1 tỷ row. Đây là lý do B-tree đạt sub-millisecond với dataset khổng lồ.

3. PG B-tree page format

Mỗi page 8 KB có cấu trúc:

Page 8 KB layout (simplified):
+----------+------------+------------+--- ... ---+-----------+
| Header   | ItemPtr 1  | ItemPtr 2  | ItemPtr N | Free      |
| 24 byte  | (offset,   | (offset,   |           | space     |
|          |  flags)    |  flags)    |           |           |
+----------+------------+----------- +--- ... ---+-----------+
|           <-- items grow left from end of page -->         |
| Item N: key (variable, avg 16 byte) + ptr (6 byte CTID)   |
+------------------------------------------------------------+

Header chứa: LSN (Log Sequence Number cho WAL), checksum, liên kết sang page trái/phải (linked list cho range scan), và flags (leaf/internal).

Hai loại page:

Leaf page: pointer trỏ heap row qua CTID = (page_number, offset). Range scan traverse linked list leaf → không cần đi lên internal node.
Internal page: pointer trỏ child page. Không chứa heap pointer.

Introspect trực tiếp bằng extension pageinspect:

-- Bat extension (su dung trong dev/staging, khong production raw)
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Meta page: root, height (level), fastroot
SELECT * FROM bt_metap('tasks_pkey');
-- root | level | fastroot | fastlevel
--    3 |     2 |        3 |         2

-- Items trong leaf page so 1
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('tasks_pkey', 1)
LIMIT 5;
-- itemoffset | ctid    | itemlen | data
--          1 | (0,1)   |      16 | 01 00 ...
--          2 | (0,2)   |      16 | 02 00 ...

Forward: Module 11 của khoá này đi sâu pageinspect cho production debugging — index bloat detection, fragmentation analysis.

4. Page split — vì sao insert random gây write amplification

Insert lifecycle B-tree, bước theo bước:

Traverse từ root xuống leaf — đi theo key range.
Nếu leaf còn free space → insert đúng vị trí sorted → ghi 1 page.
Nếu leaf đầy → page split:
- Cấp phát page mới từ freelist.
- Di chuyển 50% entry sang page mới.
- Update linked list (prev/next pointer).
- Đẩy mid-key lên internal node (parent pointer mới).
- Nếu internal node cũng đầy → split tiếp lên, có thể đến root.
- Khi root split → cấp phát root mới → height tăng 1.

Leaf day, insert key 6:
Truoc:  [Page A: 1, 5,  7,  9, 12]  -- full

Sau split:
        [Page A: 1, 5,  6]  <-->  [Page A': 7, 9, 12]
         ^--- giu page cu          ^--- page moi

Internal node: them pointer den Page A'
Write amplification: 2 page write (A + A') thay vi 1

Sequential insert (BIGSERIAL): key mới luôn lớn hơn mọi key hiện có → insert vào leaf cuối → leaf cuối thường còn chỗ (fill factor chưa đầy) → ghi 1 page. Khi leaf cuối đầy → split 1 lần → page mới rỗng tiếp tục nhận sequential insert. Amortized ~1 write per insert sau steady state.

Random insert (UUID v4): key mới rơi ngẫu nhiên khắp B-tree → xác suất rơi vào leaf đã đầy cao → split thường xuyên. 2–3 write per insert trung bình.

Demo benchmark:

-- Setup hai bang
CREATE TABLE bench_serial (
  id      BIGSERIAL PRIMARY KEY,
  payload TEXT
);

CREATE TABLE bench_uuid (
  id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  payload TEXT
);

-- Insert 1 trieu row vao moi bang
\timing on

INSERT INTO bench_serial(payload)
SELECT repeat('x', 100)
FROM generate_series(1, 1000000);
-- ~5 giay

INSERT INTO bench_uuid(payload)
SELECT repeat('x', 100)
FROM generate_series(1, 1000000);
-- ~12 giay (2.4x cham hon) -- page split overhead

-- So sanh index size
SELECT pg_size_pretty(pg_relation_size('bench_serial_pkey'));
-- ~22 MB

SELECT pg_size_pretty(pg_relation_size('bench_uuid_pkey'));
-- ~50 MB (2.3x lon hon) -- bloat tu split + dead space

Pattern rõ: BIGSERIAL hoặc UUID v7 (sequential) cho primary key B-tree friendly. UUID v4 → write amplification 2–3x và index bloat đáng kể.

5. Fill factor — để lại chỗ trống cho insert tương lai

Fill factor kiểm soát mức tối đa mỗi leaf page được lấp đầy trước khi PG mở page mới. Mặc định B-tree trong PG: 90% — mỗi leaf page giữ 10% free space.

-- Mac dinh: fillfactor=90 (leaf page day toi 90% moi mo page moi)
CREATE INDEX idx_tasks_due ON tasks(due_at);

-- Read-only hoac append-only: fillfactor=100 (tiet kiem storage)
CREATE INDEX idx_archive_due ON archive_tasks(due_at)
WITH (fillfactor=100);

-- Heavy UPDATE existing row (HOT update): fillfactor=80
-- 20% free space -> HOT update co the ghi tren cung page -> tranh index update
CREATE INDEX idx_tasks_status ON tasks(status)
WITH (fillfactor=80);

Fill factor là tradeoff storage vs split frequency:

Fill factor	Storage	Page split	Dùng khi
100	Tối ưu nhất	Sớm nhất	Archive / read-only table
90 (default)	Tốt	Ít	OLTP thông thường
80–85	Tốn 15–20%	Hiếm hơn	Table có nhiều UPDATE tại chỗ

Forward: Module 6 của khoá này giải thích HOT update (Heap-Only Tuple) — cơ chế PG update row mà không cần update index nếu indexed column không thay đổi và cùng page còn chỗ. Fill factor thấp làm HOT update thành công thường xuyên hơn.

6. Height calculation — vì sao 4–5 cấp đủ cho 1 tỷ row

Formula đơn giản: height = ceil(log_b(N)) với b = branching factor, N = số row.

PostgreSQL branching factor ≈ 400 (thực tế 340–450 tùy key size):

N = 1,000,000,000 (1 ty)
height = ceil(log_400(1,000,000,000))
       = ceil(log(1e9) / log(400))
       = ceil(9 / 2.602)
       = ceil(3.46)
       = 4 cap

Mỗi page read: ~10 μs từ RAM (buffer cache hit) hoặc ~100 μs từ SSD (cache miss). 4 cấp = 4 page reads = 40–400 μs per lookup → hoàn toàn đạt sub-millisecond SLA.

Kiểm tra height thực tế qua pageinspect:

-- Check height cua index bat ky
SELECT
  i.relname        AS index_name,
  bt.level         AS height,
  bt.root          AS root_page,
  bt.fastroot      AS fast_root_page
FROM pg_indexes
JOIN pg_class i ON i.relname = pg_indexes.indexname
CROSS JOIN bt_metap(pg_indexes.indexname) AS bt
WHERE pg_indexes.tablename = 'tasks';

-- Hoac don gian hon:
SELECT level FROM bt_metap('tasks_pkey');
-- level | ...
--     3 |    (height 3 = 4 cap tinh ca root: root, L1, L2 leaf)

Height tăng thêm 1 chỉ khi root bị split — rất hiếm xảy ra, và khi xảy ra PG handle hoàn toàn tự động.

7. Bloat — DELETE không free space ngay

Khi DELETE một row, PG không lập tức xoá entry khỏi B-tree leaf page. Entry được đánh dấu "dead" (deleted) nhưng vẫn chiếm chỗ. VACUUM phải chạy sau mới reclaim space.

-- Delete 500k row
DELETE FROM bench_serial WHERE id % 2 = 0;

-- Index size chua thay doi ngay
SELECT pg_size_pretty(pg_relation_size('bench_serial_pkey'));
-- Van ~22 MB (dead entry con chiem cho)

-- VACUUM reclaim dead entry
VACUUM bench_serial;

-- Index size co the giam (neu autovacuum chua kip)
SELECT pg_size_pretty(pg_relation_size('bench_serial_pkey'));
-- ~11 MB (sau VACUUM)

Vì sao không reclaim ngay? MVCC (Module 6 của khoá này): một transaction đang chạy song song có thể vẫn cần nhìn thấy row đã DELETE theo snapshot của nó. PG chỉ có thể reclaim sau khi mọi transaction cần snapshot đó đã kết thúc — VACUUM xác định thời điểm an toàn đó.

Hậu quả bloat khi DELETE nhiều:

Index lớn hơn cần thiết → dùng nhiều RAM buffer cache.
Range scan touch nhiều page hơn → chậm hơn.
Tích lũy theo thời gian nếu autovacuum không đuổi kịp write load.

Detection và xử lý:

-- pgstattuple: ty le dead tuple trong index
-- (can CREATE EXTENSION pgstattuple)
SELECT * FROM pgstatindex('tasks_pkey');
-- leaf_pages | dead_leaf_pages | avg_leaf_density | ...

-- REINDEX CONCURRENTLY: rebuild khong lock bang (PG 12+)
REINDEX INDEX CONCURRENTLY tasks_pkey;

-- pg_repack: zero-downtime rebuild ca table + index
-- (external tool, khong built-in)

Forward: Module 11 của khoá này đi sâu bloat monitoring (pgstattuple, pg_repack) và production strategy cho reindex không downtime.

8. Pitfall — UUID v4 làm primary key trong write-heavy table

Pitfall — UUID v4 PK: write amplification tích lũy

UUID v4 primary key trong bảng có nhiều INSERT không chỉ tốn throughput ngay lúc insert — bloat tích lũy làm chậm read dần theo thời gian. Ở 100 triệu row, index có thể lớn gấp 2–3 lần so với BIGSERIAL PK, ảnh hưởng buffer cache hit rate của mọi query dùng index đó.

-- ANTI-PATTERN: UUID v4 PK trong write-heavy table
CREATE TABLE events (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- v4 random
  event_type TEXT NOT NULL,
  payload    JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Sau 100 trieu INSERT:
-- 1. Index size ~2.5x lon hon serial PK cung data
-- 2. Insert throughput rot 40-60% (page split overhead)
-- 3. Cache miss tang: random insert cham every page -> warm buffer kho
-- 4. VACUUM gap kho hon: dead entry rai khap tree, khong theo locality

-- Fix 1: Switch sang UUID v7 (sequential) -- PG 18 native
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT uuidv7(),  -- PG 18+
  -- ...
);

-- Fix 2 (PG 13-17): extension pg_uuidv7
-- CREATE EXTENSION pg_uuidv7;
-- DEFAULT uuid_generate_v7()

-- Fix 3: BIGSERIAL PK + UUID public_id (dual key pattern, M04.5 cua khoa nay)
CREATE TABLE events (
  id        BIGSERIAL PRIMARY KEY,
  public_id UUID UNIQUE NOT NULL DEFAULT gen_random_uuid(),
  -- ...
);
-- B-tree sequential tu BIGSERIAL, unguessable URL tu public_id

9. Applied — TaskFlow B-tree introspection

Kiểm tra index B-tree trên dashboard query phổ biến nhất của TaskFlow:

-- Tao composite index cho dashboard
CREATE INDEX idx_tasks_dashboard
ON tasks(assignee_id, status, due_at);

-- Check height + root page
SELECT level AS height, root, fastroot
FROM bt_metap('idx_tasks_dashboard');
-- height | root | fastroot
--      2 |    3 |        3

-- Xem items trong leaf page so 1
SELECT itemoffset, ctid, itemlen
FROM bt_page_items('idx_tasks_dashboard', 1)
LIMIT 5;

-- So sanh index size sau khi them 1M task
SELECT
  indexrelid::regclass AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relid = 'tasks'::regclass;

Forward: Module 11 của khoá này đi sâu pageinspect cho production — tìm index bloat, so sánh trước/sau REINDEX, và quyết định khi nào rebuild index là cần thiết.

10. Deep Dive — B-tree implementation

📚 Deep Dive — B-tree implementation

PostgreSQL Documentation — B-tree Indexes (Ch.64) — official spec: cú pháp, operator class, deduplication (PG 13+), và implementation notes.
PostgreSQL 14 Internals — Egor Rogov, Part V "Indexes" — free PDF, ~50 trang B-tree section: page format chi tiết, split mechanics, fastpath insert, deduplication, và vacuum interaction. Đây là nguồn sâu nhất về PG-specific B-tree.
Use The Index, Luke — "The Tree" chapter — Markus Winand, free web. Visual + intuition tốt nhất cho branching factor và why height = log(N). Đọc trước Rogov nếu cần hình dung.
Bayer & McCreight 1972 — "Organization and Maintenance of Large Ordered Indexes" — paper gốc định nghĩa B-tree (Wikipedia summary, paper gốc sau ACM paywall). Lịch sử và intuition gốc của tác giả.

Ghi chú: Use The Index Luke cho visual intuition → Rogov Part V cho PG-specific implementation → PG docs Ch.64 cho cú pháp và operator class. Bayer 1972 cho lịch sử, không cần đọc để practice.

11. Liên kết khoá học khác

Khoá Thuật toán — bài 3.7 B-tree & B+tree — fundamentals B-tree/B+tree theo góc CTDL: fanout, leaf chain, disk seek analysis.
Khoá Thuật toán — bài 1.4 Memory & cache locality — vì sao page-based sequential access nhanh hơn random access (cache line, prefetch).
Khoá Thuật toán — bài 3.11 Case study MySQL & Redis — so sánh B+tree vs hash table trong InnoDB và Redis ở hệ thống thực tế.

12. Tóm tắt

B-tree multi-way balanced, không phải binary — PG branching factor ~400 key/page (8 KB page, ~22 byte/entry).
Height = log_branching(N): 1 tỷ row chỉ cao 4–5 cấp → 4–5 disk I/O per lookup → sub-millisecond với warm cache.
Page split khi leaf đầy: cấp phát page mới, move 50% entry, update parent → 2 write thay vì 1.
Sequential insert (BIGSERIAL, UUID v7) → insert vào leaf cuối → split hiếm → ~1 write/insert.
Random insert (UUID v4) → insert phân tán khắp tree → split thường xuyên → 2–3 write/insert + bloat.
Fill factor: 90 mặc định (10% free space), 80–85 cho bảng heavy UPDATE tại chỗ (HOT update), 100 cho archive/read-only.
DELETE không reclaim index space ngay — MVCC yêu cầu giữ dead entry cho concurrent transaction, VACUUM mới reclaim.
Forward: bài 3 của module này (composite index ordering — leftmost prefix rule), Module 6 (MVCC + VACUUM interaction), Module 11 (production bloat monitoring + reindex strategy) của khoá này.

13. Tự kiểm tra

Tự kiểm tra

Vì sao B-tree PostgreSQL chỉ cao 4–5 cấp với 1 tỷ row, trong khi binary search tree cần 30 cấp? Giải thích theo branching factor và công thức height.

▸

Binary search tree có branching factor 2 — mỗi node 2 nhánh. Height = log₂(N). Với 1 tỷ row: log₂(10⁹) ≈ 30 cấp → 30 disk I/O per lookup.

B-tree PostgreSQL có branching factor ~400: mỗi page 8 KB chứa được khoảng 400 entry (key ~16 byte + pointer 6 byte = 22 byte, 8192/22 ≈ 372). Height = log₄₀₀(10⁹) = log(10⁹)/log(400) = 9/2.6 ≈ 3.5 → 4 cấp. Branching factor cao đến từ page lớn (8 KB) và key nhỏ so với page. Kết quả: cùng 1 tỷ row, B-tree chỉ cần 4 disk I/O thay vì 30 — nhanh hơn 7–8 lần chỉ tính I/O.

Phân biệt BIGSERIAL insert vs UUID v4 insert về page split. Write amplification khác nhau bao nhiêu? Vì sao?

▸

BIGSERIAL (sequential): key mới luôn lớn hơn mọi key hiện có → insert vào leaf page cuối cùng của B-tree. Leaf cuối có fill factor ~90% → còn 10% free space → insert không cần split. Khi leaf cuối đầy → split 1 lần → page mới rỗng tiếp tục nhận sequential insert. Amortized: ~1 write per insert sau steady state.

UUID v4 (random): key mới phân bố đều trong không gian UUID → rơi vào bất kỳ leaf page nào trong B-tree. Các leaf page không nằm ở cuối đã được lấp đầy dần → xác suất rơi vào leaf đầy cao (~50% sau khi tree ổn định) → page split thường xuyên. Mỗi split: ghi 2 page (leaf cũ + leaf mới) thay vì 1. Trung bình: 2–3 write per insert → write amplification 2–3x so với BIGSERIAL.

Page split lifecycle: khi nào root bị split và height tăng? Cost của root split so với leaf split?

▸

Leaf split xảy ra khi leaf page đầy: allocate page mới, move 50% entry, đẩy mid-key lên parent internal node. Nếu parent internal node cũng đầy → split internal node, đẩy mid-key lên grandparent. Chuỗi split leo dần lên tree.

Root split xảy ra khi chuỗi split leo đến root và root cũng đầy: PG allocate root page mới, tạo hai child từ nội dung root cũ, root mới chỉ chứa 1 key (mid-key). Height tăng 1.

Cost root split cao hơn leaf split vì: (1) phải allocate và write nhiều page hơn (root + 2 child), (2) lock range rộng hơn trong quá trình split, (3) cả tree phải cập nhật con trỏ root. Nhưng root split rất hiếm — chỉ xảy ra khi tree vừa tăng lên height mới, ví dụ khi table grow từ vài triệu lên vài trăm triệu row.

Fill factor 90 mặc định — giảm xuống 80 cho bảng có nhiều UPDATE có hợp lý không? Tradeoff cụ thể là gì?

▸

Hợp lý trong một tình huống cụ thể: table có nhiều UPDATE tại chỗ (không thay đổi indexed column) và muốn dùng HOT update (Heap-Only Tuple). HOT update thành công khi row mới có thể ghi cùng heap page với row cũ — không cần update B-tree index. Nếu leaf page có 20% free space (fillfactor=80), row mới có nhiều khả năng fit vào cùng heap page → HOT update thành công → tiết kiệm index I/O.

Tradeoff: fillfactor=80 lãng phí 20% storage trên mỗi leaf page. Với index 1 GB, bạn tốn thêm ~200 MB. Đổi lại: ít index update hơn, throughput UPDATE cao hơn. Không nên giảm fillfactor cho table append-only hoặc read-mostly — lãng phí storage không đổi lại lợi ích gì.

DELETE 1 triệu row không giải phóng B-tree index space ngay. Vì sao MVCC ngăn? Khi nào VACUUM mới reclaim được?

▸

MVCC (Multi-Version Concurrency Control) cho phép các transaction chạy đồng thời nhìn thấy snapshot data nhất quán tại thời điểm transaction bắt đầu. Khi DELETE một row, PG không xoá vật lý ngay — chỉ đánh dấu row với xmax (transaction ID của DELETE). Row đó vẫn cần tồn tại để các transaction đang chạy (bắt đầu trước DELETE) vẫn nhìn thấy nó theo snapshot của họ.

Tương tự trong B-tree: entry bị DELETE được đánh dấu dead nhưng giữ nguyên vị trí trên leaf page. VACUUM mới reclaim khi: (1) không còn transaction nào cần snapshot trước thời điểm DELETE, (2) autovacuum hoặc manual VACUUM chạy quét qua page đó. VACUUM tính xmin (oldest active transaction) — entry có xmax nhỏ hơn xmin là safe to reclaim.

events table 100 triệu row/ngày, dùng UUID v4 PK, insert throughput rớt 40%. Đề xuất 2 fix với tradeoff mỗi cái.

▸

Fix 1 — Switch sang UUID v7: UUID v7 có 48 bit timestamp prefix → sequential trong mỗi millisecond → B-tree insert giống BIGSERIAL, page split giảm về gần 0. Tradeoff: cần PG 18 (native uuidv7()) hoặc extension pg_uuidv7 cho PG 13–17. Migration: không thể migrate existing PK (immutable), chỉ apply cho row mới — giai đoạn chuyển tiếp có mixed v4 (cũ) + v7 (mới). B-tree cải thiện dần theo thời gian khi row mới chiếm phần lớn.

Fix 2 — Dual key BIGSERIAL + UUID public_id: giữ BIGSERIAL làm internal PK (sequential, B-tree friendly), thêm column public_id UUID UNIQUE NOT NULL DEFAULT gen_random_uuid() cho URL unguessable. Internal query dùng id BIGSERIAL → sequential index. Public API expose public_id UUID. Tradeoff: thêm 1 column + 1 UNIQUE index (lại là UUID v4 random → bloat trên public_id index). Phù hợp hơn khi cần cả distributed-friendly ID lẫn performance, nhưng public_id index vẫn có write amplification từ UUID v4 random.

Production index `tasks_pkey` có level=8 (trả về từ bt_metap). Table có 1 tỷ row. Có bình thường không? Khi nào level lớn bất thường là dấu hiệu cần quan tâm?

▸

Level=8 với 1 tỷ row là bất thường. Theo công thức, branching factor ~400: log₄₀₀(10⁹) ≈ 4 → height 4–5 là bình thường. Level=8 nghĩa là branching factor thực tế thấp hơn nhiều so với lý thuyết.

Nguyên nhân thường gặp: (1) Key size lớn — nếu key là TEXT dài hoặc composite index nhiều column, mỗi entry chiếm nhiều byte → ít entry/page → branching factor thấp → height tăng. (2) Bloat nghiêm trọng — nhiều dead entry chiếm space trên leaf page → page fill thực tế thấp → effective branching factor giảm → tree phải split sớm, height tăng. (3) UUID v4 PK với nhiều split — half-empty page sau split → branching factor hiệu quả giảm.

Hành động: chạy pgstatindex để đo avg_leaf_density. Nếu dưới 60–70% → bloat nghiêm trọng → REINDEX CONCURRENTLY hoặc pg_repack. Sau rebuild, level sẽ giảm về giá trị lý thuyết.

⁂

Bài tiếp theo: Composite index ordering — leftmost prefix rule

Bài này có giúp bạn hiểu bản chất không?

← Bài trước

Tại sao cần index — heap scan 800ms vs index scan 0.2ms

Bài tiếp

Composite index ordering — leftmost prefix rule + INCLUDE covering