~30 phútCấu trúc tuyến tínhMiễn phí… lượt xem

Mini-challenge — Implement LRU Cache thủ công (không dùng LinkedHashMap)

Implement LRU Cache đạt O(1) cho cả get và put bằng cách kết hợp HashMap và Doubly Linked List từ đầu — pattern mà Caffeine, Redis, và Linux page cache đều dùng.

Cache memory-bound — máy chủ chỉ giữ được N entry trong RAM. Khi cache đầy và cần thêm entry mới, phải chọn entry nào để đẩy ra. Least Recently Used (LRU) nói: đẩy entry lâu nhất chưa được truy cập — giả định rằng nếu bạn chưa dùng đến nó lâu, bạn ít có khả năng cần nó ngay bây giờ.

LeetCode 146 yêu cầu get và put đều O(1). Java có LinkedHashMap với accessOrder=true làm sẵn — nhưng bài này yêu cầu implement thủ công để hiểu tại sao cần kết hợp HashMap với Doubly Linked List, và tại sao không cấu trúc dữ liệu nào khác đáp ứng được cả hai ràng buộc cùng lúc.

Sau bài này bạn implement được LRU mà Caffeine, Redis, Memcached, và Linux page cache đều dùng pattern này — chỉ tinh chỉnh thêm metric, TTL, và evict policy.

⁂

🎯 Yêu cầu bài toán

Viết class LRUCache<K, V> với API sau:

Constructor: LRUCache(int capacity) — khởi tạo cache với dung lượng tối đa.
V get(K key) — trả về value nếu key tồn tại trong cache, null nếu miss. Khi hit, đánh dấu key là recently used (promote lên MRU).
void put(K key, V value) — insert hoặc update. Nếu key đã tồn tại, update value và promote lên MRU. Nếu cache đầy khi insert key mới, evict LRU entry trước.

Yêu cầu Big-O:

Operation	Complexity
`get(key)`	O(1) average
`put(key, value)`	O(1) average

Ràng buộc:

Không dùng LinkedHashMap hay bất kỳ linked map built-in nào.
Không dùng Map.Entry linked structure có sẵn của Java.
Java 21.

📦 Starter code

import java.util.*;

public class LRUCache<K, V> {
    // TODO: design your data structures here

    public LRUCache(int capacity) {
        // TODO
    }

    // Return value for key, or null on miss.
    // Mark key as recently used on hit.
    public V get(K key) {
        // TODO
        throw new UnsupportedOperationException();
    }

    // Insert or update key-value.
    // If over capacity, evict least recently used entry first.
    public void put(K key, V value) {
        // TODO
        throw new UnsupportedOperationException();
    }
}

Dành 20–25 phút tự làm trước khi xem gợi ý.

⁂

🧪 Test cases

LRUCache<Integer, Integer> c = new LRUCache<>(2);
c.put(1, 100);
c.put(2, 200);
// State: LRU -> [1, 2] <- MRU

assertEquals(100, c.get(1));   // hit -- key 1 promoted to MRU
// State: LRU -> [2, 1] <- MRU

c.put(3, 300);                  // capacity full -- evict LRU (key 2)
// State: LRU -> [1, 3] <- MRU

assertNull(c.get(2));           // evicted -- miss
assertEquals(100, c.get(1));   // still present
assertEquals(300, c.get(3));   // still present

c.put(4, 400);                  // evict LRU (key 1, because key 3 was accessed last)
// State before put: LRU -> [3, 1] <- MRU  (get(1) just promoted key 1)
// Wait -- trace carefully:
//   after c.get(1): state [3, 1] -- key 1 = MRU
//   after c.get(3): state [1, 3] -- key 3 = MRU
//   c.put(4, 400)  -> evict LRU = key 1
// State: LRU -> [3, 4] <- MRU

assertNull(c.get(1));           // evicted
assertEquals(300, c.get(3));   // still present
assertEquals(400, c.get(4));   // just inserted

Bonus stress test: capacity = 1000, 100k random get/put xen kẽ. Verify rằng cache size không bao giờ vượt capacity, và hit rate trên Zipfian access pattern cao hơn đáng kể so với random eviction.

⁂

💡 Gợi ý

💡 Stage 1 — O(1) lookup (đọc khi bắt đầu bí)

get(key) phải là O(1). Cấu trúc dữ liệu nào cho O(1) lookup theo key?

HashMap<K, ?> — lookup theo key là O(1) average. Value của HashMap sẽ lưu gì? Đó là câu hỏi Stage 2.

Thử đầu tiên với HashMap<K, V> thuần: get OK, nhưng làm sao biết entry nào là LRU? HashMap không có khái niệm thứ tự truy cập. Cần thêm một cấu trúc theo dõi order.

💡 Stage 2 — O(1) promote-to-MRU (đọc khi có HashMap rồi nhưng bí phần order)

Cần "move một entry lên vị trí most-recently-used" với chi phí O(1).

Các candidate:

ArrayList: remove ở giữa là O(n) vì phải shift. Không đạt.
Array: tương tự — shift O(n).
Doubly linked list: nếu bạn đang giữ sẵn pointer trực tiếp tới node, thì remove node đó là O(1) (chỉ update prev/next). Insert node vào cuối cũng O(1).

Cross-link: bài 02 module này (Linked List) đã phân tích đúng điểm này — O(1) delete giữa khi có node ref, O(n) nếu chỉ có index.

Doubly linked list giữ order LRU (head) → MRU (tail). Nhưng để "đi thẳng đến một node cụ thể" khi có key, bạn cần HashMap trỏ trực tiếp đến node — không phải trỏ đến value.

💡 Stage 3 — Kết hợp (đọc khi đã có ý tưởng HashMap + linked list)

Thiết kế: HashMap<K, Node<K, V>> + doubly linked list.

HashMap value là pointer thẳng đến node trong linked list — không phải copy value.
Linked list giữ order: head.next = LRU, tail.prev = MRU.
Dùng sentinel head và tail (dummy nodes không chứa data) để tránh null check khi remove/insert ở biên.

HashMap:
  {1: nodeA, 2: nodeB, 3: nodeC}

Doubly linked list (head=LRU sentinel, tail=MRU sentinel):
  [HEAD] <-> nodeA(1,v1) <-> nodeB(2,v2) <-> nodeC(3,v3) <-> [TAIL]

get(2):
  1. HashMap.get(2) -> nodeB                    (O(1))
  2. removeNode(nodeB): nodeA.next=nodeC,       (O(1))
                        nodeC.prev=nodeA
  3. addBeforeTail(nodeB): insert nodeB before  (O(1))
                            TAIL sentinel
  Result: HEAD <-> nodeA <-> nodeC <-> nodeB <-> TAIL

Ba operation phụ cần viết: removeNode(node), addBeforeTail(node), moveToTail(node) = remove + addBeforeTail.

⁂

✅ Lời giải

✅ Lời giải — xem sau khi đã tự làm

import java.util.*;

public class LRUCache<K, V> {

    // Internal doubly linked list node.
    // Private static: does not hold a reference to the outer instance.
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> prev;
        Node<K, V> next;

        Node(K k, V v) {
            this.key = k;
            this.value = v;
        }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> map;
    private final Node<K, V> head; // sentinel -- head.next = LRU entry
    private final Node<K, V> tail; // sentinel -- tail.prev = MRU entry

    public LRUCache(int capacity) {
        this.capacity = capacity;
        // Pre-size HashMap to avoid resize: capacity / 0.75 = capacity * 4/3
        this.map = new HashMap<>(capacity * 4 / 3 + 1);
        this.head = new Node<>(null, null);
        this.tail = new Node<>(null, null);
        head.next = tail;
        tail.prev = head;
    }

    public V get(K key) {
        Node<K, V> node = map.get(key);
        if (node == null) return null;   // cache miss
        moveToTail(node);                // promote to MRU
        return node.value;
    }

    public void put(K key, V value) {
        Node<K, V> node = map.get(key);
        if (node != null) {
            // Key already exists: update value and promote
            node.value = value;
            moveToTail(node);
            return;
        }
        // Evict LRU if at capacity
        if (map.size() == capacity) {
            Node<K, V> lru = head.next; // LRU is just after sentinel head
            removeNode(lru);
            map.remove(lru.key);
        }
        // Insert new entry as MRU
        Node<K, V> fresh = new Node<>(key, value);
        addBeforeTail(fresh);
        map.put(key, fresh);
    }

    // Remove node from its current position in the linked list.
    private void removeNode(Node<K, V> node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    // Insert node immediately before the tail sentinel (= MRU position).
    private void addBeforeTail(Node<K, V> node) {
        node.prev = tail.prev;
        node.next = tail;
        tail.prev.next = node;
        tail.prev = node;
    }

    // Move node to MRU position.
    private void moveToTail(Node<K, V> node) {
        removeNode(node);
        addBeforeTail(node);
    }
}

Design choices đáng chú ý:

Sentinel head/tail: tránh mọi if (head == null) hoặc if (tail == null) trong removeNode và addBeforeTail. Khi list rỗng, head và tail trỏ trực tiếp vào nhau — không có null pointer nào.
Node là private static: không cầm reference đến instance ngoài (LRUCache). Nếu để non-static, mỗi node sẽ giữ ngầm một reference đến LRUCache cha — tốn thêm 8 byte mỗi node và gây leak nếu node bị truyền ra ngoài.
HashMap pre-size: capacity * 4 / 3 + 1 đảm bảo HashMap không resize cho đến khi vượt capacity. Load factor mặc định 0.75 nên internal capacity tối thiểu cần = size / 0.75.
Node lưu cả key: cần lru.key khi evict để gọi map.remove(lru.key). Nếu node chỉ lưu value, không thể xóa đúng entry khỏi HashMap.

Trace một ví dụ — put(1,A); put(2,B); get(1); put(3,C);:

Operation	HashMap	Linked list (LRU → MRU)
`put(1, A)`	`{1: nodeA}`	HEAD ↔ nodeA(1,A) ↔ TAIL
`put(2, B)`	`{1: nodeA, 2: nodeB}`	HEAD ↔ nodeA ↔ nodeB(2,B) ↔ TAIL
`get(1)` — hit, promote	`{1: nodeA, 2: nodeB}`	HEAD ↔ nodeB ↔ nodeA(1,A) ↔ TAIL
`put(3, C)` — evict LRU=2	`{1: nodeA, 3: nodeC}`	HEAD ↔ nodeA ↔ nodeC(3,C) ↔ TAIL

Sau get(1), key 1 là MRU nên key 2 trở thành LRU — bị evict khi put(3) cần thêm slot.

⁂

⚠️ Pitfall phổ biến

Pitfall 1 — Không dùng sentinel: null pointer khi remove node đầu hoặc cuối.

// WRONG: no sentinel -- removeNode crashes when node is the only entry
private void removeNode(Node<K, V> node) {
    if (node.prev != null) node.prev.next = node.next; // if-null check NOT enough
    if (node.next != null) node.next.prev = node.prev; // still misses head/tail update
}

Khi list có một phần tử duy nhất, node.prev là head pointer của list (nếu lưu riêng biến head), và node.next là null. Code trên bỏ qua update head pointer của list — head vẫn trỏ vào node vừa bị xóa, dẫn đến dangling reference.

Dùng sentinel nodes: removeNode luôn có node.prev và node.next hợp lệ (ít nhất là sentinel) — không cần null check, không có edge case.

Pitfall 2 — Quên map.remove(lru.key) khi evict.

// WRONG: removes node from linked list but leaves stale HashMap entry
private void evictLRU() {
    Node<K, V> lru = head.next;
    removeNode(lru);
    // map.remove(lru.key) -- MISSING
}

HashMap vẫn giữ {lru.key: lru_node}. Node đó không còn trong linked list nhưng HashMap vẫn trỏ đến nó. get(lru.key) sẽ trả về value của node đã bị evict — logic sai. Nghiêm trọng hơn, HashMap cứ tích lũy stale entries: map.size() tăng nhưng linked list không tăng tương ứng — capacity check bị lệch, cache thực sự có thể vượt capacity.

Pitfall 3 — get(key) trên miss không nên throw exception.

// WRONG: cache miss is normal -- not an error condition
public V get(K key) {
    Node<K, V> node = map.get(key);
    if (node == null) throw new NoSuchElementException(key.toString()); // anti-pattern
    ...
}

Cache miss là kết quả bình thường của operation — không phải lỗi. Caller phải bắt exception cho mọi lookup, kể cả cold-start khi cache rỗng. Return null (hoặc Optional<V> nếu muốn tường minh hơn) là đúng contract. Exception chỉ dành cho các tình huống thực sự exceptional — ví dụ capacity vượt quá giới hạn hệ thống.

⁂

📊 Benchmark tham khảo

Capacity 10k, 1M ops (50% get / 50% put), JVM warm-up 3 lần:

  Custom LRUCache (bài này):   ~80ms
  LinkedHashMap (accessOrder): ~85ms
  Caffeine cache (W-TinyLFU):  ~50ms

Custom LRUCache và LinkedHashMap có throughput tương đương vì cùng algorithm. Caffeine nhanh hơn nhờ algorithm tốt hơn (Window TinyLFU — kết hợp LRU và LFU), không phải do Java code tối ưu hơn. Implication: khi optimize cache performance, chọn đúng eviction policy thường quan trọng hơn micro-optimize code.

⁂

🔗 Variant để tự thử

Variant 1 — TTL-based eviction:

Mỗi node thêm long expiryMs. Trong get(key), sau khi hit: kiểm tra System.currentTimeMillis() > node.expiryMs → nếu expired, evict và return null. put(key, value, long ttlMs) tính expiryMs = now + ttlMs.

Câu hỏi: cleanup proactive (background thread sweep) có cần thiết không? Khi cache đầy và toàn bộ entries đã expired, head.next là LRU expired — evict nó là đúng. Khi cache không đầy và entries expired ngồi im: chỉ lãng phí capacity nhưng không trả wrong data. Background sweep giúp free capacity sớm hơn — nhưng thêm độ phức tạp threading.

Variant 2 — Thread-safe LRU:

Cách đơn giản nhất: synchronized trên cả get và put. Đúng nhưng tạo bottleneck toàn bộ cache serialize qua một lock.

Nâng lên: ReentrantReadWriteLock — nhiều thread đọc đồng thời, write lock khi put/evict. Nhưng get vẫn phải write-lock vì nó gọi moveToTail (mutates linked list). Thực ra với LRU, mọi get đều mutate state — read/write lock không giúp nhiều.

Production: Caffeine dùng buffer bất đồng bộ — get chỉ ghi access event vào ring buffer, thread riêng drain buffer định kỳ để update order. Không lock on hot path. Xem com.github.benmanes.caffeine.cache.BoundedLocalCache.

Variant 3 — LFU Cache (LeetCode 460):

Least Frequently Used: evict entry bị truy cập ít nhất. Cần thêm count per entry và HashMap<Integer, LinkedHashSet<K>> map từ frequency → set of keys với that frequency. Dùng biến minFreq để track frequency thấp nhất hiện tại — O(1) evict. Khó hơn LRU đáng kể.

⁂

🔗 LeetCode liên quan

146. LRU Cache — bài chính. Đúng spec bài này.
460. LFU Cache — Least Frequently Used. Evict by access count.
432. All O`one Data Structure — O(1) increment/decrement count, get min/max key — cần doubly linked list của buckets.

⁂

🏭 Applied note

Caffeine (com.github.benmanes.caffeine) — production Java cache library. Dùng Window TinyLFU thay pure LRU: chia cache thành window (20%) + protected (80%) + probationary segment. Entry mới vào window trước, nếu được truy cập đủ nhiều thì được admit vào protected. Giảm "cache pollution" từ scan access pattern.

Linux page cache — kernel dùng "active list" và "inactive list" (two-list LRU variant). Entry mới vào inactive list. Nếu được truy cập lần thứ hai, promote lên active list. Tránh "scan resistance" problem: đọc một file lớn một lần không đẩy toàn bộ working set ra khỏi cache.

Redis maxmemory-policy — cấu hình tại runtime, không compile-time:

allkeys-lru — evict LRU entry trong toàn bộ keyspace.
volatile-lru — evict LRU entry chỉ trong set keys có TTL.
allkeys-lfu — evict LFU entry (Redis 4.0+).
noeviction — return error khi hết memory, không evict.

Cross-link: bài 02 module này (Linked List) — doubly linked list và O(1) delete khi có node ref. Module 3 lesson 02 (HashMap internals — sẽ ship sau) — cơ chế hash collision, load factor, resize.

⁂

✨ Điều bạn vừa làm được

Implement LRU Cache đạt O(1) get và put bằng cách kết hợp HashMap (O(1) lookup) và Doubly Linked List (O(1) promote-to-MRU khi có node ref).
Hiểu tại sao sentinel head/tail loại bỏ toàn bộ null check edge case trong linked list operations.
Biết ba pitfall phổ biến: không sentinel, quên map.remove khi evict, throw exception on cache miss.
Nhận ra pattern này là nền tảng của Caffeine, Redis, Linux page cache — thêm metric, TTL, và evict policy lên trên cùng skeleton.

⁂

Bài tiếp theo: Case Study: LMAX Disruptor

Bài này có giúp bạn hiểu bản chất không?

← Bài trước

PriorityQueue — Top-K, scheduler, Dijkstra cần queue thứ tự

Bài tiếp

Case Study: LMAX Disruptor — Vì sao ring buffer thắng BlockingQueue 10x