
JIT compiler: interpreter, C1, C2, tiered compilation

Bytecode is slow, native code is fast. The JVM starts in the interpreter, measures which methods are hot, and escalates them to C1 (fast compile) and then C2 (deep optimization). This lesson covers inlining, escape analysis, and deoptimization, and explains why Java needs a few seconds of warm-up before it reaches peak performance.

A classic old benchmark:

long sum = 0;
for (int i = 0; i < 1_000_000_000; i++) {
    sum += i;
}

First run (cold JVM): ~3 seconds. Run again inside the same JVM (warm): ~0.3 seconds. Same code, same JVM, a 10x difference. (The figures are illustrative; exact numbers depend on hardware and JVM version.)
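To observe the cold/warm gap yourself, a minimal sketch like the one below times the same loop several times inside one JVM. The class name WarmupDemo and the method sumTo are illustrative, not from the original benchmark, and the timings you see will differ from the figures above (a modern C2 may even simplify this loop heavily).

```java
// Minimal sketch: time the same loop several times inside one JVM.
// WarmupDemo and sumTo are hypothetical names used only for illustration.
final class WarmupDemo {

    // The loop from the benchmark above, parameterized so it is cheap to test.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long result = sumTo(1_000_000_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Printing the result keeps the JIT from discarding the loop as dead code.
            System.out.println("round " + round + ": " + elapsedMs + " ms, sum=" + result);
        }
    }
}
```

Later rounds are usually faster than the first one, because by then the loop is running as compiled code rather than in the interpreter.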

The answer: the JIT compiler. The JVM does not stay in the interpreter the way a scripting runtime would. It profiles code as it runs, detects "hot" methods (called often, or containing hot loops), compiles their bytecode to native machine code, and uses that native code from then on. The hotter the code, the deeper the compilation.

JIT is why Java gets close to C/C++ performance in steady state, and sometimes beats it, because the JIT has runtime information a static compiler lacks (e.g. "this method receives a null argument in 100% of cases", "this branch is never taken"). It is also why Java is known for slow warm-up: freshly started code still runs in the interpreter and takes a few seconds to get hot.

This lesson covers: the 3 tiers (interpreter → C1 → C2), compile triggers (counters, OSR), 3 key optimizations (inlining, escape analysis, devirtualization), deoptimization (the JIT undoing its work when an assumption breaks), and how to read JIT logs for debugging.

1. Analogy: the recipe book

You open a restaurant. The chef has 3 working modes:

Tier 0, Interpreter: for each dish the chef reads the recipe line by line from the book and follows it. Correct but slow: read, cook, read, cook. Fine for a dish that is rarely ordered (once a day).

Tier 3, C1 (fast compile): after making a dish a few times, the chef writes a short procedure in a personal notebook and no longer opens the book. About 3-5x faster. Quick to compile, lightly optimized.

Tier 4, C2 (deep optimize): for a best-seller (100+ orders an hour), the chef invests in optimization: pre-cut ingredients, muscle-memory routines, every unnecessary step skipped. About 10-20x faster. Slow to compile, but the result is close to optimal.

The chef tracks how often each dish is ordered. A dish that gets hot moves up a tier. When a high-tier dish suddenly gets an unfamiliar request (say a customer wants a vegan version), the chef deoptimizes back to tier 0, cooks from the book again, and learns the new pattern.

Everyday life	JVM
Recipe book	Bytecode
Reading the book line by line	Interpreter
Personal notebook	C1-compiled code
Muscle-memory routine	C2-compiled code
Counting how often a dish is made	Method invocation counter
Unfamiliar order	Failed type check / branch never seen before
Going back to the book	Deoptimization

💡 How to remember it

Java starts slowly because of the interpreter and runs fast because of C2. Tiered means code at several tiers runs side by side and is escalated according to its profile. When an assumption breaks, the code deoptimizes to a lower tier.

2. Why JIT instead of AOT (compile ahead of time)?

AOT (Ahead-of-Time): compile bytecode to native code at build time, the way C compiles straight to an .exe. Pro: fast startup (no warm-up). Con: no runtime profile, so steady-state optimization is weaker than JIT.

JIT (Just-in-Time): compile while running, with a real profile. Pro: aggressive optimization based on assumptions such as "this case happens 99% of the time". Con: slow warm-up.

The traditional JVM chose JIT because Java targets long-running servers: a few seconds of warm-up in exchange for hours of high throughput is a good deal.

More recently, GraalVM Native Image (AOT for Java) serves serverless and CLI workloads: no warm-up, startup often under 100 ms. The trade-off is peak performance typically around 10-20% below HotSpot JIT, depending on workload.

3. The 3 compilation tiers

HotSpot (the mainstream OpenJDK JVM) has 5 tiers (0-4), but in practice you can think of 3:

Tier	Compiler	Compile speed	Code quality
0	Interpreter	N/A (no compilation)	Slowest
1-3	C1 (Client)	Fast, ~1 ms/method	Medium, ~3-5x interpreter
4	C2 (Server)	Slow, ~10-100 ms/method	Highest, ~10-20x interpreter

(Tiers 1, 2 and 3 are all C1 with different amounts of profiling: none, limited, and full. In practice you rarely need to tell them apart.)

Typical flow

flowchart LR
    A[Method new] -->|0 invocation| B[Interpreter]
    B -->|hot threshold ~1500| C[C1 compile]
    C -->|hotter ~10000| D[C2 compile]
    D -->|assumption fail| B
    D -->|stable| D

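Counters: each method has an invocation counter plus a back-edge counter (for loops). Crossing a threshold submits the method to a compiler thread.

Default thresholds (tunable; rough orders of magnitude):

Interpreter → C1: ~1500 invocations.
C1 → C2: ~10000 invocations.

These round numbers match the classic non-tiered CompileThreshold defaults (1500 for the client compiler, 10000 for the server compiler). With tiered compilation the decision combines invocation and back-edge counts through flags such as -XX:Tier3InvocationThreshold and -XX:Tier4InvocationThreshold; run java -XX:+PrintFlagsFinal -version to see the values for your JVM.

These are small numbers for a hot loop. Code can reach C2 in under 1 second if a loop runs hundreds of millions of times.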

Compiler thread pool

JIT compilation runs in the background on dedicated compiler threads and does not block application threads. The application keeps running in the interpreter while a compiler thread produces native code. When it is done, the JVM switches the method's entry point from the interpreter to the native code, and subsequent calls use the native version.

The default number of compiler threads scales with the core count (around 4 on a typical machine). Tune it with -XX:CICompilerCount=N.
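Because compilation happens on background threads, its cost is easy to overlook. As a rough sketch, the standard java.lang.management.CompilationMXBean reports how much time the JIT has spent compiling so far. The class JitTimeProbe and its helper methods are hypothetical names; the sketch assumes a JVM that supports compilation time monitoring and reports -1 otherwise.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: read accumulated JIT compilation time before and after some hot work.
// JitTimeProbe, totalCompilationMillis and busyWork are illustrative names.
final class JitTimeProbe {

    // Total JIT compilation time in ms, or -1 if this JVM does not report it.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // Some repeatable work that becomes hot when called many times.
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += Integer.bitCount(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long result = 0;
        for (int i = 0; i < 20_000; i++) {
            result += busyWork(1_000);
        }
        long after = totalCompilationMillis();
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("JIT time before=" + before + " ms, after=" + after + " ms, result=" + result);
    }
}
```

The reported time is summed over all compiler threads, so it can grow while the application thread keeps running uninterrupted.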

OSR: On-Stack Replacement

A large method with a loop that runs billions of iterations cannot be escalated by invocation count alone (it has been called only once). The JVM therefore also uses a back-edge counter, which counts loop iterations.

void compute() {
    for (int i = 0; i < 1_000_000_000; i++) {
        // hot loop
    }
}

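After roughly 10000 iterations (the exact threshold depends on the tier and flags), the JIT sees that the loop is hot. The problem: the method is still in the middle of the loop, so it cannot be "replaced" in the usual way (its frame is already on the stack).

The solution is OSR (On-Stack Replacement): the JIT compiles a version of the method that can be entered at the loop, then swaps the interpreter frame for a native frame mid-loop. The loop continues in native code with its state preserved.

In the JIT log you will see: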

1234   42 % 4   com.foo.Bar::compute @ 12 (123 bytes)

% marks an OSR compilation. @ 12 is the bytecode offset of the loop that triggered it.
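A small program along these lines should produce such a line when run with -XX:+PrintCompilation. This is a sketch with hypothetical names (OsrDemo, compute); the compile IDs, timestamps, and bytecode offset in your log will differ.

```java
// Minimal sketch: a method that is called once but contains a very hot loop.
// Run with: java -XX:+PrintCompilation OsrDemo.java
// and look for a line containing '%' next to OsrDemo::compute.
final class OsrDemo {

    // The invocation counter stays at 1; only the loop's back-edge counter grows.
    static long compute(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Printing the result keeps the loop from being eliminated as dead code.
        System.out.println(compute(1_000_000_000));
    }
}
```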

4. Three key optimizations

4.1 Inlining: the most important one

Inlining pastes the callee's body into the caller and removes the call overhead.

int square(int x) { return x * x; }
int sumOfSquares(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += square(i);   // Method call
    }
    return sum;
}

After C2 inlines:

int sumOfSquares(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i * i;       // Inlined
    }
    return sum;
}

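Benefits:

Removes method call overhead (pushing and popping a frame).
Enables optimization across the call boundary: better register allocation, constant folding, dead code elimination, loop-invariant hoisting.

Thresholds: a method of ≤ 35 bytes of bytecode (the -XX:MaxInlineSize default) is always an inline candidate. A method of > 325 bytes (the -XX:FreqInlineSize default on most platforms) is never inlined (too large, code bloat). In between, it depends on how hot the call site is.

final, static, and private methods are easier to inline because they need no devirtualization.

4.2 Escape analysis

The JIT asks: does this object "escape" the method's scope? If not (it is only used inside the method, never stored in a field, never returned, never handed to another thread), the heap allocation can be removed. HotSpot's C2 does this through scalar replacement: the object's fields become local values in registers or stack slots, which is what people usually mean by "stack allocation".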

String greet(String name) {
    StringBuilder sb = new StringBuilder();
    sb.append("Hi, ");
    sb.append(name);
    return sb.toString();
}

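sb does not escape (it is only used internally, and what is returned is a new String). Once C2 has inlined the append and toString calls, it can see this and may avoid allocating the StringBuilder on the heap, keeping its fields in registers or stack slots instead. The backing array (a byte[] since JDK 9) is harder to eliminate because it can grow, so the gain varies. When it works, the allocation cost and the GC pressure both disappear.

In production, escape analysis can remove a large share of allocations (the original estimate here is 30-50%) in code that leans on builder and wrapper patterns. This is one reason idiomatic Java is not as "slow" as C++ developers often assume.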
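One way to see escape analysis at work is to measure how many bytes a thread allocates per round. The sketch below uses a small Point object that never leaves its method. All names (EscapeDemo, Point, lengthSquared, run) are hypothetical, and the measurement relies on the HotSpot-specific com.sun.management.ThreadMXBean, so treat it as an assumption that your JVM provides it.

```java
import java.lang.management.ManagementFactory;

// Minimal sketch: count bytes allocated by the current thread per round.
// After warm-up, the per-round figure typically drops sharply once C2
// scalar-replaces the Point. Compare with -XX:-DoEscapeAnalysis.
final class EscapeDemo {

    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point is created, read, and dropped inside this method: it does not escape.
    static long lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        return (long) p.x * p.x + (long) p.y * p.y;
    }

    static long run(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += lengthSquared(i, i + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean (assumption).
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        for (int round = 1; round <= 5; round++) {
            long before = threads.getThreadAllocatedBytes(id);
            long result = run(10_000_000);
            long allocated = threads.getThreadAllocatedBytes(id) - before;
            System.out.println("round " + round + ": allocated=" + allocated
                    + " bytes, result=" + result);
        }
    }
}
```

The exact numbers include some unrelated allocation (string concatenation for the printout), so look at the trend across rounds rather than at absolute values.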

4.3 Devirtualization

A Java invokevirtual needs a vtable lookup at runtime. The JIT optimizes this: if the runtime profile shows only 1 implementation at a call site, it inlines the target directly.

List<Integer> list = ...;        // Static type List
list.forEach(System.out::println);  // invokeinterface

Profile: list is always an ArrayList. The JIT speculates "always ArrayList" and inlines ArrayList.forEach. It inserts a type check (guard): if the receiver is not an ArrayList, fall back to the interpreter.

// JIT-compiled native code (pseudo):
if (list.getClass() != ArrayList.class) goto deopt;
// Inline ArrayList.forEach body...

The type check is cheap: a compare of the klass pointer. If it fails, the code deoptimizes, returns to the interpreter, and learns the new case.

This is why a monomorphic call site (1 type) is extremely fast, a bimorphic one (2 types) is still fast (two guarded inlined targets), and a megamorphic one (3 or more types in HotSpot's default profile) is slower (a full vtable or itable lookup on every call, with no inlining).

Code design tip: avoid interfaces with dozens of implementations on the hot path. When each call site sees few types, the JIT optimizes well.

5. Deoptimization: when the JIT undoes its work

The JIT compiles based on assumptions drawn from the profile. When an assumption turns out wrong, the JIT has to undo the compilation: go back to the interpreter and learn the new case.

Deopt triggers

Speculated type mismatch: the code has only ever seen ArrayList, then a LinkedList shows up and the speculative inline breaks.
Branch never seen: the code has if (rare) and the profile has never seen the true branch, so the JIT compiles only the false branch plus an uncommon trap. When rare is true at runtime, the trap fires and the code deoptimizes.
New class loaded: a newly loaded subclass may override a method that was inlined. The JVM invalidates the old code.
Internal JVM assertions: e.g. an assumption about object layout no longer holds.

Cost of deopt

Deopt is expensive:

The native frame has to be converted back into an interpreter frame (rebuilding locals and the operand stack).
The compiled native code is thrown away.
The method returns to the interpreter, which is tens of times slower.
If the method is still hot, it is recompiled after a few thousand more invocations.

Bad application pattern: continuous deopt (e.g. the types at a call site change every few seconds) makes the JIT thrash, and performance never reaches steady state.
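The type-mismatch trigger can be reproduced with a sketch like the following: warm a call site with a single receiver type, then introduce a second one. The names (DeoptDemo, Shape, Square, Circle, total) are hypothetical. Run it with -XX:+PrintCompilation and you would expect to see DeoptDemo::total marked "made not entrant" around the start of phase 2, though the exact log depends on the JVM version.

```java
import java.util.Arrays;

// Minimal sketch: one call site, first monomorphic, then suddenly seeing a new type.
final class DeoptDemo {

    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // The single call site s.area() is what the JIT profiles and speculates on.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] squares = new Shape[1_000];
        Arrays.fill(squares, new Square(2));
        double sink = 0;

        // Phase 1: only Square is ever seen, so the JIT can speculate on it.
        for (int i = 0; i < 50_000; i++) {
            sink += total(squares);
        }

        // Phase 2: a Circle appears, which breaks the speculation.
        Shape[] mixed = squares.clone();
        mixed[500] = new Circle(1);
        for (int i = 0; i < 50_000; i++) {
            sink += total(mixed);
        }
        System.out.println(sink);
    }
}
```

A single deopt like this is harmless; the thrashing pattern described above needs the assumption to keep breaking after each recompilation.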

`-XX:+PrintCompilation`

Enable it to see JIT activity:

java -XX:+PrintCompilation MyApp

Output:

   123    1     b      java.lang.String::hashCode (49 bytes)
   145    2     n      java.lang.Object::getClass (native)
   201    3       3   com.foo.Bar::process (12 bytes)
   312    4       4   com.foo.Bar::process (12 bytes)
   456    4       4   com.foo.Bar::process (12 bytes)   made not entrant
   457    5       4   com.foo.Bar::process (12 bytes)

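Columns:

Timestamp (ms since JVM start).
Compile ID.
Flags: b blocking, n native, s synchronized, ! has exception handlers, % OSR. Trailing notes: made not entrant (the code was invalidated, for example after a deopt or when a higher tier replaces it), made zombie (cleanup, older JDKs).
Tier (1-3 = C1, 4 = C2).
Method and bytecode size.

In the example, process is compiled by C1 (tier 3) at 201 ms and by C2 (tier 4) at 312 ms. A compiled version is later marked "made not entrant" (invalidated), and the method is recompiled at 457 ms.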

6. Inline caches and type profiles

For each invokevirtual call site the JIT tracks the receiver types it has seen:

Monomorphic (1 type seen):
  Inline target.
  Insert type check, fail -> deopt.

Bimorphic (2 types seen):
  Two guarded inlined targets.
  type1 -> target1, type2 -> target2.
  3rd type -> deopt.

Megamorphic (3+ types):
  No speculation.
  Full vtable lookup on every call (slow).

Code with stable types (always the same class) gets the JIT's full inlining investment. Code with highly varied types makes the JIT give up on that call site and use the vtable.

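Tip: type-stable call sites

// Stable: good
for (Order order : orders) {
    order.calculate();        // order always has the same concrete type (e.g. OrderImpl)
}

// Megamorphic: slow
for (Animal animal : zoo) {
    animal.sound();           // Dog, Cat, Bird, ... interleaved
}

The second loop is often ~3-5x slower, mostly because the call cannot be inlined and goes through a vtable lookup every time. Workaround: dispatch in per-type batches (groupBy here is a hypothetical helper that returns a Map from class to list):

groupBy(animals, Animal::getClass).forEach((cls, list) -> {
    list.forEach(Animal::sound);   // Within a batch the receiver type stays the same
});

Caveat: HotSpot's type profile belongs to the call site's bytecode, not to each batch, so this shared call site still records every type it has seen. Batching mainly produces long same-type runs that the CPU's branch predictor handles well.

Or redesign: keep a separate Dog list and Cat list, each iterated by its own loop, so every call site has a fixed type.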
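The redesign option can be sketched as follows: partition the mixed collection once, then run one loop per concrete type so that each loop has its own call site. The names (TypedBatches, soundsSplit, and the Animal implementations) are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: split a mixed collection into per-type lists, one loop per type.
final class TypedBatches {

    interface Animal { String sound(); }
    static final class Dog implements Animal { public String sound() { return "woof"; } }
    static final class Cat implements Animal { public String sound() { return "meow"; } }
    static final class Bird implements Animal { public String sound() { return "tweet"; } }

    static List<String> soundsSplit(List<Animal> zoo) {
        List<Dog> dogs = new ArrayList<>();
        List<Cat> cats = new ArrayList<>();
        List<Animal> others = new ArrayList<>();
        for (Animal a : zoo) {
            if (a instanceof Dog) {
                dogs.add((Dog) a);
            } else if (a instanceof Cat) {
                cats.add((Cat) a);
            } else {
                others.add(a);
            }
        }

        List<String> out = new ArrayList<>();
        for (Dog d : dogs) {
            out.add(d.sound());   // call site 1: receiver is always Dog
        }
        for (Cat c : cats) {
            out.add(c.sound());   // call site 2: receiver is always Cat
        }
        for (Animal a : others) {
            out.add(a.sound());   // fallback call site for the remaining types
        }
        return out;
    }
}
```

Note that this changes the processing order (all dogs, then all cats, then the rest), so it only applies when order does not matter.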

7. JIT logs: debugging performance

`-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`

Prints inlining decisions:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyApp 2>&1 | grep -A2 'process'

Output:

@ 5   java.util.ArrayList::size (5 bytes)   inline (hot)
@ 12  com.foo.Bar::validate (8 bytes)        inline (hot)
@ 28  com.foo.Bar::log (95 bytes)            too big
@ 45  com.foo.Bar::handleCase (250 bytes)    callee is too large

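For a "too big" verdict, tune -XX:MaxInlineSize=N (or -XX:FreqInlineSize=N for hot methods), or refactor into smaller methods.

`-XX:+LogCompilation`

Dumps detailed XML about compilation decisions (this is a diagnostic flag, so it also needs -XX:+UnlockDiagnosticVMOptions). Analyze the output with a tool such as JITWatch (https://github.com/AdoptOpenJDK/jitwatch).

The JITWatch UI shows which method was compiled at which tier, when it deoptimized and why, and offers tuning suggestions.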

JFR events

JFR (Java Flight Recorder, lesson 6) captures compilation events:

jdk.Compilation: every compilation.
jdk.CompilerInlining: inlining decisions.
jdk.Deoptimization: every deopt, with its reason.

Production-safe: with the default settings JFR overhead is typically under 1%, so it can be left always on.
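These events can also be captured programmatically with the jdk.jfr API. The following is a sketch under a few assumptions: a JDK that ships the jdk.jfr module, and event names as listed above (jdk.Deoptimization is only emitted by newer JDKs). The class JfrCompilationProbe and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Minimal sketch: record JIT events around some hot work and count them by type.
final class JfrCompilationProbe {

    // Repeatable work that becomes hot when called many times.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += Integer.toString(i).length();
        }
        return acc;
    }

    static Map<String, Integer> recordAndCount(int rounds) {
        try {
            Path file = Files.createTempFile("jit-probe", ".jfr");
            long sink = 0;
            try (Recording recording = new Recording()) {
                // Zero threshold so that short compilations are recorded too.
                recording.enable("jdk.Compilation").withThreshold(Duration.ZERO);
                recording.enable("jdk.Deoptimization");
                recording.start();
                for (int i = 0; i < rounds; i++) {
                    sink += work(1_000);
                }
                recording.stop();
                recording.dump(file);
            }
            Map<String, Integer> counts = new TreeMap<>();
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                counts.merge(event.getEventType().getName(), 1, Integer::sum);
            }
            Files.deleteIfExists(file);
            System.out.println("work result: " + sink);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(recordAndCount(20_000));
    }
}
```

In production you would more commonly start JFR from the command line or with jcmd and inspect the file in JDK Mission Control; the in-process API is mainly handy for tests and experiments.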

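8. AOT and CDS: complements to JIT

JIT is not the only answer. The JVM has features that reduce startup and warm-up time:

CDS (Class Data Sharing)

Pre-loads classes into a shared archive (.jsa file) that is memory-mapped at every JVM start, bypassing most class loading and linking. The JDK ships a default archive for core classes; AppCDS extends it to application classes.

# Create an archive for the app
java -XX:ArchiveClassesAtExit=app.jsa -jar myapp.jar

# Run with the archive
java -XX:SharedArchiveFile=app.jsa -jar myapp.jar

Startup is roughly 30-50% faster for CLI and serverless workloads. Spring Boot 3.3+ has built-in support for CDS.

AOT cache: JEP 483 (JDK 24)

JEP 483 (Ahead-of-Time Class Loading & Linking) stores classes in their already loaded and linked state in a cache on disk, recorded during a training run, so later JVM starts skip that work. Caching profiles and compiled hot methods so the JVM can skip JIT warm-up is follow-on Project Leyden work (for example JEP 515, Ahead-of-Time Method Profiling, in JDK 25).

Trade-off: the cache is tied to the JDK build, the platform, and the application's classpath. It cannot be shared across different machines or JVM versions and has to be regenerated when any of these change.

GraalVM Native Image

Compiles the whole app into a native binary with no JVM runtime. Startup under 100 ms. Small memory footprint. A good fit for serverless (AWS Lambda).

Trade-offs:

Peak performance about 10-20% below HotSpot JIT (no runtime profile-guided optimization by default).
Reflection and dynamic class loading are harder (they need configuration up front).
Slow builds (several minutes).

Spring Boot 3, Quarkus, and Micronaut all support Native Image well.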

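9. Common pitfalls

❌ Mistake 1: Benchmarking without warm-up.

long t1 = System.nanoTime();
hotMethod();
long t2 = System.nanoTime();
System.out.println((t2 - t1) / 1000 + "us");   // Measures interpreter time, meaningless

✅ Use JMH (Module 13 lesson). It handles warm-up and measurement correctly.

❌ Mistake 2: Assuming a "cold" method is fast because its code is simple.

Even a simple method such as String.equals can be many times slower on a cold path than on a hot path (the original figure here is ~100x).

✅ Check performance at steady state, not on the first call.

❌ Mistake 3: A method too large to inline.

void doEverything() { /* 500 lines */ }   // Not inlined -> caller is slow

✅ Split it into small methods. The JVM inlines many small methods far better than one big one.

❌ Mistake 4: Using exceptions for control flow in a hot loop.

for (int i = 0; i < n; i++) {
    try { ... } catch (Exception e) { ... }   // Costly when exceptions are actually thrown
}

A try block that never throws costs very little. What hurts is exceptions thrown frequently inside the loop: creating the exception, filling in its stack trace, and unwinding all sit on a slow path.

✅ Validate up front instead of catching, or move the try outside the loop.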
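As an illustration of "validate instead of catch", the sketch below sums the numeric tokens of an array in two ways. The class and method names are hypothetical, and the validator deliberately accepts only short non-negative decimal strings, so the two versions agree only on such inputs.

```java
// Minimal sketch: exception-driven parsing vs. validating before parsing.
final class ParseWithoutExceptions {

    // Exception-driven: every bad token pays for creating and unwinding an exception.
    static int sumWithCatch(String[] tokens) {
        int sum = 0;
        for (String t : tokens) {
            try {
                sum += Integer.parseInt(t);
            } catch (NumberFormatException e) {
                // skip bad token
            }
        }
        return sum;
    }

    // Cheap check: 1 to 9 ASCII digits, so parseInt cannot fail or overflow.
    static boolean isSmallNonNegativeInt(String s) {
        if (s == null || s.isEmpty() || s.length() > 9) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Validation-driven: bad tokens are filtered by a branch, no exception is thrown.
    static int sumWithValidation(String[] tokens) {
        int sum = 0;
        for (String t : tokens) {
            if (isSmallNonNegativeInt(t)) {
                sum += Integer.parseInt(t);
            }
        }
        return sum;
    }
}
```

When bad tokens are rare the two versions perform similarly; the validating version pays off when a meaningful fraction of the input is invalid.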

❌ Mistake 5: A megamorphic call site on the hot path.

for (Item item : items) {
    item.process();   // 10 different types -> vtable lookup on every call
}

✅ Group by type, or redesign to avoid polymorphism on the hot path.

❌ Mistake 6: Believing "Java is always slower than C++".

Steady state: Java reaches 90-110% of C++ throughput in many benchmarks.
Cold: around 10x slower.

✅ Measure on the real use case, with the right workload and the right warm-up.

10. 📚 Deep Dive Oracle

Official specs and references:

HotSpot Performance Techniques (the official wiki on HotSpot optimizations).
JEP 165: Compiler Control (fine-tune JIT decisions per method).
JEP 295: Ahead-of-Time Compilation (the old experimental jaotc compiler, removed in JDK 17 by JEP 410).
JEP 483: Ahead-of-Time Class Loading & Linking (JDK 24, AOT cache of loaded and linked classes).
JEP 310: Application Class-Data Sharing (CDS for application classes).
GraalVM Native Image (AOT alternative).
JITWatch (visual tool for analyzing JIT logs).
"The Java HotSpot Performance Engine Architecture" (Oracle paper explaining HotSpot internals).

Note: the HotSpot wiki is a detailed catalogue of compiler optimizations: escape analysis, range check elimination, lock coarsening, and more. Read it when you are profiling a specific optimization. The JITWatch UI is very useful: load the LogCompilation XML, navigate through methods, and see bytecode, native assembly, and the decision graph. Production usually relies on JFR rather than LogCompilation (lower overhead), but JITWatch plus LogCompilation gives developers the deepest offline analysis.

11. Summary

The JVM executes bytecode through tiers: Interpreter (slowest, no compilation) → C1 (fast compile, light optimization) → C2 (slow compile, deep optimization).
Tiered compilation: code at several tiers runs side by side. Hot methods escalate. Thresholds are roughly ~1500 (C1) and ~10000 (C2) invocations, and are tunable.
Counters: the invocation counter and the back-edge counter (loops) trigger compilation.
OSR (On-Stack Replacement): compiles a loop while its method is running and swaps the interpreter frame for a native frame mid-loop.
Inlining: the most important optimization. Methods of ≤ 35 bytes are always inline candidates. Methods of > 325 bytes are not inlined.
Escape analysis: an object that does not escape the method scope needs no heap allocation (scalar replacement), which reduces GC pressure.
Devirtualization: a monomorphic call site (1 type) is inlined directly behind a type guard. Bimorphic sites get two guarded targets. Megamorphic sites use a full vtable lookup, often ~3-5x slower.
Deoptimization: a broken assumption (type mismatch, branch never seen, new class loaded) makes the JIT undo its work and return to the interpreter. It is expensive, so avoid thrashing.
-XX:+PrintCompilation logs every compile event. -XX:+PrintInlining logs inlining decisions.
JFR captures compilation and deopt events with overhead typically under 1%, which is production-safe.
CDS reduces startup time by sharing a class archive.
GraalVM Native Image compiles the whole app AOT: startup under 100 ms, peak performance about 10-20% lower.
Code design: keep hot methods small, keep call sites type-stable, avoid throwing exceptions in hot loops.
Cold Java is around 10x slower; warm Java is close to C++. Measure at steady state with JMH.

12. Self-check

Why does Java have a "warm-up" while C++ does not, and how does that affect benchmarks?

C++ is compiled AOT: the code becomes native at build time, so the executable runs optimized native code from the first instruction. No warm-up.

Java is compiled JIT: the JVM starts with bytecode and the interpreter. Running methods are tracked with counters. Hot (~1500 invocations) → C1 compiles → ~3-5x faster. Hotter (~10000) → C2 compiles → ~10-20x faster. This process takes from a few seconds to a few minutes depending on the workload.

A common benchmarking mistake:

long t1 = System.nanoTime();
hotMethod();   // First call, interpreter, ~10x slower
long t2 = System.nanoTime();
print(t2 - t1);   // Wrong: includes interpreter time

The right way:

Warm-up: run the method 10000+ times before measuring, so the JIT can bring it up to C2.
Repeat the measurement: measure many times and take the median (not the mean, which outliers skew).
JMH: the standard microbenchmark framework. It handles warm-up and guards against dead code elimination and constant folding.
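For a quick experiment where pulling in JMH is not practical, the first two points can be sketched by hand as below. This is not a substitute for JMH: the names (TinyBench, medianNanos, median) are hypothetical, and the sketch does nothing about constant folding or other pitfalls JMH guards against.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Minimal sketch of a hand-rolled harness: warm-up phase, then median of N samples.
final class TinyBench {

    // Results are written here so the JIT cannot treat the task as dead code.
    static volatile long sink;

    static long medianNanos(LongSupplier task, int warmupRuns, int measuredRuns) {
        // Warm-up: give the JIT a chance to compile the task before timing it.
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.getAsLong();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    // Median of the samples; for an even count, the average of the two middle values.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        long nanos = medianNanos(() -> {
            long acc = 0;
            for (int i = 0; i < 100_000; i++) {
                acc += i % 7;
            }
            return acc;
        }, 20_000, 101);
        System.out.println("median: " + nanos + " ns");
    }
}
```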

Production implication: a freshly started app has lower throughput for the first 30-60 s. Load balancers and health checks should allow for warm-up. A common pattern is to pre-warm a new instance with synthetic traffic before routing real traffic to it.

What is inlining, and why is it the most important JIT optimization?

Inlining replaces callMethod(args) with the body of that method at the call site.

// Before inlining
int sum(int n) {
  int s = 0;
  for (int i = 0; i < n; i++) s += square(i);
  return s;
}
int square(int x) { return x * x; }

// After inlining
int sum(int n) {
  int s = 0;
  for (int i = 0; i < n; i++) s += i * i;
  return s;
}

It matters most because:

It removes call overhead: pushing and popping a frame costs on the order of nanoseconds per call. Over a loop of 1 billion iterations that adds up to seconds of overhead. Inlining removes it entirely.
It enables optimization across the call boundary: better register allocation (values stay in registers instead of spilling to the stack), constant folding (square(2) → 4), dead code elimination.
It enables loop optimization: loop-invariant hoisting, vectorization, and unrolling all need to see the whole loop body, and inlining widens that view.

Thresholds:

Methods under 35 bytes: always inline candidates.
Methods over 325 bytes: never inlined (code bloat).
In between: depends on the counters and on how hot the caller is.

Tips:

Small, focused methods → the JIT inlines them well.
A 500-line "god method" → not inlined → slow caller. Split it into several small methods.
final / static / private methods are easy to inline (no virtual lookup).

Idiomatic Java (many small methods plus builder patterns) performs close to C++ thanks to inlining and escape analysis working together.

What is wrong with this hot loop? for (Animal a : zoo) a.sound(); given that zoo contains 10 different animal types, interleaved.

The call site a.sound() is megamorphic (3 or more receiver types). The JIT cannot inline or speculate and has to use a full vtable lookup on every call.

Cost compared with monomorphic:

Monomorphic (1 type): inlined behind a 1-instruction type guard. Nearly free.
Bimorphic (2 types): two guarded targets, 2 sequential type checks. Still fast.
Megamorphic (3+ types): vtable lookup, ~5-10 ns per call. About 3-5x slower than monomorphic.

In a loop over 100M elements the difference is roughly 0.5-1 second.

Workarounds:

Group by type: groupBy(zoo, Animal::getClass).forEach((cls, list) -> list.forEach(Animal::sound)); so the inner loop sees one class at a time. The shared call site's profile still records every type, but long same-type runs are handled well by the CPU's branch predictor.
Sort by type: if order does not matter, sort the list by class before iterating, which gives the same long type-stable runs.
Split the collection: separate Dog, Cat, and Bird lists, each with its own loop. Every call site is then type-stable, which is what the JIT needs to inline.
Visitor pattern: with many methods, a visitor can sometimes be optimized better by the JIT than a polymorphic call.
Sealed interfaces: Java 17+ sealed types limit the set of implementations, so the JIT could in principle optimize over the whole universe of types (limited benefit today, but evolving).

Production impact: if the hot path processes a mixed-type collection, a profiler may show time spent in vtable or itable call stubs, which is a sign the design needs refactoring.

What is deoptimization? Why is it "expensive", and how do you avoid deopt thrashing?

Deopt means the JVM discards native code that C2 has compiled and returns to the interpreter. Triggers:

Type assumption fails: the call site speculated "always ArrayList" and now meets a LinkedList.
Uncommon trap hit: the branch profile saw "false 100%", so the JIT compiled only the false branch plus a trap; at runtime the condition is true and the trap fires.
New class loaded: a subclass overrides a method that was inlined, so the code is invalidated.
Internal assertions: object layout, lock state, and so on.

It is "expensive" because:

The native frame has to be converted back into an interpreter frame, rebuilding local variables and the operand stack from native registers and stack slots. This takes a few microseconds.
The compiled native code is thrown away.
The method returns to the interpreter and runs about 10x slower while it waits for recompilation.
Recompilation happens after a few thousand more invocations.

Thrashing: continuous deopt (e.g. types change every few seconds) → the JIT recompiles → native code runs for a few seconds → deopt → repeat. Performance never reaches steady state.

How to avoid it:

Type-stable call sites: design the hot path around a single type where possible.
Avoid surprising rare branches: if the true branch can happen, exercise it occasionally (e.g. make benchmark warm-up cover all branches) so the JIT knows about it.
Stable class hierarchy: avoid loading new subclasses after warm-up. Plugins and hot reload can trigger deopt.
Monitor: use the JFR event jdk.Deoptimization with its reason. If a hot method deoptimizes often, investigate.

In production, occasional deopt (a few times an hour) is fine. Several deopts per second on a hot method are a performance issue.

How does OSR (On-Stack Replacement) differ from a normal compile, and when is it triggered?

Normally the JIT compiles a method and the next call uses the native code. A method that is already running is not changed.

The problem: a method with a loop of 1 billion iterations that is called once. Its invocation counter is only 1, which never reaches the compile threshold, so the loop would run entirely in the interpreter and be painfully slow.

OSR fixes this with the back-edge counter, which counts how many times the loop jumps back. After ~10000 iterations:

The JIT compiles a version of the method whose entry point is the loop (identified by its bytecode offset).
The JIT prepares an "OSR adapter": code that transfers the interpreter frame's state (local variables, stack values) into the native frame.
At a safepoint (the back-edge), the JVM swaps the interpreter frame for the native frame mid-loop.
The loop continues in native code with its state preserved.

In the JIT log:

1234   42 % 4   com.foo.Bar::compute @ 12 (123 bytes)

The % character means OSR. @ 12 is the bytecode offset of the loop that triggered it. 4 is the tier (C2).

Note: OSR code is entered only at that loop, so it does not serve later calls through the method's normal entry point. A later call starts in the interpreter (or in lower-tier code) until the normal C2 compilation is finished.

Practical: OSR is why a loop that runs once for a long time is still fast in Java. For workloads where main() runs one main loop, OSR is the main optimization. OSR code can be somewhat lower quality than a normal C2 compile (the original estimate here is ~5-10%), since it is compiled in a hurry with less profile data.

Tip: if performance is critical, split the loop into chunks processed by a method call, so the method's invocation counter grows and triggers a normal C2 compile (better quality than OSR).
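The chunking tip can be sketched as follows. The names (ChunkedLoop, chunk, computeChunked) and the chunk size are hypothetical choices for illustration; whether this actually beats the OSR version depends on the workload, so measure before adopting it.

```java
// Minimal sketch: split one long loop into chunks handled by a small method,
// so that the method is invoked many times and can get a normal (non-OSR) compile.
final class ChunkedLoop {

    // Called many times: its invocation counter rises and triggers normal compilation.
    static long chunk(int from, int to) {
        long acc = 0;
        for (int i = from; i < to; i++) {
            acc += i % 7;
        }
        return acc;
    }

    static long computeChunked(int iterations, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long acc = 0;
        // long loop variable avoids int overflow near Integer.MAX_VALUE.
        for (long start = 0; start < iterations; start += chunkSize) {
            int end = (int) Math.min(start + chunkSize, iterations);
            acc += chunk((int) start, end);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(computeChunked(1_000_000_000, 10_000));
    }
}
```

With -XX:+PrintCompilation you would expect chunk to appear as a regular tier-4 compile without the % flag, while the outer loop in computeChunked may still be OSR-compiled.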

What is the difference between HotSpot JIT and GraalVM Native Image, and when do you choose which?

HotSpot JIT (mainstream OpenJDK):

Starts in the interpreter; the JIT gradually compiles up to C2.
Has a runtime profile → aggressive optimization (speculative inlining, escape analysis).
Steady state: close to C++ performance.
Cold start: around 10x slower, needs a few seconds of warm-up.
High memory footprint (a few hundred MB for the JIT code cache and metadata).

GraalVM Native Image:

Compiles the whole app into a native binary at build time. No JVM runtime.
Startup under 100 ms, memory in the tens of MB.
No runtime profile → "static" optimization, with peak performance about 10-20% below HotSpot.
Reflection and dynamic class loading are harder (they need reflect-config.json).
Slow builds (several minutes per build).

Choose HotSpot when:

The server is long-running (web service, 24/7 microservice): warm-up is not a problem and peak performance is everything.
The app relies heavily on reflection, dynamic proxies, or class loaders (legacy Spring, Hibernate).
Memory is not tightly constrained.

Choose Native Image when:

Serverless (AWS Lambda, Cloud Run): cold start is critical and must stay under 1 s, which JIT warm-up cannot meet.
CLI tools: users expect instant startup.
Container memory is tight (e.g. a 128 MB instance).
Edge / IoT: small footprint.

Spring Boot 3, Quarkus, and Micronaut support Native Image well and generate reflection metadata automatically for popular dependencies. A hybrid pattern: develop on HotSpot (fast builds, easy debugging) and build a Native Image for production serverless deployment.

The future: Project Leyden aims to merge the two, with AOT pre-warming plus runtime JIT refinement, to get both sides of the trade-off at once.
