Spring Boot/Pagination & sorting — Pageable, Sort, projection DTO, cursor scroll
~22 phútSpring Data JPAMiễn phí

Pagination & sorting — Pageable, Sort, projection DTO, cursor scroll

Pageable + Sort là API chuẩn cho list endpoint. Bài này bóc cơ chế OFFSET/LIMIT, performance impact deep page, alternatives Slice và keyset pagination, sort multi-field, projection DTO cho perf, validation Pageable, và pattern Spring Data Web.

Module 03 đã introduce Pageable cho REST endpoint. Module 04 này đào sâu DB integration: SQL generated, performance impact OFFSET deep pagination, 3 alternatives (Slice, KeysetPagination, Window), pattern projection DTO cho perf, validation Pageable input.

Pagination quan trọng cho production — list 1M record trả về full = OOM + slow. Implement đúng = scale tới 100M+ record.

1. Pageable basic

public interface ProjectRepository extends JpaRepository<Project, Long> {
    Page<Project> findAll(Pageable pageable);
    Page<Project> findByStatus(ProjectStatus status, Pageable pageable);
}

@RestController
public class ProjectController {

    @GetMapping("/projects")
    public Page<ProjectDto> list(
        @RequestParam(required = false) ProjectStatus status,
        Pageable pageable                           // auto-bind from query string
    ) {
        Page<Project> page = status == null
            ? repo.findAll(pageable)
            : repo.findByStatus(status, pageable);
        return page.map(ProjectDto::from);
    }
}

Request: GET /projects?status=ACTIVE&page=0&size=20&sort=createdAt,desc.

Spring Data Web bind:

  • page=0 → page index (0-based).
  • size=20 → results per page.
  • sort=createdAt,desc → sort field + direction.

Pageable resolve qua PageableHandlerMethodArgumentResolver.

2. Generated SQL

-- Query 1: data
SELECT p.* FROM projects p
WHERE p.status = ?              -- 'ACTIVE'
ORDER BY p.created_at DESC
LIMIT 20 OFFSET 0;

-- Query 2: total count
SELECT COUNT(*) FROM projects p
WHERE p.status = ?;             -- 'ACTIVE'

Spring wrap result trong Page<Project>:

public interface Page<T> extends Slice<T> {
    int getTotalPages();
    long getTotalElements();
    boolean isFirst();
    boolean isLast();
    boolean hasNext();
    boolean hasPrevious();
    int getNumber();
    int getNumberOfElements();
    int getSize();
    Pageable getPageable();
    List<T> getContent();
}

JSON response:

{
  "content": [...],
  "pageable": {"pageNumber": 0, "pageSize": 20, "sort": ...},
  "totalElements": 1000,
  "totalPages": 50,
  "first": true,
  "last": false,
  "size": 20,
  "number": 0,
  "numberOfElements": 20,
  "empty": false
}

3. Performance — OFFSET deep page

OFFSET pagination scale linearly với page number:

-- Page 0
SELECT * FROM projects ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Time: 5ms

-- Page 100
SELECT * FROM projects ORDER BY created_at DESC LIMIT 20 OFFSET 2000;
-- Time: 50ms (Postgres skip 2000 rows first)

-- Page 1000
SELECT * FROM projects ORDER BY created_at DESC LIMIT 20 OFFSET 20000;
-- Time: 500ms

-- Page 10000
SELECT * FROM projects ORDER BY created_at DESC LIMIT 20 OFFSET 200000;
-- Time: 5000ms — UI timeout

Vì sao slow: Postgres phải scan + sort 20000 row trước khi apply OFFSET 20000.

Optimize 1 — index:

CREATE INDEX idx_projects_created ON projects(created_at DESC);

Index scan thay sequential. Time linear vẫn nhưng faster (~10x).

Optimize 2 — limit max page:

@GetMapping("/projects")
public Page<ProjectDto> list(
    @RequestParam(defaultValue = "0") @Min(0) @Max(99) int page,    // limit 100 page
    @RequestParam(defaultValue = "20") @Min(1) @Max(100) int size,
    Sort sort
) { ... }

Force user use search/filter thay scroll deep.

Optimize 3 — keyset pagination (section 5).

4. Slice — bỏ COUNT

Slice không count total — chỉ check "có next page".

Slice<Project> findByStatus(ProjectStatus status, Pageable pageable);

Mechanism: Hibernate fetch size + 1 row. Nếu có row thứ size+1hasNext = true.

-- Slice query (size=20)
SELECT * FROM projects WHERE status = ? ORDER BY ... LIMIT 21 OFFSET 0;
-- 1 query only — no COUNT

Performance: ~50% faster cho table lớn (skip COUNT(*)).

JSON:

{
  "content": [...],
  "pageable": ...,
  "first": true,
  "last": false,
  "hasNext": true,
  "size": 20,
  "number": 0,
  "numberOfElements": 20
}

Use case:

  • Infinite scroll UI (Twitter, Instagram).
  • Mobile app không hiển thị "page X of Y".
  • Performance critical — bypass COUNT.

Trade-off: client không biết total. UX show "Load more" button thay "Page X of Y".

5. Keyset pagination — cursor-based

OFFSET fragile với data thay đổi:

Time T1: list page=0, size=20 → return row 1-20
Time T2: insert new row (now ID 1)
Time T3: list page=1, size=20 → expect row 21-40, but get 20-39 (1 row shifted)

Keyset pagination (cursor-based) dùng "last seen value":

public interface ProjectRepository extends JpaRepository<Project, Long> {

    @Query("""
        SELECT p FROM Project p
        WHERE p.createdAt < :cursor
        ORDER BY p.createdAt DESC
        """)
    List<Project> findNextPage(@Param("cursor") Instant cursor, Pageable pageable);
}

// Service
public List<ProjectDto> nextPage(Instant lastCreatedAt, int size) {
    Pageable limit = PageRequest.of(0, size);
    return repo.findNextPage(lastCreatedAt, limit).stream()
        .map(ProjectDto::from)
        .toList();
}

// First page: cursor = Instant.now() (max value)
// Next page: cursor = lastReturnedRow.createdAt

SQL:

-- Page 1
SELECT * FROM projects WHERE created_at < '9999-01-01' ORDER BY created_at DESC LIMIT 20;

-- Page 2 (cursor = lastRow.createdAt = '2026-04-15 10:00:00')
SELECT * FROM projects WHERE created_at < '2026-04-15 10:00:00' ORDER BY created_at DESC LIMIT 20;

-- Page 3 (cursor = next lastRow.createdAt)
SELECT * FROM projects WHERE created_at < '...' ORDER BY created_at DESC LIMIT 20;

Performance: O(log N) với index — same speed mọi page.

Pros:

  • Constant performance mọi page.
  • Stable: insert mới không shift.
  • Real-time friendly: news feed, log viewer.

Cons:

  • No "page X of Y": chỉ next/previous.
  • No jump page: không skip directly to page 50.
  • Sort key unique: nếu nhiều row cùng createdAt → cần composite cursor (createdAt + id).

5.1 Spring Data Window/ScrollPosition (3.1+)

Spring Data 3.1 thêm Window API native:

public interface ProjectRepository extends JpaRepository<Project, Long> {
    Window<Project> findByStatus(ProjectStatus status, ScrollPosition position);
}

// First page
Window<Project> first = repo.findByStatus(ACTIVE, ScrollPosition.offset());

// Next page
ScrollPosition next = first.positionAt(first.size() - 1);
Window<Project> page2 = repo.findByStatus(ACTIVE, next);

ScrollPosition:

  • OffsetScrollPosition — OFFSET-based.
  • KeysetScrollPosition — keyset-based.

Native API support both modes — switch implementation dễ.

6. Sort — multi-field

// Single field
Sort sort = Sort.by("createdAt").descending();

// Multi-field
Sort sort = Sort.by(
    Sort.Order.desc("priority"),
    Sort.Order.asc("name")
);

// In Pageable
Pageable page = PageRequest.of(0, 20, Sort.by("createdAt").descending());

Request URL: ?sort=priority,desc&sort=name,asc.

Spring bind multiple sort fields automatically.

6.1 Sort by nested property

// Sort by Project.owner.name (relationship)
Sort sort = Sort.by("owner.name");

// JPQL: ORDER BY p.owner.name

Hibernate JOIN owner table for sort.

6.2 Custom sort whitelist

User-controlled sort risky:

?sort=password,desc                      # leak password order info
?sort=expensive_compute_field,asc        # slow query

Whitelist allowed:

private static final Set<String> ALLOWED = Set.of("createdAt", "name", "priority");

@GetMapping("/projects")
public Page<ProjectDto> list(Pageable pageable) {
    pageable.getSort().forEach(order -> {
        if (!ALLOWED.contains(order.getProperty())) {
            throw new ValidationException("Cannot sort by: " + order.getProperty());
        }
    });
    return service.list(pageable);
}

Hoặc dùng @SortDefault:

@GetMapping("/projects")
public Page<ProjectDto> list(
    @SortDefault.SortDefaults({
        @SortDefault(sort = "createdAt", direction = Sort.Direction.DESC)
    })
    Pageable pageable
) { ... }

7. Validation Pageable

spring:
  data:
    web:
      pageable:
        default-page-size: 20
        max-page-size: 100               # enforce server-side
        one-indexed-parameters: false    # 0-based default

max-page-size Spring enforce — request size=10000 → cap to 100. No exception, silent cap.

Manual validation:

@GetMapping("/projects")
public Page<ProjectDto> list(
    @PageableDefault(size = 20)
    @Validated Pageable pageable
) {
    if (pageable.getPageSize() > 100) {
        throw new ValidationException("Page size max 100");
    }
    return service.list(pageable);
}

8. Projection DTO + Pageable

Module 04 bài 03 đã giới thiệu projection. Combine với Pageable:

@Query("""
    SELECT new com.olhub.dto.ProjectSummary(p.id, p.name, p.status, COUNT(t.id))
    FROM Project p LEFT JOIN p.tasks t
    WHERE p.status = :status
    GROUP BY p.id, p.name, p.status
    """,
    countQuery = """
    SELECT COUNT(p) FROM Project p WHERE p.status = :status
    """)
Page<ProjectSummary> findSummariesByStatus(@Param("status") ProjectStatus status, Pageable page);

countQuery quan trọng — default count query có thể join unnecessary table cho count. Custom optimal.

Performance:

  • Default: load full Project entity → map to DTO. 10 column from DB, mapping overhead.
  • Projection: SELECT 4 column directly. 50% less network data + faster.

For list endpoint hot path: always use DTO projection.

9. Pattern thực tế

9.1 Filter + sort + page combined

@RestController
public class ProjectController {

    @GetMapping("/projects")
    public Page<ProjectSummary> list(
        @RequestParam(required = false) ProjectStatus status,
        @RequestParam(required = false) @DateTimeFormat(iso = ISO.DATE) LocalDate from,
        @RequestParam(required = false) @DateTimeFormat(iso = ISO.DATE) LocalDate to,
        @PageableDefault(size = 20) @SortDefault("createdAt") Pageable pageable
    ) {
        return service.search(new ProjectFilter(status, from, to), pageable);
    }
}

@Service
public class ProjectService {

    public Page<ProjectSummary> search(ProjectFilter filter, Pageable pageable) {
        // Use Specification cho dynamic filter
        Specification<Project> spec = ProjectSpecs.matching(filter);
        return repo.findAll(spec, pageable).map(ProjectSummary::from);
    }
}

9.2 Slice cho infinite scroll mobile

@GetMapping("/feed")
public Slice<ProjectDto> feed(@PageableDefault(size = 10) Pageable pageable) {
    return repo.findRecent(pageable).map(ProjectDto::from);
}

Mobile UI render "Load more" button. No total count needed.

9.3 Keyset cho realtime activity feed

@GetMapping("/activities")
public ScrollResult<Activity> activities(
    @RequestParam(required = false) Long lastSeenId,
    @RequestParam(defaultValue = "20") @Max(50) int size
) {
    ScrollPosition position = lastSeenId == null
        ? ScrollPosition.offset()
        : ScrollPosition.forward(Map.of("id", lastSeenId));

    Window<Activity> window = repo.scrollOrderByCreatedAtDesc(position, Limit.of(size));

    return new ScrollResult<>(
        window.getContent(),
        window.hasNext() ? window.positionAt(window.size() - 1) : null
    );
}

public record ScrollResult<T>(List<T> items, ScrollPosition nextCursor) {}

Client:

  • First request: no cursor.
  • Each response: nextCursor returned.
  • Next request: send cursor.

UX: news feed style — never out of date even khi data shift.

10. Vận hành production — DoS protection, performance tuning, monitoring

Pagination endpoint thường là entry point bị scrape / abuse. Section này cover security limits, performance tuning, monitoring deep page anomaly.

10.1 Security limits — DoS qua pagination

Anti-pattern user gửi ?size=999999:

# application.yml — Spring Data Web global config
spring:
  data:
    web:
      pageable:
        max-page-size: 100              # cap silent (no error, just truncate)
        default-page-size: 20
        one-indexed-parameters: false   # 0-indexed standard
        prefix: ""                      # ?page=&size=

Hoặc per-endpoint enforce với validation:

@GetMapping("/orders")
public Page<OrderDto> list(
    @RequestParam(defaultValue = "0") @Min(0) @Max(10000) int page,
    @RequestParam(defaultValue = "20") @Min(1) @Max(100) int size,
    @RequestParam(defaultValue = "createdAt,desc") String sort
) { ... }

@Max(10000) cho page ngăn deep page DoS — page 10000 với OFFSET = 200k rows scan.

10.2 Sort field whitelist — security + performance

User gửi ?sort=password,desc → leak data (sort phơi bày null vs non-null) hoặc ?sort=non_indexed_column → full table scan.

private static final Set<String> ALLOWED_SORT = Set.of(
    "createdAt", "updatedAt", "name", "status", "priority"
);

@GetMapping("/orders")
public Page<OrderDto> list(@RequestParam(required = false) String sort, Pageable p) {
    if (sort != null && !ALLOWED_SORT.contains(sort.split(",")[0])) {
        throw new InvalidSortFieldException(sort);
    }
    return service.list(p);
}

Hoặc map qua DTO sort:

public enum OrderSort {
    CREATED_AT("createdAt"),
    NAME("name");

    final String column;
    OrderSort(String c) { this.column = c; }
}

@GetMapping("/orders")
public Page<OrderDto> list(@RequestParam(defaultValue = "CREATED_AT") OrderSort sortBy, ...)

Type-safe — Spring convert enum, invalid value → 400.

10.3 Performance — OFFSET deep page anomaly

Production traffic 99% page 1-3 (UX shallow). Spike page vượt 100 → bot scraping hoặc abuse.

Monitoring metric:

@RestControllerAdvice
public class PaginationMetrics {

    private final MeterRegistry registry;

    @ModelAttribute
    public void trackPagination(@Valid Pageable pageable, HttpServletRequest req) {
        registry.summary("pagination.page",
            "endpoint", req.getRequestURI()
        ).record(pageable.getPageNumber());
    }
}

Alert deep page anomaly:

- alert: DeepPageScrapingSuspect
  expr: histogram_quantile(0.95, pagination_page_bucket) > 100
  for: 10m
  labels:
    severity: warning

P95 page vượt 100 → likely scraping. Investigate → rate limit / CAPTCHA.

10.4 Migration sang keyset pagination

Production app vượt 100k row + frequent insert → migrate OFFSET sang keyset:

// Old API — page-based
GET /orders?page=5&size=20
{ "content": [...], "totalPages": 1000, "page": 5 }

// New API — cursor-based
GET /orders?cursor=eyJpZCI6MTIzfQ&size=20
{ "content": [...], "nextCursor": "eyJpZCI6MTQzfQ" }

Migration strategy:

  1. Add new endpoint /orders/v2 với cursor-based.
  2. Frontend mobile dùng v2 (infinite scroll natural).
  3. Frontend admin giữ v1 (Page X of Y vẫn cần).
  4. Deprecate v1 nếu không còn use case.

Encode cursor base64 JSON {"id": 123, "createdAt": "..."} để client opaque, server decode.

10.5 Failure runbook — pagination

Mode 1 — Slow query deep page:

Triệu chứng: ?page=500&size=100 chạy 30s.

Diagnose: EXPLAIN ANALYZE SELECT ... LIMIT 100 OFFSET 50000 → Postgres scan 50k row trước trả 100.

Remediate: migrate keyset hoặc cap max-page-size strict.

Mode 2 — Inconsistent results across pages:

Triệu chứng: cùng item xuất hiện ở page 1 và page 2 (hoặc bỏ sót).

Diagnose: sort không stable. ORDER BY status không unique → page boundary không xác định.

Remediate: add tie-breaker ORDER BY status, id. Always có unique field cuối sort.

Mode 3 — count() query timeout:

Triệu chứng: list endpoint trả 504, log "count query took 30s".

Diagnose: COUNT trên table lớn vượt timeout. JOIN trong COUNT thừa.

Remediate:

  • Switch sang Slice (skip count).
  • Custom countQuery đơn giản, không JOIN unnecessary.
  • Approximate count với pg_class.reltuples cho table lớn.
@Query(value = "SELECT o FROM Order o WHERE ...",
       countQuery = "SELECT COUNT(o) FROM Order o WHERE ...")  // chi WHERE, no JOIN
Page<Order> findFiltered(...);

11. Pitfall tổng hợp

Nhầm 1: OFFSET deep pagination cho large table. ✅ Keyset pagination cho table 100K+ row + frequent inserts.

Nhầm 2: Quên countQuery cho complex @Query. ✅ Provide countQuery để skip JOIN unnecessary cho count.

Nhầm 3: No max size validation. ✅ max-page-size global hoặc @Max(100) on parameter.

Nhầm 4: Sort by user input không whitelist. ✅ Whitelist allowed columns. Reject suspicious.

Nhầm 5: Page entity, map DTO trong service.

Page<Project> page = repo.findAll(p);          // load full entity
return page.map(ProjectDto::from);             // map in memory

✅ DTO projection trực tiếp via @Query SELECT new ....

Nhầm 6: Bỏ sort default → unstable order across page. ✅ Always specify sort. Default createdAt DESC hoặc id ASC.

Nhầm 7: Page với findAll() (no Pageable).

List<Project> all = repo.findAll();    // could be millions

✅ Always Pageable hoặc Stream cho large dataset.

12. 📚 Deep Dive Spring Reference

13. Tóm tắt

  • Page<T> = content + total count + metadata (totalPages, hasNext, ...). 2 SQL: data + COUNT.
  • Slice<T> = content + hasNext only (no count). 1 SQL với LIMIT size+1.
  • OFFSET pagination: linear performance — slow cho deep page (page 1000+).
  • Keyset pagination (Spring Data 3.1+ Window): O(log N) constant, stable với insert.
  • Spring Data Web auto-bind ?page=&size=&sort=Pageable.
  • Multi-field sort: ?sort=priority,desc&sort=name,asc.
  • Whitelist sort columns — security: prevent leak/slow query via user input.
  • max-page-size enforce server-side — default 100. Cap silent (no exception).
  • DTO projection + Pageable: best performance. Custom countQuery để optimize.
  • Filter + sort + page combined: Specification + Pageable.
  • Pattern UX: Page (totalPages displayed), Slice (infinite scroll), Keyset (realtime feed).

14. Tự kiểm tra

Tự kiểm tra
Q1
So sánh Page, Slice, Keyset pagination. Khi nào pick cái nào?
TypeSQL countTotal infoPerformanceStable insertUse case
Page2 (data + COUNT)totalElements, totalPagesO(N) deep pageAdmin table với "Page X of Y"
Slice1 (data + 1)hasNext onlyO(N) deep pageMobile infinite scroll
Keyset / Window1hasNext + cursorO(log N) constantRealtime feed, log viewer

Decision tree:

Show "Page X of Y" + jump to page?
Yes → Page (need totalElements)
No → Continue

Table size?
Small (<10k rows) → Page or Slice OK
Medium (10k-1M) → Slice (skip COUNT) hoặc keyset
Large (>1M) → Keyset only

Data churn rate (insert frequency)?
Low (<1/min) → OFFSET-based OK
High → Keyset to avoid duplicate / shift

UI pattern?
Numbered pages → Page
"Load more" button → Slice or Keyset
Realtime feed → Keyset

Code 3 cách:

// Page — admin
@GetMapping("/admin/projects")
public Page<ProjectDto> adminList(Pageable pageable) {
  return repo.findAll(pageable).map(ProjectDto::from);
}

// Slice — mobile infinite scroll
@GetMapping("/feed")
public Slice<ProjectDto> feed(@PageableDefault(size = 10) Pageable pageable) {
  return repo.findRecent(pageable).map(ProjectDto::from);
}

// Keyset — realtime activity
@GetMapping("/activities")
public ScrollResult<Activity> activities(
  @RequestParam(required = false) Long lastSeenId,
  @RequestParam(defaultValue = "20") int size
) {
  ScrollPosition pos = lastSeenId == null
      ? ScrollPosition.offset()
      : ScrollPosition.forward(Map.of("id", lastSeenId));

  Window<Activity> window = repo.scrollOrderByCreatedAtDesc(pos, Limit.of(size));
  return new ScrollResult<>(
      window.getContent(),
      window.hasNext() ? window.positionAt(window.size() - 1) : null
  );
}

TaskFlow capstone choice:

  • List projects (admin): Page — show totals.
  • Tasks within project: Slice — typically only browse forward.
  • Activity log (Module 12): Keyset — realtime stream.
Q2
Production app có table 1M projects. User request page 1000 (offset 20000) → 5 giây response. Optimize?

Vấn đề:

SELECT * FROM projects ORDER BY created_at DESC LIMIT 20 OFFSET 20000;
-- Postgres scan 20020 rows, sort, return last 20
-- O(N) — N = OFFSET + LIMIT

4 fix theo độ tốn công:

Fix 1 — Index (cheap, immediate):

CREATE INDEX idx_projects_created_desc ON projects(created_at DESC);

Postgres index scan thay sequential. ~10x faster nhưng vẫn O(N).

Fix 2 — Limit max page (prevent abuse):

@GetMapping("/projects")
public Page<ProjectDto> list(
  @RequestParam @Min(0) @Max(99) int page,        // max page 100
  @RequestParam @Min(1) @Max(100) int size,
  @SortDefault("createdAt") Pageable pageable
) { ... }

Force user use search/filter cho data deep. UX rationale: nobody actually wants page 1000.

Fix 3 — Slice (skip COUNT):

Slice<Project> findByStatus(ProjectStatus status, Pageable pageable);

Improve only marginally — main cost vẫn là OFFSET. Helps cho mobile UI nhưng không giải quyết deep page.

Fix 4 — Keyset pagination (best long-term):

public interface ProjectRepository extends JpaRepository<Project, Long> {
  Window<Project> scrollOrderByCreatedAtDesc(ScrollPosition position, Limit limit);
}

@GetMapping("/projects")
public ScrollResult<ProjectDto> list(
  @RequestParam(required = false) Instant cursor,
  @RequestParam(defaultValue = "20") @Max(100) int size
) {
  ScrollPosition pos = cursor == null
      ? ScrollPosition.offset()
      : ScrollPosition.forward(Map.of("createdAt", cursor));

  Window<Project> w = repo.scrollOrderByCreatedAtDesc(pos, Limit.of(size));
  return new ScrollResult<>(
      w.getContent().stream().map(ProjectDto::from).toList(),
      w.hasNext() ? ((Project) w.getContent().get(w.size()-1)).getCreatedAt() : null
  );
}

SQL keyset:

-- First page (no cursor)
SELECT * FROM projects ORDER BY created_at DESC LIMIT 20;

-- Page 2 (cursor = lastRow.createdAt)
SELECT * FROM projects WHERE created_at < ? ORDER BY created_at DESC LIMIT 20;

-- Page 1000 (cursor = ...)
SELECT * FROM projects WHERE created_at < ? ORDER BY created_at DESC LIMIT 20;
-- ALL pages: O(log N) — index lookup, không scan offset

Performance:

  • Page 1: 5ms
  • Page 100: 5ms
  • Page 1000: 5ms — constant!

Trade-off keyset:

  • Cons: không jump page direct. UX shift sang "Load more" button.
  • Pros: constant performance, stable với insert (no shift), real-time friendly.

Recommend:

  • Page (offset) admin: max 100 page enforced. Acceptable for small data.
  • Keyset cho user-facing: news feed, activity, dashboard.
  • Combo: offset cho first 10 page, keyset cho deeper. Hybrid pattern.
Q3
Đoạn `@Query` với projection + Pageable. Có 2 vấn đề performance. Liệt kê + fix.
@Query("""
  SELECT p FROM Project p
  LEFT JOIN p.tasks t
  WHERE p.status = :status
  GROUP BY p.id
  """)
Page<Project> findByStatus(@Param("status") ProjectStatus status, Pageable pageable);
  1. Load full entity thay DTO projection: select all 10+ columns of Project. Mapping overhead. Memory wasted.

    Fix:

    public record ProjectSummary(Long id, String name, ProjectStatus status, long taskCount) {}
    
    @Query("""
      SELECT new com.olhub.dto.ProjectSummary(p.id, p.name, p.status, COUNT(t.id))
      FROM Project p LEFT JOIN p.tasks t
      WHERE p.status = :status
      GROUP BY p.id, p.name, p.status
      """)
    Page<ProjectSummary> findSummariesByStatus(@Param("status") ProjectStatus status, Pageable pageable);
    Select 4 fields directly. Map to DTO at SQL level. ~50% less network data.
  2. Default countQuery: JOIN tasks unnecessary cho count:
    -- Default countQuery (Spring Data auto-derive)
    SELECT COUNT(p)
    FROM Project p LEFT JOIN p.tasks t      -- JOIN tasks just for count!
    WHERE p.status = :status
    JOIN slow cho count — count chỉ cần Project rows.

    Fix: custom countQuery:

    @Query(value = """
      SELECT new com.olhub.dto.ProjectSummary(p.id, p.name, p.status, COUNT(t.id))
      FROM Project p LEFT JOIN p.tasks t
      WHERE p.status = :status
      GROUP BY p.id, p.name, p.status
      """,
      countQuery = """
      SELECT COUNT(p) FROM Project p WHERE p.status = :status
      """)
    Page<ProjectSummary> findSummariesByStatus(@Param("status") ProjectStatus status, Pageable pageable);
    Count query no JOIN — fast.

SQL generated correct:

-- Data query
SELECT p.id, p.name, p.status, COUNT(t.id) AS task_count
FROM projects p LEFT JOIN tasks t ON t.project_id = p.id
WHERE p.status = ?
GROUP BY p.id, p.name, p.status
ORDER BY p.created_at DESC
LIMIT 20 OFFSET 0;

-- Count query (custom)
SELECT COUNT(p.id)
FROM projects p
WHERE p.status = ?;
-- No JOIN — fast cho big table

Bonus optimization:

  • Index hỗ trợ filter + sort:
    CREATE INDEX idx_projects_status_created
    ON projects(status, created_at DESC);
    Composite index. Postgres dùng index scan cho cả WHERE + ORDER BY.
  • Cache count cho hot filter: populate Redis với COUNT(*) per status, refresh every 5min. Trade staleness for performance.

Performance benchmark (1M project, 10 task/project):

  • Original (full entity, default count): 2-3s.
  • DTO projection + custom count: 50-100ms.
  • + Composite index: 20-30ms.

10-100x speedup cho list endpoint. Worth refactor effort.

Q4
User request `?sort=password,desc`. Risk security? Cách prevent?

Risk security:

  1. Information leak: sort by password reveal length distribution. Combined với count → estimate weak passwords.
  2. Sort by sensitive column: SSN, credit card, salary — leak through observation timing/order.
  3. Slow column DOS: sort by computed/large column (TEXT, JSONB) → expensive query → DOS.
  4. Internal column expose: ?sort=internal_admin_flag reveal table structure.

Fix — whitelist allowed sort columns:

@RestController
public class ProjectController {

  private static final Set<String> ALLOWED_SORT = Set.of("name", "status", "createdAt", "priority");

  @GetMapping("/projects")
  public Page<ProjectDto> list(
      @RequestParam(required = false) ProjectStatus status,
      @PageableDefault(size = 20) @SortDefault("createdAt") Pageable pageable
  ) {
      validateSort(pageable.getSort());
      return service.list(status, pageable);
  }

  private void validateSort(Sort sort) {
      sort.forEach(order -> {
          if (!ALLOWED_SORT.contains(order.getProperty())) {
              throw new ValidationException(
                  "Sort by '" + order.getProperty() + "' not allowed. " +
                  "Allowed: " + String.join(", ", ALLOWED_SORT)
              );
          }
      });
  }
}

Cleaner — custom Pageable resolver:

@Component
public class SafePageableArgumentResolver extends PageableHandlerMethodArgumentResolver {

  private static final Map<String, Set<String>> ALLOWED_PER_ENTITY = Map.of(
      "/projects", Set.of("name", "status", "createdAt", "priority"),
      "/tasks", Set.of("title", "status", "priority", "dueDate")
  );

  @Override
  public Pageable resolveArgument(MethodParameter parameter, ModelAndViewContainer mav,
                                   NativeWebRequest webRequest, WebDataBinderFactory binder) {
      Pageable pageable = super.resolveArgument(parameter, mav, webRequest, binder);
      Sort filtered = filterAllowed(pageable.getSort(),
          (String) webRequest.getNativeRequest(HttpServletRequest.class).getRequestURI());
      return PageRequest.of(pageable.getPageNumber(), pageable.getPageSize(), filtered);
  }
}

Alternative — DTO field projection guard:

Project DTO không expose `password` field → entity sort by password vẫn select password vào response → leak.

Layer defense:

  1. DTO không có sensitive field.
  2. Sort whitelist còn restrict đi.

Test security:

@Test
void invalidSortField_returns400() {
  mockMvc.perform(get("/projects?sort=password,desc"))
      .andExpect(status().isBadRequest())
      .andExpect(jsonPath("$.title").value("Validation failed"));
}

Pattern enterprise:

  • API gateway / BFF (Backend-for-Frontend) layer validate sort param.
  • Internal service trust pre-validated input.
  • Audit log: log every sort/filter param for compliance.
Q5
App có 100M activity rows. UI realtime feed. Choose pagination approach + implement.

Choose: Keyset pagination với Window/ScrollPosition.

Vì sao không pick alternatives:

  • Page: COUNT(*) trên 100M rows = 5-30s. Unusable.
  • Slice: bỏ COUNT nhưng OFFSET vẫn O(N). Page 1000 với offset 20000 vẫn slow.
  • Keyset: O(log N) mọi page. Constant performance. Stable với insert (realtime feed cần).

Implementation:

// Entity
@Entity
public class Activity {
  @Id @GeneratedValue Long id;

  @Column(nullable = false)
  private Instant createdAt;

  @Column(nullable = false, length = 50)
  private String type;

  @Column(columnDefinition = "TEXT")
  private String payload;
}

// Repository
public interface ActivityRepository extends JpaRepository<Activity, Long> {
  Window<Activity> findFirstByOrderByCreatedAtDescIdDesc(Limit limit, ScrollPosition position);
}

// Service
@Service
@Transactional(readOnly = true)
public class ActivityService {

  private final ActivityRepository repo;

  public ScrollResult<ActivityDto> feed(ActivityCursor cursor, int size) {
      ScrollPosition position = cursor == null
          ? ScrollPosition.offset()
          : ScrollPosition.forward(Map.of(
              "createdAt", cursor.createdAt(),
              "id", cursor.id()
          ));

      Window<Activity> window = repo.findFirstByOrderByCreatedAtDescIdDesc(
          Limit.of(size + 1),
          position
      );

      boolean hasNext = window.size() > size;
      List<Activity> items = hasNext ? window.getContent().subList(0, size) : window.getContent();

      ActivityCursor nextCursor = hasNext
          ? new ActivityCursor(items.get(items.size() - 1).getCreatedAt(), items.get(items.size() - 1).getId())
          : null;

      return new ScrollResult<>(
          items.stream().map(ActivityDto::from).toList(),
          nextCursor
      );
  }
}

public record ActivityCursor(Instant createdAt, Long id) {}
public record ScrollResult<T>(List<T> items, ActivityCursor nextCursor) {}

// Controller
@RestController
@RequestMapping("/api/activities")
public class ActivityController {

  @GetMapping
  public ScrollResult<ActivityDto> feed(
      @RequestParam(required = false) Instant cursorCreatedAt,
      @RequestParam(required = false) Long cursorId,
      @RequestParam(defaultValue = "20") @Min(1) @Max(100) int size
  ) {
      ActivityCursor cursor = (cursorCreatedAt != null && cursorId != null)
          ? new ActivityCursor(cursorCreatedAt, cursorId)
          : null;
      return service.feed(cursor, size);
  }
}

SQL generated:

-- First request (no cursor)
SELECT * FROM activities
ORDER BY created_at DESC, id DESC
LIMIT 21;

-- Subsequent (cursor = (createdAt='2026-04-15 10:00:00', id=12345))
SELECT * FROM activities
WHERE (created_at, id) < ('2026-04-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 21;
-- Composite cursor handle ties (multiple rows same createdAt)

Index critical:

-- Index match composite cursor
CREATE INDEX idx_activities_created_id ON activities(created_at DESC, id DESC);

-- Postgres index scan on (created_at, id) → O(log N) lookup

UI workflow:

1. GET /activities → return 20 items + nextCursor
2. User scrolls down, app load more
3. GET /activities?cursorCreatedAt=...&cursorId=... → next 20
4. Repeat

Performance:

  • Any cursor position: ~5ms (index lookup).
  • 100M row → 5M page → all sub-second.

Caveat:

  • No "page 100" jump — UX must adapt to "load more" pattern.
  • Cursor format internal — version it (Base64 encode JSON cho future change).
  • Test with realistic data volume — index quality matters.
Q6
Spring Boot config `spring.data.web.pageable.max-page-size: 100`. User request `?size=10000`. Response? Cách verify?

Spring silently cap to 100. No exception thrown.

Behavior:

Request: GET /projects?size=10000
↓
PageableHandlerMethodArgumentResolver
↓
Detect size > max-page-size
↓
Set pageable.size = 100 (cap)
↓
Continue normal flow

Response:
{
"content": [...],          // 100 items (not 10000)
"size": 100,                // capped
"number": 0,
...
}

Vì sao silent cap (default behavior):

  • UX: client get reasonable response thay error.
  • Backwards compat: old client với hardcoded high size still work.

Vấn đề: client không biết sizes capped. Pagination logic (`size × pageNum = expected total seen`) sai.

Cách verify in test:

@Test
void requestOversizedPage_isCappedTo100() throws Exception {
  mockMvc.perform(get("/projects?size=10000"))
      .andExpect(status().isOk())
      .andExpect(jsonPath("$.size").value(100));    // verify cap
}

Alternative — explicit error (preferred for API explicit):

@RestController
public class ProjectController {

  @GetMapping("/projects")
  public Page<ProjectDto> list(
      @RequestParam(defaultValue = "0") @Min(0) int page,
      @RequestParam(defaultValue = "20") @Min(1) @Max(100) int size,
      @RequestParam(defaultValue = "createdAt,desc") String sort
  ) {
      // @Max validation throw exception → 400 Bad Request
      Pageable pageable = PageRequest.of(page, size, parseSort(sort));
      return service.list(pageable).map(ProjectDto::from);
  }
}

@Validated
public class ProjectController { ... }

Now request `?size=10000` → 400 with violation message. Client knows explicitly.

Caveat — hybrid:

spring:
data:
  web:
    pageable:
      max-page-size: 100              # safety net global
      default-page-size: 20

# + manual @Max(100) per endpoint
# Both: explicit validation + safety net

Verify config active:

# Actuator endpoint
GET /actuator/configprops?prefix=spring.data.web

{
"prefix": "spring.data.web",
"properties": {
  "pageable.max-page-size": 100,
  "pageable.default-page-size": 20,
  ...
}
}

Pattern enterprise:

  • Global max-page-size = safety net (100).
  • Per-endpoint @Max(50) cho hot path (smaller).
  • API documentation explicit: "max page size: 100, default: 20".
  • Rate limiting per-endpoint thay phase max-page extreme.

Bài tiếp theo: Mini-challenge — TaskFlow v2 Postgres + JPA capstone

Bài này có giúp bạn hiểu bản chất không?

Bình luận (0)

Đang tải...