
Learning SRE #18: Overload Handling

Learn the main overload handling strategies (load shedding, rate limiting, and graceful degradation) that keep a system functioning during traffic spikes.


Overload handling is a system's ability to keep functioning (even at reduced capacity) when it receives more traffic than it can handle. Chapter 21 of the Google SRE Book stresses that a well-designed system should reject some requests gracefully rather than collapse entirely. In practice, overload can come from a flash sale, a viral moment, a DDoS attack, or a cascading failure. This article covers three main strategies: load shedding, rate limiting/throttling, and graceful degradation.

If you have not read the previous article, start with Advanced SRE: On-Call Automation & Runbook.

Prerequisites

Why Does Overload Handling Matter?

Without overload handling, a system suffers a cascading collapse:

WITHOUT overload handling (spike of 5,000 req/s):

  • Database max connections exhausted
  • Timeouts → retry storm → even more load
  • → Total collapse
  • RESULT: 0% of requests succeed (total outage)

WITH overload handling:

  • API Gateway rate limit: 2,000 req/s
  • Order Service sheds load by priority
  • 2,000 req/s → processed successfully ✅
  • 3,000 req/s → rejected with 429 (Too Many Requests)
  • RESULT: 40% of requests succeed (2,000/5,000)

Serving 40% of requests well is better than a total collapse that serves 0%.

Three Overload Handling Strategies

| Strategy | What It Does | When to Use | Example |
|---|---|---|---|
| Load Shedding | Rejects excess requests based on priority | When the system approaches its capacity limit | Drop low-priority requests |
| Rate Limiting | Caps the number of requests per time window | To prevent abuse and protect the backend | Max 100 req/s per user |
| Graceful Degradation | Reduces features to preserve core functionality | When load is high but every request matters | Disable recommendations |

Load Shedding

Priority-Based Shedding

| Priority | Level | Services | Rule |
|---|---|---|---|
| 1 | Critical | Payment, Auth | ALWAYS serve |
| 2 | High | Order, Inventory | Serve if load < 80% |
| 3 | Medium | Search, Browse | Serve if load < 60% |
| 4 | Low | Analytics, Recommendations | Serve if load < 40% |

  • Load 70%: shed P4 + P3 → serve P1 + P2 only
  • Load 90%: shed P4 + P3 + P2 → serve P1 only

CoDel (Controlled Delay):

  • Monitor queue latency (time spent waiting), not queue length
  • If a request has been queued longer than the threshold → drop it
  • Suited to: real-time systems, user-facing APIs

Load Shedding Middleware (Go)

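// load_shedder.go: priority-based load shedding middleware
package middleware

import (
    "net/http"
    "strings"
    "sync/atomic"
)

// LoadShedder rejects lower-priority requests as the number of in-flight
// requests approaches maxConcurrent.
type LoadShedder struct {
    maxConcurrent int64
    current       int64          // in-flight requests, accessed atomically
    priorities    map[string]int // path prefix → priority; read-only after construction
}

func NewLoadShedder(maxConcurrent int64) *LoadShedder {
    return &LoadShedder{
        maxConcurrent: maxConcurrent,
        priorities: map[string]int{
            "/api/payment":   1, // Critical: never shed
            "/api/auth":      1, // Critical: never shed
            "/api/order":     2, // High: shed above 80% load
            "/api/search":    3, // Medium: shed above 60% load
            "/api/recommend": 4, // Low: shed above 40% load
        },
    }
}

func (ls *LoadShedder) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Count this request before checking the load, so concurrent
        // arrivals cannot all pass the check on the same stale value.
        current := atomic.AddInt64(&ls.current, 1)
        defer atomic.AddInt64(&ls.current, -1)

        loadPercent := float64(current) / float64(ls.maxConcurrent) * 100
        priority := ls.getPriority(r.URL.Path)

        if ls.shouldShed(loadPercent, priority) {
            // Tell well-behaved clients when to come back.
            w.Header().Set("Retry-After", "5")
            http.Error(w, "Service overloaded", http.StatusServiceUnavailable)
            return
        }

        next.ServeHTTP(w, r)
    })
}

// getPriority returns the priority of the configured prefix that matches
// path, or 0 if none matches (handled by the default case in shouldShed).
// Assumption: routes are matched by path prefix.
func (ls *LoadShedder) getPriority(path string) int {
    for prefix, priority := range ls.priorities {
        if strings.HasPrefix(path, prefix) {
            return priority
        }
    }
    return 0
}

// shouldShed reports whether a request of the given priority must be
// rejected at the current load percentage.
func (ls *LoadShedder) shouldShed(loadPercent float64, priority int) bool {
    switch priority {
    case 1:
        return false // Never shed critical
    case 2:
        return loadPercent > 80 // Shed high above 80%
    case 3:
        return loadPercent > 60 // Shed medium above 60%
    case 4:
        return loadPercent > 40 // Shed low above 40%
    default:
        return loadPercent > 50 // Unknown paths
    }
}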

Rate Limiting & Throttling

Rate Limiting Algorithms

| Algorithm | Burst | Smoothing | Use Case |
|---|---|---|---|
| Token Bucket | Yes | No | API rate limiting |
| Sliding Window | No | Yes | Fair usage enforcement |
| Leaky Bucket | No | Yes | Constant output rate |

Multi-Layer Rate Limiting

  1. Layer 1: CDN/WAF (CloudFront, AWS WAF)
    • Global rate limit: 10,000 req/s
    • Per-IP rate limit: 100 req/s
    • DDoS protection (automatic)
  2. Layer 2: API Gateway / Load Balancer
    • Per-API rate limit: 5,000 req/s
    • Per-user/tenant rate limit: 500 req/s
    • Burst allowance: 2x for 10 seconds
  3. Layer 3: Service Mesh (Istio/Envoy sidecar)
    • Per-service rate limit: 1,000 req/s
    • Circuit breaker: open at 50% error rate
    • Connection pool limits
  4. Layer 4: Application (middleware)
    • Per-endpoint rate limit
    • Priority-based load shedding
    • Adaptive throttling based on backend health
  5. Layer 5: Database/Backend
    • Connection pool limits
    • Query timeout
    • Read replica routing for read-heavy traffic
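One way to approach the "adaptive throttling" item in Layer 4 is the client-side throttling formula from Chapter 21 of the Google SRE Book: a caller rejects requests locally with probability max(0, (requests − K × accepts) / (requests + 1)), where both counters cover a recent window. The sketch below assumes those counters are collected elsewhere; the function name is illustrative.

```go
package main

// rejectProbability returns the probability with which a caller should
// reject a new request locally, without sending it to the backend.
//
// requests: attempts made in the recent window
// accepts:  how many of those the backend accepted
// k:        multiplier controlling how aggressive throttling is
//           (the SRE Book suggests 2 as a general starting point)
func rejectProbability(requests, accepts, k float64) float64 {
	p := (requests - k*accepts) / (requests + 1)
	if p < 0 {
		return 0
	}
	return p
}
```

While the backend accepts everything, the result stays at 0. If only 20 of the last 100 requests were accepted and K = 2, about 59% of new requests are rejected locally, which relieves the backend of work it would have rejected anyway.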

Envoy Rate Limit Configuration

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-filter
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
              subFilter:
                name: "envoy.filters.http.router"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
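              stat_prefix: http_local_rate_limiter
              # The limit applies per Envoy instance, not globally across replicas.
              token_bucket:
                max_tokens: 5000
                tokens_per_fill: 5000
                fill_interval: 1s
              # Without these two fractions the filter defaults to 0% of requests,
              # so the limit would be configured but never applied.
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED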

Graceful Degradation

Degradation Levels

LEVEL 0: NORMAL (load < 70%)

  • All features active

LEVEL 1: LIGHT DEGRADATION (load 70-85%)

  • Disable real-time recommendations → serve cached
  • Disable personalization → serve generic content
  • Core: payment, order, and auth remain fully available

LEVEL 2: MODERATE DEGRADATION (load 85-95%)

  • Disable search filters (only basic search)
  • Serve fully cached product pages
  • Reduce image quality
  • Core: payment, order, and auth remain fully available

LEVEL 3: HEAVY DEGRADATION (load > 95%)

  • Static product pages only
  • Queue-based ordering (async confirmation)
  • Simplified checkout flow
  • Core: payment processing remains the top priority

Feature Flags for Degradation

# degradation-config.yaml
degradation_levels:
  level_0:
    trigger: "load < 70%"
    features:
      recommendations: true
      personalization: true
      full_search: true
      image_quality: "high"

  level_1:
    trigger: "load >= 70% AND load < 85%"
    features:
      recommendations: false
      personalization: false
      full_search: true
      image_quality: "high"

  level_2:
    trigger: "load >= 85% AND load < 95%"
    features:
      recommendations: false
      personalization: false
      full_search: false
      image_quality: "medium"

  level_3:
    trigger: "load >= 95%"
    features:
      recommendations: false
      personalization: false
      full_search: false
      image_quality: "low"
      async_ordering: true
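The configuration above still needs code that turns a measured load into a level and a set of flags. A minimal sketch in Go follows; the `Features` struct and both function names are hypothetical, and the thresholds and flag values are copied from the config rather than loaded from it.

```go
package main

// Features mirrors the feature flags in degradation-config.yaml.
type Features struct {
	Recommendations bool
	Personalization bool
	FullSearch      bool
	ImageQuality    string
	AsyncOrdering   bool
}

// levelForLoad maps a load percentage to a degradation level (0 to 3),
// using the same boundaries as the triggers in the config.
func levelForLoad(loadPercent float64) int {
	switch {
	case loadPercent >= 95:
		return 3
	case loadPercent >= 85:
		return 2
	case loadPercent >= 70:
		return 1
	default:
		return 0
	}
}

// featuresForLevel returns the flags for a level. Unknown levels fall back
// to the most degraded profile, which is the safest choice under load.
func featuresForLevel(level int) Features {
	switch level {
	case 0:
		return Features{Recommendations: true, Personalization: true, FullSearch: true, ImageQuality: "high"}
	case 1:
		return Features{FullSearch: true, ImageQuality: "high"}
	case 2:
		return Features{ImageQuality: "medium"}
	default:
		return Features{ImageQuality: "low", AsyncOrdering: true}
	}
}
```

A real implementation would probably add some hysteresis (for example, only stepping back down after load has stayed below the threshold for a while) so that the level does not flap when load hovers around a boundary.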

Kubernetes-Native Overload Protection

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 3
  maxReplicas: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 300
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
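To get a feel for how this manifest reacts to a spike, it helps to work through the core HPA calculation documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), evaluated per metric, with the largest result winning. The Go function below is a simplified sketch of that formula for a single metric. It ignores the controller's tolerance band and the `behavior` policies, and the function name is illustrative.

```go
package main

import "math"

// desiredReplicas applies the basic HPA scaling formula for one metric and
// clamps the result to the [minReplicas, maxReplicas] range.
func desiredReplicas(currentReplicas int, currentValue, targetValue float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(currentReplicas) * currentValue / targetValue))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}
```

For example, with 3 replicas each seeing 250 req/s against the target of 100, the formula asks for ceil(3 × 250 / 100) = 8 replicas. In the manifest above, the `scaleUp` policy (100% per 30 seconds) would limit how fast that number is reached, since replicas can at most double in each period. This is one reason pre-scaling before a known event still matters.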

Case Study: TechStartup Indonesia

Context

In Q4 2021, TSI ran a year-end flash sale that was projected to produce a 10-15x traffic spike. The previous flash sale had caused a 45-minute total outage and Rp 800 million in lost revenue.

What Was Done

TSI implemented multi-layer overload handling:

  1. CloudFront + WAF: bot blocking, per-IP rate limit of 200 req/s
  2. API Gateway Envoy rate limiting: global 10K req/s, per-user 50 req/s
  3. Application-level priority load shedding: payment always served, recommendations disabled under overload
  4. Graceful degradation via feature flags: progressive feature reduction based on load level
  5. Pre-scaling: scale up 2 hours before the flash sale starts

The result: the flash sale peaked at 12,500 req/s (12.5x normal) and completed without downtime.

Metrics Improvement

| Metric | Before | After | Change |
|---|---|---|---|
| Peak traffic handled | 3,000 req/s | 8,500 req/s | +183% |
| Downtime during sale | 45 min | 0 min | -100% |
| Checkout success rate | 62% | 99.2% | +37pp |
| Revenue during sale | Rp 800M | Rp 2.1B | +163% |
| Customer complaints | 1,200 | 85 | -93% |
| P1 incidents | 3 | 0 | -100% |

Lessons Learned

What Worked:

  • Multi-layer protection (WAF → Gateway → App → Feature Flags): each layer handles a different kind of overload
  • Priority-based load shedding: the checkout flow held a 99.2% success rate even though total traffic exceeded capacity by 56%
  • Pre-scaling 2 hours before the flash sale: removed the cold-start delay of autoscaling
  • Proper HTTP status codes (429 + Retry-After header): client-side backoff worked, which prevented a retry storm

What to Avoid:

  • A per-IP rate limit that is too strict (50 req/s): many corporate users behind NAT were affected, so it was raised to 200 req/s
  • Not testing load shedding with a realistic traffic pattern: the staging test used only uniform load, while production traffic is bursty
  • Graceful degradation without telling users: add a banner such as “Flash Sale Mode: some features temporarily simplified”

Best Practices

  • Implement multi-layer rate limiting: CDN, gateway, app, and database level
  • Use priority-based load shedding: critical requests (payment) must always be served
  • Return proper HTTP status codes: 429 and 503 with a Retry-After header
  • Test overload handling regularly: load test up to the breaking point
  • Implement graceful degradation progressively: reduce features in stages, not all-or-nothing
  • Pre-scale before an expected spike: for a flash sale or campaign launch, scale up before the event
  • Monitor shedding/limiting metrics: track how many requests are rejected

Next

Next article: Advanced SRE: Data Integrity, which covers backup strategies, recovery testing, and data validation to protect data during and after an incident.

Related topics you can explore:

  • Data Integrity: protecting data during and after incidents
  • Capacity Planning: sizing infrastructure to handle the expected load
  • Reliability Patterns: circuit breaker and bulkhead for failure isolation


This post is licensed under CC BY 4.0 by the author.