In modern software development, the deployment strategy directly affects system availability and user experience. The traditional maintenance-window approach, taking the service offline to deploy, can no longer meet the needs of 24/7 services. Blue-Green Deployment and Canary Deployment offer zero-downtime alternatives, but how do you actually run them? This article goes into the technical details and provides a complete implementation guide.
Why Do We Need Zero-Downtime Deployment?
Business requirements:
- A globally distributed service has no real "off-peak window"
- An SLA commitment of 99.9% availability leaves a downtime budget of only about 43 minutes per month
- Competitors will not wait for your maintenance to finish

Technical challenges:
- How do you update code without interrupting the service?
- How do you handle database schema changes?
- How do you roll back quickly when something goes wrong?
Blue-Green Deployment
Principles and Architecture
Blue-green deployment achieves zero downtime by maintaining two identical production environments:
                Load Balancer
                      │
      ┌───────────────┼───────────────┐
      │               │               │
 Blue (v1.0)     Green (v1.1)      Database
 Currently       New Version       (Shared)
 Active          Standby
Core concepts:
1. Blue environment: the version currently serving production traffic
2. Green environment: the new version is deployed here and stands by once testing passes
3. Switch mechanism: the load balancer shifts traffic from Blue to Green in an instant
4. Rollback capability: if a problem appears, switch straight back to Blue
Kubernetes Implementation Example
1. Switching with a Service Label Selector
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: myapp:v1.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: myapp:v1.1
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
# service.yaml (the key to the switch)
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
    version: blue   # change this to green to complete the cut-over
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
Deployment and switch procedure:

# 1. Deploy the Green environment (new version)
kubectl apply -f green-deployment.yaml

# 2. Wait for the Green environment to become ready
kubectl rollout status deployment/myapp-green

# 3. Verify the health of the Green environment
kubectl get pods -l version=green
kubectl logs -l version=green --tail=50

# 4. Run smoke tests
kubectl port-forward deployment/myapp-green 8080:8080
curl http://localhost:8080/health
curl http://localhost:8080/api/test

# 5. Switch traffic to Green (the critical step)
kubectl patch service myapp-service -p '{"spec":{"selector":{"version":"green"}}}'

# 6. Monitor metrics after the switch (5-10 minutes)
kubectl top pods -l version=green
# Watch error rate, latency, CPU/memory

# 7. Once everything checks out, delete the Blue environment
kubectl delete deployment myapp-blue

# To roll back: run this immediately
kubectl patch service myapp-service -p '{"spec":{"selector":{"version":"blue"}}}'
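Whether switching forward or rolling back, it helps to confirm exactly which version the Service selector currently points at and which pods actually sit behind it; a quick check using the resources defined above:

# Show which version the Service selector currently targets
kubectl get service myapp-service -o jsonpath='{.spec.selector.version}{"\n"}'

# Confirm the endpoints behind the Service belong to the expected Deployment
kubectl get endpoints myapp-service
kubectl get pods -l app=myapp,version=green -o wide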
Nginx Implementation Example
Using Nginx as the load balancer to perform the blue-green switch:
# /etc/nginx/conf.d/myapp.conf
upstream backend {
    # Blue-green switch: comment/uncomment the relevant block to switch
    # Blue environment (current production)
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;

    # Green environment (new version, waiting to be switched in)
    # server 10.0.2.10:8080;
    # server 10.0.2.11:8080;
    # server 10.0.2.12:8080;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Fail over to the next upstream on errors/timeouts
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_connect_timeout 5s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }

    location /health {
        access_log off;
        proxy_pass http://backend;
    }
}
Switch script:

#!/bin/bash
# blue-green-switch.sh
set -e

NGINX_CONF="/etc/nginx/conf.d/myapp.conf"
BACKUP_CONF="/tmp/myapp.conf.backup"

# Back up the current configuration
cp "$NGINX_CONF" "$BACKUP_CONF"

# Check the health of the Green environment
for server in 10.0.2.10 10.0.2.11 10.0.2.12; do
    if ! curl -f -s "http://$server:8080/health" > /dev/null; then
        echo "❌ Green server $server is not healthy"
        exit 1
    fi
done
echo "✅ All Green servers are healthy"

# Switch to Green (comment out Blue, enable Green); tolerant of leading indentation
sed -i '/10\.0\.1\./ s/^\(\s*\)server/\1# server/' "$NGINX_CONF"
sed -i '/10\.0\.2\./ s/^\(\s*\)# server/\1server/' "$NGINX_CONF"

# Test the Nginx configuration
if nginx -t; then
    echo "✅ Nginx configuration is valid"
    # Reload Nginx (no downtime)
    nginx -s reload
    echo "✅ Switched to Green environment"
else
    echo "❌ Nginx configuration test failed, rolling back"
    cp "$BACKUP_CONF" "$NGINX_CONF"
    exit 1
fi

# Monitor for 5 minutes
echo "Monitoring for 5 minutes..."
sleep 300

# If everything looks good, the Blue environment can be cleaned up manually
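If problems show up after the switch, rolling back is just a matter of restoring the backed-up configuration and reloading; a minimal sketch, reusing the backup path from the script above:

#!/bin/bash
# blue-green-rollback.sh - restore the previous (Blue) Nginx configuration
set -e

NGINX_CONF="/etc/nginx/conf.d/myapp.conf"
BACKUP_CONF="/tmp/myapp.conf.backup"

# Restore the configuration captured before the switch
cp "$BACKUP_CONF" "$NGINX_CONF"

# Validate and reload with no downtime
nginx -t
nginx -s reload
echo "✅ Rolled back to Blue environment"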
AWS CodeDeploy Implementation Example
AppSpec file (blue-green deployment):
# appspec.yml
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/myapp
hooks:
  BeforeInstall:
    - location: scripts/before_install.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/after_install.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_application.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 300
      runas: root
Validation script:

#!/bin/bash
# scripts/validate_service.sh

# Wait for the application to start
sleep 10

# Health check
HEALTH_CHECK_URL="http://localhost:8080/health"
MAX_ATTEMPTS=30

for i in $(seq 1 $MAX_ATTEMPTS); do
    if curl -f -s "$HEALTH_CHECK_URL" | grep -q "healthy"; then
        echo "✅ Application is healthy"
        exit 0
    fi
    echo "⏳ Waiting for application to be healthy (attempt $i/$MAX_ATTEMPTS)"
    sleep 10
done

echo "❌ Application failed health check"
exit 1
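The appspec and hook scripts ship inside the application bundle; the blue-green deployment itself is started from the console or the CLI. A hedged sketch of triggering one, assuming an application named myapp, a deployment group myapp-bluegreen already configured for blue/green, and a revision uploaded to S3 (all names here are placeholders):

# Trigger a blue-green deployment from a revision stored in S3
aws deploy create-deployment \
  --application-name myapp \
  --deployment-group-name myapp-bluegreen \
  --deployment-config-name CodeDeployDefault.AllAtOnce \
  --s3-location bucket=my-deploy-bucket,key=myapp-v1.1.zip,bundleType=zip

# Check progress (substitute the deploymentId returned above)
aws deploy get-deployment --deployment-id d-EXAMPLE123 --query 'deploymentInfo.status'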
Practical Considerations for Blue-Green Deployment
1. Database Schema Changes
The biggest challenge in blue-green deployment is the shared database. Schema changes must remain backward compatible:

-- ❌ Wrong: drop a column directly (the Blue environment will start failing)
ALTER TABLE users DROP COLUMN legacy_field;

-- ✅ Right: migrate in phases
-- Phase 1: add the new column (both Blue and Green keep working)
ALTER TABLE users ADD COLUMN new_field VARCHAR(255);

-- Phase 2: migrate the data
UPDATE users SET new_field = CONCAT(first_name, ' ', last_name);

-- Phase 3: switch the application to Green (which uses new_field)

-- Phase 4: drop the old column only in a later deployment
ALTER TABLE users DROP COLUMN legacy_field;
2. Session Handling

# Use Redis as a centralized session store
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  SESSION_STORE: "redis"
  REDIS_HOST: "redis-cluster.default.svc.cluster.local"
  REDIS_PORT: "6379"
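With sessions in a central store, users stay logged in across the Blue to Green switch. A quick sanity check that session data really lives in Redis rather than in pod memory, assuming a Redis Deployment reachable as deploy/redis and a session: key prefix (both are assumptions about your setup):

# List a few session keys before the switch, then repeat after switching traffic
kubectl exec deploy/redis -- redis-cli --scan --pattern 'session:*' | head -n 5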
3. Cache Invalidation Strategy

# Clear stale cache entries when switching blue/green
import redis

def invalidate_cache_on_deployment():
    r = redis.Redis(host='redis-host', port=6379)
    # The version number is used as the cache key prefix.
    # DEL does not accept wildcards, so scan for matching keys and delete them.
    for key in r.scan_iter('cache:v1.0:*'):
        r.delete(key)
    print("Cache invalidated for old version")
4. Monitoring Metrics

# Metrics that must be watched after the switch:
# - HTTP error rate (4xx, 5xx)
# - Response time (p50, p95, p99)
# - Database connection count
# - CPU/memory usage

# Example Prometheus queries
rate(http_requests_total{status=~"5.."}[5m])                               # 5xx error rate
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))   # p95 latency
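The same queries can also be run from the command line against the Prometheus HTTP API as part of a switch checklist; a small sketch, assuming Prometheus is reachable at prometheus:9090 and that request metrics carry a version="green" label (both are assumptions about your setup):

# 5xx rate for the Green version via the Prometheus HTTP API
QUERY='sum(rate(http_requests_total{version="green",status=~"5.."}[5m]))'
curl -s -G 'http://prometheus:9090/api/v1/query' --data-urlencode "query=$QUERY" \
  | jq -r '.data.result[0].value[1] // "0"'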
Canary Deployment
Principles and Traffic Control
Canary deployment reduces risk by gradually increasing the share of traffic sent to the new version:

              Load Balancer (traffic split)
                      │
      ┌───────────────┼───────────────┐
      │               │               │
 Stable (95%)     Canary (5%)      Database
 v1.0             v1.1             (Shared)

      ↓ once metrics look healthy, gradually increase Canary traffic
 Stable (50%)     Canary (50%)

      ↓ finally switch over completely
 Stable (0%)      Canary (100%)
Kubernetes + Istio Implementation Example
1. Istio VirtualService for traffic splitting:

# canary-virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp.example.com
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: myapp-service
        subset: canary
  - route:
    - destination:
        host: myapp-service
        subset: stable
      weight: 95
    - destination:
        host: myapp-service
        subset: canary
      weight: 5   # 5% of traffic goes to the Canary initially
# canary-destinationrule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-dr
spec:
  host: myapp-service
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
2. Deployment flow and traffic adjustment:

# 1. Deploy the Canary version (5% of traffic)
kubectl apply -f canary-deployment.yaml
kubectl apply -f canary-virtualservice.yaml

# 2. Monitor Canary metrics (30 minutes)
istioctl dashboard prometheus
# Watch: error rate, latency, resource usage

# 3. Gradually increase traffic (if the metrics stay healthy)
# 5% → 10% → 25% → 50% → 100%
kubectl patch virtualservice myapp-vs --type json \
  -p '[{"op":"replace","path":"/spec/http/1/route/0/weight","value":90},
       {"op":"replace","path":"/spec/http/1/route/1/weight","value":10}]'
# Observe for 15-30 minutes after each adjustment

# 4. Switch over to the Canary completely
kubectl patch virtualservice myapp-vs --type json \
  -p '[{"op":"replace","path":"/spec/http/1/route/0/weight","value":0},
       {"op":"replace","path":"/spec/http/1/route/1/weight","value":100}]'

# 5. Clean up the old version
kubectl delete deployment myapp-stable

# To roll back: immediately shift traffic back to stable
kubectl patch virtualservice myapp-vs --type json \
  -p '[{"op":"replace","path":"/spec/http/1/route/0/weight","value":100},
       {"op":"replace","path":"/spec/http/1/route/1/weight","value":0}]'
Nginx Implementation Example (Weighted Round-Robin)

upstream backend {
    # Stable version (weight 19, roughly 95% of traffic)
    server 10.0.1.10:8080 weight=19;
    # Canary version (weight 1, roughly 5% of traffic)
    server 10.0.2.10:8080 weight=1;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://backend;
        # To keep session affinity (if needed), add ip_hash or a sticky
        # directive inside the upstream block above
    }
}
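Ramping the Canary up with plain Nginx then just means editing the weights and reloading. A rough sketch that bumps the Canary from roughly 5% to 25% (weights 15:5), assuming the upstream above lives in /etc/nginx/conf.d/myapp.conf:

#!/bin/bash
# Shift the weighted split from 19:1 (~5% canary) to 15:5 (~25% canary)
CONF="/etc/nginx/conf.d/myapp.conf"

sed -i 's/10\.0\.1\.10:8080 weight=19;/10.0.1.10:8080 weight=15;/' "$CONF"
sed -i 's/10\.0\.2\.10:8080 weight=1;/10.0.2.10:8080 weight=5;/' "$CONF"

# Validate and reload with no downtime
nginx -t && nginx -s reload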
AWS CodeDeploy Canary Deployment
Deployment configuration file:
{
  "deploymentConfigName": "CodeDeployDefault.Canary10Percent5Minutes",
  "computePlatform": "Server",
  "trafficRoutingConfig": {
    "type": "TimeBasedCanary",
    "timeBasedCanary": {
      "canaryPercentage": 10,
      "canaryInterval": 5
    }
  }
}
Custom deployment configuration:

# Create a custom Canary config: shift 20% of traffic first,
# then the remainder after 10 minutes
aws deploy create-deployment-config \
  --deployment-config-name Custom.Canary20Percent10Minutes \
  --compute-platform Server \
  --traffic-routing-config '
  {
    "type": "TimeBasedCanary",
    "timeBasedCanary": {
      "canaryPercentage": 20,
      "canaryInterval": 10
    }
  }'
Automated Rollback
Automating it with Kubernetes + Prometheus + Flagger:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99      # roll back automatically if the success rate drops below 99%
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500     # roll back automatically if p99 latency exceeds 500ms
      interval: 1m
    webhooks:
    - name: load-test
      url: http://loadtester.default/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary:8080/"
Monitoring and alerting script:

#!/bin/bash
# canary-monitor.sh

# Query the Canary 5xx rate from Prometheus (URL-encode the query to avoid curl globbing)
CANARY_ERROR_RATE=$(curl -s -G 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{version="canary",status=~"5.."}[5m]))' \
  | jq -r '.data.result[0].value[1] // "0"')
THRESHOLD=0.01   # 1% error rate

if (( $(echo "$CANARY_ERROR_RATE > $THRESHOLD" | bc -l) )); then
    echo "❌ Canary error rate too high: $CANARY_ERROR_RATE"
    echo "Rolling back to stable..."

    # Automatic rollback
    kubectl patch virtualservice myapp-vs --type json \
      -p '[{"op":"replace","path":"/spec/http/1/route/0/weight","value":100},
           {"op":"replace","path":"/spec/http/1/route/1/weight","value":0}]'

    # Send an alert
    curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
      -d '{"text":"⚠️ Canary deployment auto-rollback triggered"}'

    exit 1
fi

echo "✅ Canary metrics are healthy: error_rate=$CANARY_ERROR_RATE"
Blue-Green vs. Canary: How to Choose?
| Consideration | Blue-Green Deployment | Canary Deployment |
|---|---|---|
| Risk control | One-shot switch, higher risk | Gradual ramp-up, lowest risk |
| Infrastructure cost | Double the resources (100% + 100%) | Lower overhead (100% + 5-20%) |
| Rollback speed | Instant (flip the load balancer) | Traffic must be stepped back down |
| Testing complexity | The Green environment can be tested in full | Canary traffic is small, so edge cases are hard to cover |
| Monitoring demands | Focused monitoring right after the switch | Continuous monitoring of Canary metrics |
| Typical scenarios | Major version upgrades, database migrations | Day-to-day feature iteration, A/B testing |
Recommended strategy:
- Routine deployments: Canary (5% → 10% → 25% → 50% → 100%)
- Major version upgrades: Blue-Green (one switch after full testing)
- Emergency fixes: Blue-Green (fast switch, fast rollback)
- Experimental features: Canary + feature flags (expose to specific users first)
Practical Checklist
Before deployment:
- [ ] Database schema changes are backward compatible
- [ ] Sessions use a centralized store (Redis/Memcached)
- [ ] Static assets are versioned (to avoid caching issues)
- [ ] Health-check endpoints are working
- [ ] Monitoring and alerting are configured

During deployment:
- [ ] The new version passes smoke tests
- [ ] Monitoring dashboards are open
- [ ] Team members are on standby, ready to roll back immediately
- [ ] Deployment time and traffic split are recorded

After deployment:
- [ ] Watch error rate, latency, and resource usage
- [ ] Check the logs for anomalies
- [ ] Verify key business flows
- [ ] Clean up old-version resources
Summary
Blue-green and canary deployments are the two pillars of zero-downtime delivery:
Blue-Green Deployment:
- Suits scenarios that need a fast switch and fast rollback
- Implemented with Kubernetes Services, Nginx, or AWS CodeDeploy
- The keys are database compatibility and health checks

Canary Deployment:
- Suits scenarios where the new version is validated gradually
- Implemented with Istio, Nginx weighting, or automated with Flagger
- The keys are monitoring metrics and an automatic rollback mechanism

The two can be combined: validate with a small Canary slice first, then finish the remaining cut-over quickly with a Blue-Green switch once it checks out.
Next steps:
- Build a CI/CD pipeline that integrates these deployment strategies
- Set up Prometheus + Grafana monitoring
- Implement automated rollback
- Rehearse deployment and rollback procedures regularly