Complete Guide to Ubuntu Server Automatic Updates: Enterprise Strategy, Risk Control, and Failure Recovery

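In enterprise production environments, choosing an update strategy for Ubuntu Server is one of the core challenges system administrators face. Automatic updates keep systems secure, but a poor configuration can cause service outages, compatibility problems, or even a system that no longer boots. This article approaches the problem from the perspective of an enterprise production (PRD) environment. It shows how to design a robust automatic update strategy and covers the technical implementation, risk controls, and failure recovery mechanisms.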

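Core Question: How Far Should Automatic Updates Go?

The Enterprise Trilemma

Requirement | Conflict | Risk
Security | Security updates should be applied immediately | Unpatched vulnerabilities can be exploited
Stability | Updates can break existing functionality | Service outages, compatibility problems
Control | Automation vs. manual review | Uncontrolled updates or delayed patching

Decision Framework: Tiered Updates

A common view among experienced system administrators is that not every update should be fully automated. A tiered approach is recommended:

Update type | Automation level | Rationale | Implementation
Security Updates | ✅ Fully automatic | Critical vulnerabilities need immediate patching | unattended-upgrades
Important Updates | ⚠️ Conditionally automatic | Important but not urgent | Automatic after testing, or manual review
Standard Updates | ⚠️ Manual or scheduled | Higher risk, not urgent | Run in a maintenance window
Kernel Updates | ❌ Manual control | Requires a reboot; highest risk | Apply after testing and verification
Major Version Upgrade | ❌ Strictly manual | Can break the entire system | Full testing and backups first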
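A small shell filter can show how the pending upgrades on a host fall into these tiers. The sketch below illustrates the idea and is not part of unattended-upgrades. `classify_upgrades` is a hypothetical helper that reads `apt list --upgradable` output. It assumes the origin field looks like `jammy-updates,jammy-security`, and it treats any `linux-image`, `linux-headers`, or `linux-modules` package as a kernel update, whatever pocket it comes from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort pending upgrades into the tiers from the table above.
# Input: `apt list --upgradable` output on stdin. Output: "<tier> <package>" per line.
# Assumption: the origin field is the text between the first '/' and the first space,
# e.g. "jammy-updates,jammy-security".
classify_upgrades() {
    local line name origins
    while IFS= read -r line; do
        [[ "$line" == */* ]] || continue                    # skip "Listing..." and blank lines
        name="${line%%/*}"                                   # package name before the '/'
        origins="${line#*/}"; origins="${origins%% *}"       # pockets up to the first space
        if [[ "$name" =~ ^linux-(image|headers|modules) ]]; then
            echo "kernel $name"                              # manual tier, whatever the pocket
        elif [[ "$origins" == *-security* ]]; then
            echo "security $name"                            # candidate for full automation
        elif [[ "$origins" == *-updates* ]]; then
            echo "updates $name"                             # important/standard tiers
        else
            echo "other $name"                               # third-party or unknown origin
        fi
    done
}
```

Usage: `apt list --upgradable 2>/dev/null | classify_upgrades | cut -d' ' -f1 | sort | uniq -c` prints a count per tier. APT warns that its CLI output is not a stable interface, so treat this as a reporting aid rather than an input to automation.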

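Implementation: Complete unattended-upgrades Configuration

Installation and Basic Setup

# Installed by default on Ubuntu 22.04; check the version
apt-cache policy unattended-upgrades

# If it is not installed
sudo apt update
sudo apt install unattended-upgrades apt-listchanges

# Enable automatic updates
sudo dpkg-reconfigure -plow unattended-upgrades

Core Configuration File: /etc/apt/apt.conf.d/50unattended-upgrades

The following is a complete example for an enterprise production environment. It covers the key settings: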

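// Ubuntu 22.04 LTS enterprise automatic update configuration
// Goal: install security updates automatically while keeping the system stable

Unattended-Upgrade::Allowed-Origins {
    // Official security updates only
    "${distro_id}:${distro_codename}-security";
    
    // Optional: also include the -updates pocket (evaluate the risk first)
    // "${distro_id}:${distro_codename}-updates";
    
    // ESM updates (require an Ubuntu Pro subscription)
    // "${distro_id}ESMApps:${distro_codename}-apps-security";
    // "${distro_id}ESM:${distro_codename}-infra-security";
};

// Blocklist: packages that are never upgraded automatically.
// Entries are Python regular expressions, not shell globs.
Unattended-Upgrade::Package-Blacklist {
    // Kernel packages (controlled manually)
    "linux-image-*";
    "linux-headers-*";
    "linux-modules-*";
    
    // Critical services (must be tested first)
    "nginx";
    "apache2";
    "mysql-server*";
    "postgresql*";
    "docker*";
    "kubelet";
    "kubeadm";
    "kubectl";
    
    // Custom applications
    // "myapp-*";
};

// Whitelist: when set, ONLY matching packages are upgraded; the blocklist still takes precedence
// Unattended-Upgrade::Package-Whitelist {
//     "openssl";
//     "libssl*";
// };

// Remove dependencies that are no longer needed
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";

// Pre/post-update hooks: unattended-upgrades has no option for running scripts
// before or after it installs packages. Use a systemd drop-in for
// apt-daily-upgrade.service (ExecStartPre= / ExecStartPost=) instead. APT's
// DPkg::Pre-Invoke / DPkg::Post-Invoke also work, but they fire on every dpkg
// run, including manual apt commands.

// Automatic reboot (a key setting)
Unattended-Upgrade::Automatic-Reboot "false";
// If automatic reboots are required, restrict them to a time window
// Unattended-Upgrade::Automatic-Reboot-Time "03:00";
// Unattended-Upgrade::Automatic-Reboot-WithUsers "false";

// Email notifications (require a working local MTA)
Unattended-Upgrade::Mail "sysadmin@example.com";
Unattended-Upgrade::MailReport "on-change";
// Options: always, only-on-error, on-change

// Download bandwidth limit in KB/s. This is an APT setting, not an
// unattended-upgrades one, so it throttles every APT download on the host.
// Acquire::http::Dl-Limit "70";

// Keep downloaded .deb files (makes rollback easier)
Unattended-Upgrade::Keep-Debs-After-Install "true";

// Logging verbosity
Unattended-Upgrade::Verbose "true";
Unattended-Upgrade::Debug "false";

// Power and network conditions
Unattended-Upgrade::OnlyOnACPower "false";
Unattended-Upgrade::Skip-Updates-On-Metered-Connections "true";

// dpkg options for configuration-file conflicts
Dpkg::Options {
    "--force-confdef";  // take dpkg's default action when there is one
    "--force-confold";  // otherwise keep the existing configuration file
};

// "false": install in the background on schedule; "true": install during shutdown instead
Unattended-Upgrade::InstallOnShutdown "false";

// Optional: also log to syslog (off by default; the facility defaults to "daemon")
// Unattended-Upgrade::SyslogEnable "true";
// Unattended-Upgrade::SyslogFacility "daemon";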
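Blocklist entries are regular expressions rather than globs, so it is worth previewing what a list actually matches before deploying it. The sketch below approximates that check in bash. `blocked_by` is a hypothetical helper. It assumes unattended-upgrades matches each entry from the start of the package name, as Python's `re.match` does, and uses bash's extended regex as a stand-in. Only simple patterns can be trusted to behave the same way in both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first blocklist pattern that matches a package name.
# Usage: blocked_by <package> <pattern>...   (returns 1 when nothing matches)
# Approximation: bash ERE anchored at the start of the name stands in for the
# Python regex matching done by unattended-upgrades.
blocked_by() {
    local pkg="$1"; shift
    local pattern re
    for pattern in "$@"; do
        re="^(${pattern})"               # anchor at the start, like re.match
        if [[ "$pkg" =~ $re ]]; then
            echo "$pattern"
            return 0
        fi
    done
    return 1
}
```

Usage: `for p in $(apt list --upgradable 2>/dev/null | awk -F/ 'NR>1 {print $1}'); do blocked_by "$p" "linux-image-" "nginx" >/dev/null && echo "held back: $p"; done` lists the pending packages that the given patterns would keep back.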

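Update Schedule: /etc/apt/apt.conf.d/20auto-upgrades

// Controls what apt-daily.timer and apt-daily-upgrade.timer do on each run

// Refresh the package lists every day (1 = every day)
APT::Periodic::Update-Package-Lists "1";

// Download upgradable packages every day (without installing them)
APT::Periodic::Download-Upgradeable-Packages "1";

// Run unattended-upgrade (1 = every day)
APT::Periodic::Unattended-Upgrade "1";

// Clean the package cache (interval in days)
APT::Periodic::AutocleanInterval "7";

// Verbose logging
APT::Periodic::Verbose "2";

systemd Timer Scheduling

The default systemd timers can fire during business hours. For example, apt-daily-upgrade.timer runs at 06:00 plus a random delay of up to 60 minutes. Adjust them as follows:

# Show the current schedules
systemctl status apt-daily.timer
systemctl status apt-daily-upgrade.timer

# Show the next run times
systemctl list-timers apt-daily*

# Customize the run time (creates an override)
sudo systemctl edit apt-daily.timer

Add the following in the editor:

[Timer]
# Clear the default schedule
OnCalendar=
# Run daily at 02:00 (outside business peak hours)
OnCalendar=02:00
# Random delay of 0-30 minutes (keeps many servers from updating at the same moment)
RandomizedDelaySec=30min

Adjust the upgrade timer the same way:

sudo systemctl edit apt-daily-upgrade.timer

[Timer]
OnCalendar=
OnCalendar=03:00
RandomizedDelaySec=30min

Then reload and verify:

sudo systemctl daemon-reload
systemctl list-timers apt-daily*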
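When the same schedule has to reach many hosts, the interactive `systemctl edit` step can be replaced by writing the drop-in file directly. The sketch below does this. `write_timer_override` is a hypothetical helper, and its base directory is a parameter so it can be tried in a scratch directory. On a real host the base would be `/etc/systemd/system`, followed by `systemctl daemon-reload`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive equivalent of `systemctl edit <timer>`.
# Usage: write_timer_override <base-dir> <timer-unit> <OnCalendar> [RandomizedDelaySec]
write_timer_override() {
    local base="$1" timer="$2" when="$3" delay="${4:-30min}"
    local dir="${base}/${timer}.d"
    mkdir -p "$dir"
    # The empty OnCalendar= line clears the schedule shipped with the unit
    # before the new schedule is added.
    cat > "${dir}/override.conf" <<EOF
[Timer]
OnCalendar=
OnCalendar=${when}
RandomizedDelaySec=${delay}
EOF
}
```

Usage: `write_timer_override /etc/systemd/system apt-daily.timer 02:00` and `write_timer_override /etc/systemd/system apt-daily-upgrade.timer 03:00`, then `sudo systemctl daemon-reload`.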

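Multi-Layer Recovery for Failed Upgrades

This is the most critical part for an enterprise environment. A failed upgrade can lead to:

  • An unbootable system (kernel panic, initramfs problems)
  • Broken services (dependency conflicts, incompatible configuration)
  • Degraded performance (bugs in the new version, higher resource usage)

Layer 1: Automatic Snapshot Before Updating

LVM Snapshots (Recommended)

If the system uses LVM, a snapshot can be created automatically before each update:

#!/bin/bash
# /usr/local/bin/pre-update-snapshot.sh
# Snapshots the root LV before updates and keeps the 3 most recent snapshots.

SNAPSHOT_NAME="pre-update-$(date +%Y%m%d-%H%M%S)"
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
# Copy-on-write space: the VG needs this much free space (check with vgs),
# and the snapshot becomes invalid if it fills up
SNAPSHOT_SIZE="10G"

# Create the LVM snapshot
if lvcreate -L "${SNAPSHOT_SIZE}" -s -n "${SNAPSHOT_NAME}" "/dev/${VG_NAME}/${LV_NAME}"; then
    echo "✅ LVM snapshot created: ${SNAPSHOT_NAME}" | logger -t pre-update
    # Record the snapshot name for the rollback procedure
    echo "${SNAPSHOT_NAME}" > /var/log/last-update-snapshot.txt
else
    echo "❌ Failed to create LVM snapshot" | logger -t pre-update
    exit 1
fi

# Keep the 3 most recent snapshots and remove older ones
SNAPSHOTS=$(lvs --noheadings -o lv_name "${VG_NAME}" | grep "pre-update-" | sort -r | tail -n +4)
for snap in ${SNAPSHOTS}; do
    lvremove -f "/dev/${VG_NAME}/${snap}"
    echo "🗑️  Removed old snapshot: ${snap}" | logger -t pre-update
done

# Make the script executable
sudo chmod +x /usr/local/bin/pre-update-snapshot.sh

# unattended-upgrades has no pre-install hook option, so run the script before
# each scheduled upgrade through a drop-in for apt-daily-upgrade.service:
sudo systemctl edit apt-daily-upgrade.service
# [Service]
# ExecStartPre=/usr/local/bin/pre-update-snapshot.sh
# If the script exits non-zero, systemd does not start the upgrade run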
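The retention rule in the snapshot script (keep the newest three `pre-update-YYYYmmdd-HHMMSS` snapshots) can be isolated and tested without touching LVM. In the sketch below, `snapshots_to_remove` is a hypothetical helper that reads LV names on stdin and prints the ones to delete. It relies on the timestamp format sorting chronologically as plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pre-update snapshots beyond the newest N.
# Input: LV names on stdin (e.g. from `lvs --noheadings -o lv_name <vg>`).
snapshots_to_remove() {
    local keep="${1:-3}"
    # lvs pads names with spaces, so trim them first; `sort -r` puts the newest
    # timestamp first, and `tail` skips the first N lines (the ones to keep).
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep '^pre-update-' \
        | sort -r \
        | tail -n +"$((keep + 1))"
}
```

Usage: `lvs --noheadings -o lv_name ubuntu-vg | snapshots_to_remove 3 | while read -r s; do lvremove -f "/dev/ubuntu-vg/$s"; done`. Running the pipeline without the `lvremove` step first is a cheap way to confirm what would be deleted.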

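Snapshot Rollback Procedure

If the system misbehaves after an update, roll back from recovery mode:

# 1. Reboot into the GRUB menu and choose "Advanced options" > "Recovery mode"

# 2. Choose "root - Drop to root shell prompt"

# 3. Remount the root filesystem read-write
mount -o remount,rw /

# 4. List the available snapshots
lvs ubuntu-vg

# 5. Merge the snapshot to roll back (the name below is an example).
#    The root LV is in use, so LVM defers the merge until the LV is next
#    activated, that is, on the next boot
lvconvert --merge /dev/ubuntu-vg/pre-update-20250120-020000

# 6. Reboot
reboot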
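In a recovery shell, it helps not to have to type a timestamp from memory. The sketch below builds the merge command from the name that `pre-update-snapshot.sh` records. `rollback_command` is a hypothetical helper. It only prints the command, so the operator still reviews it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lvconvert command for the last recorded snapshot.
# Usage: rollback_command <vg-name> [record-file]
rollback_command() {
    local vg="$1" record="${2:-/var/log/last-update-snapshot.txt}"
    local snap
    [[ -r "$record" ]] || { echo "no snapshot record at $record" >&2; return 1; }
    snap="$(head -n 1 "$record")"
    [[ -n "$snap" ]] || { echo "snapshot record is empty" >&2; return 1; }
    echo "lvconvert --merge /dev/${vg}/${snap}"
}
```

Usage: `rollback_command ubuntu-vg` prints something like `lvconvert --merge /dev/ubuntu-vg/pre-update-20250120-020000`. The record file lives on the root LV, so it still holds the pre-rollback value when read from the recovery shell.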

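Layer 2: Package-Level Rollback

Keeping Old Package Versions

# Already enabled in 50unattended-upgrades:
# Unattended-Upgrade::Keep-Debs-After-Install "true";

# Downloaded .deb files are kept in /var/cache/apt/archives/
# Note: APT::Periodic::AutocleanInterval (20auto-upgrades) removes cached .deb
# files that can no longer be downloaded, so superseded versions may be cleaned up

# List the cached versions of a package
ls -lh /var/cache/apt/archives/ | grep nginx

Downgrading a Specific Package

# Review the installation history (-E enables the | alternation)
grep -E "install|upgrade" /var/log/dpkg.log | tail -50
grep -iE "install|upgrade" /var/log/apt/history.log | tail -50

# Show the available versions
apt-cache policy nginx

# Downgrade to a specific version (it must still be listed by apt-cache policy)
sudo apt install nginx=1.18.0-6ubuntu14.3

# Hold the package so it is not upgraded again
sudo apt-mark hold nginx

# List held packages
apt-mark showhold

# Release the hold
sudo apt-mark unhold nginx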
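To find the version to downgrade to, the previous version can be read from `/var/log/apt/history.log`, where each `Upgrade:` line lists `name:arch (old, new)` pairs. The sketch below extracts those pairs for one package. `package_upgrade_history` is a hypothetical helper, and the log layout it parses is assumed from typical Ubuntu output, so check it against your own hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<date> <time> <old> -> <new>" for each upgrade of a package.
# Usage: package_upgrade_history <package> [history-log]
package_upgrade_history() {
    local pkg="$1" log="${2:-/var/log/apt/history.log}"
    awk -v pkg="$pkg" '
        /^Start-Date:/ { date = $2 " " $3 }          # remember the transaction time
        /^Upgrade:/ {
            line = substr($0, 10)                     # drop the "Upgrade: " prefix
            n = split(line, items, /\), /)            # one item per "name:arch (old, new"
            for (i = 1; i <= n; i++) {
                item = items[i]
                sub(/\)$/, "", item)                  # the last item keeps its ")"
                split(item, parts, / \(/)             # "name:arch" and "old, new"
                name = parts[1]; sub(/:.*/, "", name) # strip the architecture
                if (name == pkg) {
                    split(parts[2], v, /, /)
                    print date, v[1], "->", v[2]
                }
            }
        }' "$log"
}
```

Usage: `package_upgrade_history nginx` shows the versions that were replaced. The "old" value is what `sudo apt install nginx=<old>` expects, provided `apt-cache policy nginx` still lists it.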

Layer 3: Service Health Checks

Automatically verify critical services after an update:

#!/bin/bash
# /usr/local/bin/post-update-verify.sh

LOG_FILE="/var/log/post-update-verify.log"
ALERT_EMAIL="sysadmin@example.com"

echo "========== Post-Update Verification: $(date) ==========" >> "${LOG_FILE}"

# Check the critical services (units that are not installed on this host are skipped)
SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")
FAILED_SERVICES=()

for service in "${SERVICES[@]}"; do
    systemctl list-unit-files | grep -q "^${service}.service" || continue
    if systemctl is-active --quiet "${service}"; then
        echo "✅ ${service}: active" >> "${LOG_FILE}"
    else
        echo "❌ ${service}: FAILED" >> "${LOG_FILE}"
        FAILED_SERVICES+=("${service}")
    fi
done

# Check disk space (-P keeps each filesystem on a single line)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "${DISK_USAGE}" -gt 90 ]; then
    echo "⚠️  Disk usage: ${DISK_USAGE}% (WARNING)" >> "${LOG_FILE}"
fi

# Check available memory
MEM_AVAILABLE=$(free -m | awk 'NR==2 {print $7}')
if [ "${MEM_AVAILABLE}" -lt 500 ]; then
    echo "⚠️  Available memory: ${MEM_AVAILABLE}MB (LOW)" >> "${LOG_FILE}"
fi

# Check kernel messages since boot for errors (-E is needed for the | alternation)
KERNEL_ERRORS=$(dmesg | grep -icE "error|fail|panic")
if [ "${KERNEL_ERRORS}" -gt 0 ]; then
    echo "⚠️  Kernel errors detected: ${KERNEL_ERRORS}" >> "${LOG_FILE}"
    dmesg | grep -iE "error|fail|panic" | tail -10 >> "${LOG_FILE}"
fi

# Send an alert if any service failed
if [ ${#FAILED_SERVICES[@]} -gt 0 ]; then
    SUBJECT="🚨 Post-Update Alert: Services Failed on $(hostname)"
    BODY="Failed services: ${FAILED_SERVICES[*]}\n\nSee log: ${LOG_FILE}"
    echo -e "${BODY}" | mail -s "${SUBJECT}" "${ALERT_EMAIL}"
    
    # Optional: try restarting the failed services
    for service in "${FAILED_SERVICES[@]}"; do
        systemctl restart "${service}"
        sleep 5
        if systemctl is-active --quiet "${service}"; then
            echo "✅ ${service} restarted successfully" >> "${LOG_FILE}"
        else
            echo "❌ ${service} restart failed" >> "${LOG_FILE}"
        fi
    done
fi

echo "========== Verification Complete ==========" >> "${LOG_FILE}"

# Make the script executable
sudo chmod +x /usr/local/bin/post-update-verify.sh

# Run it after each scheduled upgrade through a drop-in for apt-daily-upgrade.service:
sudo systemctl edit apt-daily-upgrade.service
# [Service]
# ExecStartPost=/usr/local/bin/post-update-verify.sh
# ExecStartPost= runs only if the upgrade run itself exited successfully
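The resource checks in the verification script repeat one pattern: read a number, compare it with a limit, and log a warning. The sketch below factors that pattern into functions that are easier to reuse and test. `disk_usage_pct` and `check_threshold` are hypothetical helper names. `df -P` is used because the POSIX output format keeps each filesystem on one line, so the fifth column is always the use percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the post-update resource checks.

# Print the use percentage (digits only) of the filesystem holding a path.
disk_usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage: check_threshold <value> <limit> <label>
# Prints "WARN ..." and returns 1 above the limit, otherwise prints "OK ...".
check_threshold() {
    local value="$1" limit="$2" label="$3"
    if (( value > limit )); then
        echo "WARN ${label}=${value} (limit ${limit})"
        return 1
    fi
    echo "OK ${label}=${value}"
}
```

Usage: `check_threshold "$(disk_usage_pct /)" 90 disk_pct >> /var/log/post-update-verify.log`. The non-zero return code lets a caller decide whether a warning should also trigger an alert.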

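Layer 4: Keeping Old Kernels in GRUB

# Edit the GRUB settings
sudo nano /etc/default/grub

# Ubuntu 22.04 already keeps older kernels installed by default
# GRUB_DEFAULT=0  # boot the newest kernel by default
# Ubuntu Server hides the boot menu by default. To pick an older kernel, hold
# Shift (BIOS) or press Esc (UEFI) during boot, or make the menu visible:
# GRUB_TIMEOUT_STYLE=menu
# GRUB_TIMEOUT=5
# If the new kernel misbehaves, choose "Advanced options" → an older kernel at boot

# Apply changes made to /etc/default/grub
sudo update-grub

# List installed kernel packages
dpkg -l | grep linux-image

# Remove an old kernel manually (with care)
# sudo apt remove linux-image-5.15.0-91-generic

# Keep at least 2 kernel versions installed for rollback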
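Before removing a kernel manually, it is worth confirming that at least two will remain. The sketch below counts installed kernel images from `dpkg-query` output. `count_installed_kernels` is a hypothetical helper. Only entries whose status is `install ok installed` count, so packages removed but leaving config files behind are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count installed versioned kernel images.
# Input on stdin: dpkg-query -W -f='${Status} ${Package}\n' 'linux-image-[0-9]*'
count_installed_kernels() {
    # grep -c prints 0 (and returns 1) when nothing matches
    grep -c '^install ok installed linux-image-[0-9]'
}
```

Usage: `[ "$(dpkg-query -W -f='${Status} ${Package}\n' 'linux-image-[0-9]*' | count_installed_kernels)" -ge 3 ] && echo "safe to remove one"` ensures that two kernels remain after the removal.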

Monitoring and Alerting

Key Logs to Check

# Main unattended-upgrades logs
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades.log
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades-dpkg.log

# Summary of recent updates
sudo grep "Packages that will be upgraded" /var/log/unattended-upgrades/unattended-upgrades.log

# APT operation history
sudo tail -50 /var/log/apt/history.log

# dpkg operation history
sudo tail -100 /var/log/dpkg.log

# systemd journal (more detail). Scheduled runs execute in apt-daily*.service;
# unattended-upgrades.service is only the shutdown helper
journalctl -u unattended-upgrades -f
journalctl -u apt-daily -f
journalctl -u apt-daily-upgrade -f
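For a quick history of what was upgraded and when, the "Packages that will be upgraded" lines can be reduced to one line per run. The sketch below does this. `upgrade_runs` is a hypothetical helper, and it assumes log lines start with a `YYYY-MM-DD` date followed by the time, as unattended-upgrades usually writes them. Check the format on your own hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<date> <packages...>" for each unattended-upgrades run.
# Assumption: lines look like
#   2025-01-20 06:12:05,100 INFO Packages that will be upgraded: libssl3 openssl
upgrade_runs() {
    local log="${1:-/var/log/unattended-upgrades/unattended-upgrades.log}"
    sed -n 's/^\([0-9-]*\) .*Packages that will be upgraded: \(.*\)$/\1 \2/p' "$log"
}
```

Usage: `sudo bash -c "$(declare -f upgrade_runs); upgrade_runs" | tail -20` shows the last 20 runs. The log is readable only by root.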

Integrating with a Monitoring System

Using Prometheus + Node Exporter

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Create the systemd service
sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
# The textfile directory is where the custom apt metrics are written
ExecStart=/usr/local/bin/node_exporter \
  --collector.systemd \
  --collector.processes \
  --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

[Install]
WantedBy=multi-user.target

# Create the service user
sudo useradd -rs /bin/false node_exporter

# Start the service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

# Verify. The apt_* metrics come only from the textfile collector, so they
# appear once the custom metrics script has written its file
curl -s http://localhost:9100/metrics | grep apt_

Custom Metrics Script

#!/bin/bash
# /usr/local/bin/apt_update_metrics.sh
# Emit update metrics in Prometheus text format for the textfile collector

TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
mkdir -p "${TEXTFILE_DIR}"

# Count pending upgrades
UPDATES_AVAILABLE=$(apt list --upgradable 2>/dev/null | grep -c upgradable)
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -ci security)

# Time of the last successful package-list update (Unix timestamp)
LAST_UPDATE=$(stat -c %Y /var/lib/apt/periodic/update-success-stamp 2>/dev/null || echo 0)

# Write to a temporary file, then rename it, so node_exporter never reads a
# half-written file (only files ending in .prom are collected)
TMP_FILE="${TEXTFILE_DIR}/apt_updates.prom.$$"
cat > "${TMP_FILE}" << EOF
# HELP apt_updates_available Number of available updates
# TYPE apt_updates_available gauge
apt_updates_available ${UPDATES_AVAILABLE}

# HELP apt_security_updates_available Number of available security updates
# TYPE apt_security_updates_available gauge
apt_security_updates_available ${SECURITY_UPDATES}

# HELP apt_last_update_timestamp Timestamp of last apt update
# TYPE apt_last_update_timestamp gauge
apt_last_update_timestamp ${LAST_UPDATE}
EOF
mv "${TMP_FILE}" "${TEXTFILE_DIR}/apt_updates.prom"

# Schedule it with cron
sudo crontab -e

# Refresh the metrics every hour
0 * * * * /usr/local/bin/apt_update_metrics.sh

Email Alert Configuration

# Install postfix (a lightweight MTA)
sudo apt install postfix mailutils

# Relay mail through an external SMTP server (e.g. Gmail)
sudo nano /etc/postfix/main.cf

relayhost = [smtp.gmail.com]:587
# Require TLS (replaces the obsolete smtp_use_tls = yes)
smtp_tls_security_level = encrypt
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous

# Set the SMTP credentials
sudo nano /etc/postfix/sasl_passwd

[smtp.gmail.com]:587 your-email@gmail.com:your-app-password

# Build the hash database and protect the credential files
sudo postmap /etc/postfix/sasl_passwd
sudo chmod 600 /etc/postfix/sasl_passwd*

# Restart postfix
sudo systemctl restart postfix

# Send a test email
echo "Test email from $(hostname)" | mail -s "Test Subject" sysadmin@example.com

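Recommended Strategies by Service Type

Scenario 1: Web Server (Nginx/Apache)

// Blocklist
Unattended-Upgrade::Package-Blacklist {
    "nginx*";
    "apache2*";
    "php*";
};

// Rationale:
// - Web server updates may change configuration formats
// - PHP version upgrades may break applications
// - Validate in a test environment, then update manually

// Recommended: Blue-Green deployment
// 1. Build a new server and update it manually
// 2. Verify that it works correctly
// 3. Switch the load balancer traffic over
// 4. Monitor the error rate
// 5. Once no problems appear, update the other nodes

Scenario 2: Database Server (MySQL/PostgreSQL)

// Strict blocklist
Unattended-Upgrade::Package-Blacklist {
    "mysql*";
    "mariadb*";
    "postgresql*";
    "percona*";
};

// Rationale:
// - Database updates may require schema migrations
// - Performance characteristics may change
// - Rollback is complex and high-risk

// Recommended process:
// 1. Take a full backup
// 2. Test the update on a replica
// 3. Verify that replication still works
// 4. Monitor performance metrics (query time, connections)
// 5. Update the primary in a planned maintenance window

Scenario 3: Container Host (Docker/Kubernetes)

// Selective blocklist
Unattended-Upgrade::Package-Blacklist {
    "docker*";
    "containerd*";
    "kubernetes*";
    "kubelet*";
    "kubeadm*";
};

// Rationale:
// - Container runtime updates can affect running containers
// - Kubernetes has strict version upgrade paths
// - CNI, CSI, and other plugin compatibility must be verified

// Allowed to update automatically:
// - Security patches (kernel, glibc)
// - Monitoring tools (node_exporter, cAdvisor)

// Recommended:
// 1. Use immutable infrastructure
// 2. Rebuild nodes regularly instead of updating them in place
// 3. Roll updates through the cluster autoscaler

Scenario 4: Critical Services (Minimal Updates)

// Extremely conservative configuration
Unattended-Upgrade::Allowed-Origins {
    // Critical security updates only
    "${distro_id}:${distro_codename}-security";
};

// No catch-all blocklist entry: "*" is not a valid regular expression, and the
// blocklist takes precedence over the whitelist, so it would block everything.
// With a whitelist set, only matching packages (plus their dependencies, unless
// Unattended-Upgrade::Package-Whitelist-Strict is "true") are upgraded.
Unattended-Upgrade::Package-Whitelist {
    // Only the most critical security fixes
    "openssl";
    "libssl*";
    "openssh*";
    "ca-certificates";
};

Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "critical-alerts@example.com";
Unattended-Upgrade::MailReport "always";

// Typical use cases:
// - Financial trading systems
// - Medical device controllers
// - Industrial control systems (ICS/SCADA)
// - Any service with very demanding SLA requirements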
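For hosts that are not managed by Ansible, the scenario blocklists can still be generated from one place. The sketch below renders the `Package-Blacklist` stanza for a given service type. `render_blocklist` is a hypothetical helper. The type names mirror the four scenarios above, and the kernel entries are always included. The "critical" type adds no service entries because that profile relies on `Package-Whitelist`. The entries are written as plain prefixes, which the regex matching treats as literal text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Package-Blacklist stanza for one service type.
# Usage: render_blocklist webserver|database|container|critical
render_blocklist() {
    local type="$1"
    local -a pats=("linux-image-" "linux-headers-" "linux-modules-")
    case "$type" in
        webserver) pats+=("nginx" "apache2" "php") ;;
        database)  pats+=("mysql" "mariadb" "postgresql" "percona") ;;
        container) pats+=("docker" "containerd" "kubelet" "kubeadm" "kubectl") ;;
        critical)  ;;   # relies on Package-Whitelist instead
        *) echo "unknown service type: $type" >&2; return 1 ;;
    esac
    echo 'Unattended-Upgrade::Package-Blacklist {'
    printf '    "%s";\n' "${pats[@]}"
    echo '};'
}
```

Usage: `render_blocklist database | sudo tee /etc/apt/apt.conf.d/51blocklist-local`. The file name here is only an example. Entries from several files in `apt.conf.d` are combined into one list, so avoid defining the same list twice unintentionally.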

Ansible Automation

Ansible Playbook: Centralized Configuration Management

# playbook: deploy_unattended_upgrades.yml
---
- name: Configure Unattended Upgrades across Ubuntu servers
  hosts: ubuntu_servers
  become: yes
  vars:
    mail_recipient: "sysadmin@example.com"
    reboot_time: "03:00"
    auto_reboot: false
    service_type: "webserver"  # webserver, database, container, critical
    
  tasks:
    - name: Ensure unattended-upgrades is installed
      apt:
        name:
          - unattended-upgrades
          - apt-listchanges
          - mailutils
        state: present
        update_cache: yes

    - name: Deploy 50unattended-upgrades configuration
      template:
        src: templates/50unattended-upgrades.j2
        dest: /etc/apt/apt.conf.d/50unattended-upgrades
        owner: root
        group: root
        mode: '0644'
      notify: restart unattended-upgrades

    - name: Deploy 20auto-upgrades configuration
      copy:
        dest: /etc/apt/apt.conf.d/20auto-upgrades
        content: |
          APT::Periodic::Update-Package-Lists "1";
          APT::Periodic::Download-Upgradeable-Packages "1";
          APT::Periodic::Unattended-Upgrade "1";
          APT::Periodic::AutocleanInterval "7";
          APT::Periodic::Verbose "2";
        owner: root
        group: root
        mode: '0644'

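    # The copy module does not create parent directories, so create the drop-in directory first
    - name: Ensure the apt-daily-upgrade.timer drop-in directory exists
      file:
        path: /etc/systemd/system/apt-daily-upgrade.timer.d
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Configure apt-daily-upgrade.timer schedule
      copy:
        dest: /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
        content: |
          [Timer]
          OnCalendar=
          OnCalendar={{ reboot_time }}
          RandomizedDelaySec=30min
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd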

    - name: Deploy pre-update snapshot script
      template:
        src: templates/pre-update-snapshot.sh.j2
        dest: /usr/local/bin/pre-update-snapshot.sh
        owner: root
        group: root
        mode: '0755'
      when: ansible_facts['lvm'] is defined

    - name: Deploy post-update verification script
      template:
        src: templates/post-update-verify.sh.j2
        dest: /usr/local/bin/post-update-verify.sh
        owner: root
        group: root
        mode: '0755'

    - name: Enable and start unattended-upgrades service
      systemd:
        name: unattended-upgrades
        enabled: yes
        state: started

  handlers:
    - name: restart unattended-upgrades
      systemd:
        name: unattended-upgrades
        state: restarted

    - name: reload systemd
      systemd:
        daemon_reload: yes

Jinja2 Template: Generating the Configuration Dynamically

# templates/50unattended-upgrades.j2
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
{% if service_type in ['webserver', 'container'] %}
    // "${distro_id}:${distro_codename}-updates";
{% endif %}
};

Unattended-Upgrade::Package-Blacklist {
    "linux-image-*";
    "linux-headers-*";
{% if service_type == 'webserver' %}
    "nginx*";
    "apache2*";
    "php*";
{% elif service_type == 'database' %}
    "mysql*";
    "postgresql*";
    "mariadb*";
{% elif service_type == 'container' %}
    "docker*";
    "kubernetes*";
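{% endif %}
{# "critical" hosts get no catch-all entry: "*" is not a valid regular expression,
   and the blocklist would override the Package-Whitelist below anyway #}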
};

{% if service_type == 'critical' %}
Unattended-Upgrade::Package-Whitelist {
    "openssl";
    "libssl*";
    "openssh*";
};
{% endif %}

Unattended-Upgrade::Automatic-Reboot "{{ auto_reboot | lower }}";
{% if auto_reboot %}
Unattended-Upgrade::Automatic-Reboot-Time "{{ reboot_time }}";
{% endif %}

Unattended-Upgrade::Mail "{{ mail_recipient }}";
Unattended-Upgrade::MailReport "on-change";

Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Keep-Debs-After-Install "true";

Dpkg::Options {
    "--force-confdef";
    "--force-confold";
};

Running the Playbook

# Define the inventory
# inventory/hosts.yml
---
all:
  children:
    ubuntu_servers:
      children:
        webservers:
          hosts:
            web01.example.com:
              service_type: webserver
            web02.example.com:
              service_type: webserver
        databases:
          hosts:
            db01.example.com:
              service_type: database
              auto_reboot: false
        containers:
          hosts:
            k8s-node01.example.com:
              service_type: container

# Run the playbook
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml

# Target a single group only
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --limit webservers

# Dry run
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --check --diff

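Best Practices Summary

Golden Rules from Experienced System Administrators

Rule | Description | Implementation focus
1. Tiered control | Use a different strategy for each type of update | Security automatic, kernel manual, strict control for critical services
2. Snapshots first | Have a rollback path before every update | LVM snapshots, package downgrades, old kernels in GRUB
3. Test first | Validate before production | Staging environment, canary deployments
4. Verify with monitoring | Check service health automatically after updates | Post-install hooks, health-check scripts
5. Timing control | Avoid business peak hours | systemd timers, maintenance windows
6. Timely alerts | Notify immediately when something goes wrong | Email, Slack, PagerDuty
7. Complete documentation | Record configuration decisions and change history | Configuration in Git, change logs
8. Unified automation | Manage everything with IaC tools | Ansible, Terraform, Chef

Pre-Production Checklist

  • Configuration review: 50unattended-upgrades matches the service type
  • Blocklist check: critical services are on the blocklist
  • Reboot policy: Automatic-Reboot is set correctly (false recommended)
  • Time window: systemd timer schedules avoid peak hours
  • Snapshots: LVM snapshots or another backup method is in place
  • Verification script: post-update-verify.sh has been tested
  • Email alerts: a test email was delivered successfully
  • Monitoring: Prometheus/Zabbix metrics are being collected
  • Rollback drill: the team is familiar with the snapshot rollback procedure
  • Documentation: the runbook records the emergency procedures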

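Advanced Topics

Ubuntu Pro ESM Updates

# Ubuntu Pro (formerly Ubuntu Advantage) provides Expanded Security Maintenance (ESM)
# Useful for older releases that need long-term support

# Attach the machine to Ubuntu Pro (the older `ua` command is an alias of `pro`)
sudo pro attach YOUR_TOKEN

# Enable ESM
sudo pro enable esm-infra
sudo pro enable esm-apps

# Allow ESM updates in 50unattended-upgrades (keep the regular -security origin as well)
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    "${distro_id}ESM:${distro_codename}-infra-security";
    "${distro_id}ESMApps:${distro_codename}-apps-security";
};

Kernel Livepatch (Kernel Patches Without a Reboot)

# Canonical Livepatch applies kernel security fixes without a reboot
# Suitable for services that cannot be rebooted often

# Enable Livepatch (requires Ubuntu Pro)
sudo pro enable livepatch

# Or install the client directly, with a token from the free personal subscription
sudo snap install canonical-livepatch
sudo canonical-livepatch enable YOUR_TOKEN

# Check the status
sudo canonical-livepatch status

# Check whether a reboot is still required (Livepatch does not cover every update)
/usr/lib/update-notifier/update-motd-reboot-required
cat /var/run/reboot-required.pkgs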
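The same flag files can feed a scripted check that reports whether a reboot is pending and which packages requested it. The sketch below does this. `reboot_status` is a hypothetical helper. Its directory defaults to `/run` (`/var/run` is a symlink to it on Ubuntu) and can be overridden, so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report a pending reboot and the packages that requested it.
# Returns 1 when a reboot is required, 0 otherwise.
reboot_status() {
    local dir="${1:-/run}"
    if [[ -f "${dir}/reboot-required" ]]; then
        # .pkgs may list a package more than once; de-duplicate and join on one line
        echo "reboot required by: $(sort -u "${dir}/reboot-required.pkgs" 2>/dev/null | paste -sd' ' -)"
        return 1
    fi
    echo "no reboot required"
}
```

Usage: `reboot_status || mail -s "Reboot pending on $(hostname)" sysadmin@example.com < /dev/null` sends a reminder only when a reboot is actually needed.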

Rolling Updates Across Multiple Servers

# Canary-style rolling update with Ansible
# playbook: rolling_update.yml
---
- name: Rolling update with canary
  hosts: webservers
  serial: 1  # update one server at a time
  become: yes
  
  pre_tasks:
    - name: Remove server from load balancer
      command: /usr/local/bin/remove-from-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
      
    - name: Wait for connections to drain
      wait_for:
        timeout: 30
  
  tasks:
    - name: Update packages
      apt:
        upgrade: safe
        update_cache: yes
        
    - name: Verify services
      command: /usr/local/bin/post-update-verify.sh
      
  post_tasks:
    - name: Add server back to load balancer
      command: /usr/local/bin/add-to-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
      
    - name: Wait and monitor error rate
      pause:
        minutes: 5
        
    - name: Check error rate
      command: /usr/local/bin/check-error-rate.sh
      register: error_rate
      failed_when: error_rate.stdout | int > 5
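The playbook calls a `check-error-rate.sh` helper that the article does not show. The sketch below is one possible version, built on stated assumptions. `error_rate_pct` is a hypothetical function, and it assumes the nginx "combined" log format, where the status code is the ninth whitespace-separated field. It prints an integer percentage, which matches the playbook's `failed_when: error_rate.stdout | int > 5`.

```shell
#!/usr/bin/env bash
# Hypothetical body for check-error-rate.sh: percentage of 5xx responses among
# the last N requests in an nginx access log ("combined" format assumed).
# Usage: error_rate_pct [access-log] [window]
error_rate_pct() {
    local log="${1:-/var/log/nginx/access.log}" window="${2:-1000}"
    tail -n "$window" "$log" | awk '
        { total++; if ($9 ~ /^5[0-9][0-9]$/) errors++ }
        END { if (total == 0) print 0; else print int(errors * 100 / total) }'
}
```

Usage: a `check-error-rate.sh` wrapper could simply run `error_rate_pct /var/log/nginx/access.log 1000`. A window of recent requests keeps old traffic from diluting errors introduced by the update. A Prometheus query over HTTP metrics would be more accurate where such metrics exist.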

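Conclusion

There is no one-size-fits-all automatic update strategy for Ubuntu Server. The right approach for enterprise production is to:

  1. Assess risk tolerance: decide the level of automation by service type
  2. Tier the updates: security automatic, kernel manual, strict control for critical services
  3. Build layered defenses: snapshots + package downgrades + service verification + monitoring and alerts
  4. Test and rehearse continuously: verify the rollback procedure regularly so the team stays familiar with it
  5. Automate and standardize: manage configuration centrally with tools such as Ansible

Remember: automatic updates exist to improve security, not to replace professional judgment. An experienced system administrator adds value by understanding the impact of each update, designing a strategy that fits business needs, and restoring systems quickly when something goes wrong.

With the configurations, scripts, and best practices in this article, you can build a robust automatic update mechanism that balances security, stability, and control.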

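Alternative: Shell + Cron vs systemd + unattended-upgrades

Comparing the Two Approaches

Aspect | Shell + Cron | systemd + unattended-upgrades
Pros | Fully customizable and flexible; suits special requirements; needs no extra packages; easy to understand and debug | Officially supported by Ubuntu; handles dependencies and conflicts automatically; thorough error handling; well integrated with systemd
Cons | Error handling must be written by hand; edge cases are easy to miss; higher maintenance cost; no official support | Less flexible; complex configuration; harder to debug; the configuration syntax must be learned
When to use | Highly customized requirements; integrating with existing scripts; simple update flows; test/development environments | Standard enterprise environments (recommended); when stability and reliability matter; large-scale deployments; production

Complete Shell + Cron Implementation

Main Update Script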

#!/bin/bash
# /usr/local/sbin/auto-update.sh
# Enterprise-grade automated update script for Ubuntu 22.04

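# ========== Configuration ==========
# cron's default PATH (/usr/bin:/bin) does not include /usr/sbin, where lvcreate lives
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"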
LOCK_FILE="/var/run/auto-update.lock"
LOG_FILE="/var/log/auto-update.log"
ERROR_LOG="/var/log/auto-update-error.log"
SNAPSHOT_ENABLED=true
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
SNAPSHOT_SIZE="10G"
ALERT_EMAIL="sysadmin@example.com"
MAX_LOG_SIZE=104857600  # 100MB

# Service health check list
CRITICAL_SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")

# Package blacklist (will not be upgraded)
BLACKLIST_PATTERN="linux-image-|linux-headers-|nginx|mysql-server|postgresql|docker"

# ========== Functions ==========

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "${LOG_FILE}"
}

error_log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" | tee -a "${ERROR_LOG}"
}

send_alert() {
    local subject="$1"
    local body="$2"
    echo -e "${body}" | mail -s "${subject}" "${ALERT_EMAIL}"
}

# Lock file mechanism to prevent concurrent execution
acquire_lock() {
    if [ -f "${LOCK_FILE}" ]; then
        PID=$(cat "${LOCK_FILE}")
        if ps -p ${PID} > /dev/null 2>&1; then
            log "Another update process (PID: ${PID}) is running. Exiting."
            exit 0
        else
            log "Stale lock file found. Removing."
            rm -f "${LOCK_FILE}"
        fi
    fi
    echo $$ > "${LOCK_FILE}"
}

release_lock() {
    rm -f "${LOCK_FILE}"
}

# Rotate log files
rotate_logs() {
    for log in "${LOG_FILE}" "${ERROR_LOG}"; do
        if [ -f "${log}" ] && [ $(stat -c%s "${log}") -gt ${MAX_LOG_SIZE} ]; then
            mv "${log}" "${log}.old"
            gzip "${log}.old"
            touch "${log}"
            log "Log rotated: ${log}"
        fi
    done
}

# Create LVM snapshot before update
create_snapshot() {
    if [ "${SNAPSHOT_ENABLED}" != "true" ]; then
        return 0
    fi
    
    local snapshot_name="auto-update-$(date +%Y%m%d-%H%M%S)"
    
    log "Creating LVM snapshot: ${snapshot_name}"
    
    lvcreate -L ${SNAPSHOT_SIZE} -s -n ${snapshot_name} /dev/${VG_NAME}/${LV_NAME} >> "${LOG_FILE}" 2>&1
    
    if [ $? -eq 0 ]; then
        log "✅ Snapshot created successfully: ${snapshot_name}"
        echo "${snapshot_name}" > /var/tmp/last-update-snapshot.txt
        
        # Clean up old snapshots (keep last 3)
        local snapshots=$(lvs --noheadings -o lv_name ${VG_NAME} 2>/dev/null | grep "auto-update-" | sort -r | tail -n +4)
        for snap in ${snapshots}; do
            lvremove -f /dev/${VG_NAME}/${snap} >> "${LOG_FILE}" 2>&1
            log "Removed old snapshot: ${snap}"
        done
    else
        error_log "Failed to create snapshot"
        send_alert "❌ Auto-Update Failed: Snapshot Creation" "Failed to create LVM snapshot on $(hostname)\n\nSee: ${LOG_FILE}"
        return 1
    fi
}

# Update package lists
update_package_lists() {
    log "Updating package lists..."
    
    apt-get update >> "${LOG_FILE}" 2>&1
    
    if [ $? -ne 0 ]; then
        error_log "Failed to update package lists"
        return 1
    fi
    
    log "✅ Package lists updated"
    return 0
}

# Get list of upgradeable packages (excluding blacklist)
get_upgradeable_packages() {
    apt list --upgradable 2>/dev/null | grep -v "^Listing" | grep -vE "${BLACKLIST_PATTERN}" | awk -F/ '{print $1}'
}

# Upgrade security packages only
upgrade_security_only() {
    log "Checking for security updates..."
    
    # Install unattended-upgrades if not present
    dpkg -l | grep -q unattended-upgrades || apt-get install -y unattended-upgrades >> "${LOG_FILE}" 2>&1
    
    # Run unattended-upgrades in dry-run mode first
    unattended-upgrade --dry-run -v >> "${LOG_FILE}" 2>&1
    
    # Actually perform upgrades
    unattended-upgrade -v >> "${LOG_FILE}" 2>&1
    
    if [ $? -eq 0 ]; then
        log "✅ Security updates applied successfully"
        return 0
    else
        error_log "Failed to apply security updates"
        return 1
    fi
}

# Upgrade all safe packages (excluding blacklist)
upgrade_safe_packages() {
    local packages=$(get_upgradeable_packages)
    
    if [ -z "${packages}" ]; then
        log "No packages to upgrade"
        return 0
    fi
    
    log "Upgradeable packages (excluding blacklist):"
    echo "${packages}" | tee -a "${LOG_FILE}"
    
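    log "Upgrading non-blacklisted packages..."
    
    # apt-get upgrade would ignore BLACKLIST_PATTERN, so upgrade only the filtered list
    # (${packages} is unquoted on purpose: one argument per package name)
    DEBIAN_FRONTEND=noninteractive apt-get install --only-upgrade -y \
        -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" \
        ${packages} >> "${LOG_FILE}" 2>&1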
    
    if [ $? -eq 0 ]; then
        log "✅ Packages upgraded successfully"
        return 0
    else
        error_log "Package upgrade failed"
        return 1
    fi
}

# Clean up unused packages
cleanup_packages() {
    log "Cleaning up unused packages..."
    
    apt-get autoremove -y >> "${LOG_FILE}" 2>&1
    apt-get autoclean >> "${LOG_FILE}" 2>&1
    
    log "✅ Cleanup completed"
}

# Verify critical services after update
verify_services() {
    log "Verifying critical services..."
    
    local failed_services=()
    
    for service in "${CRITICAL_SERVICES[@]}"; do
        if systemctl is-active --quiet ${service} 2>/dev/null; then
            log "✅ ${service}: active"
        else
            if systemctl list-unit-files | grep -q "^${service}.service"; then
                error_log "${service}: FAILED or inactive"
                failed_services+=("${service}")
            fi
        fi
    done
    
    # Check system resources
    local disk_usage=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
    if [ ${disk_usage} -gt 90 ]; then
        error_log "Disk usage critical: ${disk_usage}%"
    fi
    
    local mem_available=$(free -m | awk 'NR==2 {print $7}')
    if [ ${mem_available} -lt 500 ]; then
        error_log "Low memory: ${mem_available}MB available"
    fi
    
    # If services failed, attempt restart
    if [ ${#failed_services[@]} -gt 0 ]; then
        log "Attempting to restart failed services..."
        for service in "${failed_services[@]}"; do
            systemctl restart ${service} >> "${LOG_FILE}" 2>&1
            sleep 3
            if systemctl is-active --quiet ${service}; then
                log "✅ ${service} restarted successfully"
            else
                error_log "${service} restart failed"
            fi
        done
        
        # Send alert if still failing
        local still_failed=()
        for service in "${failed_services[@]}"; do
            if ! systemctl is-active --quiet ${service}; then
                still_failed+=("${service}")
            fi
        done
        
        if [ ${#still_failed[@]} -gt 0 ]; then
            send_alert "🚨 Post-Update Alert: Services Failed on $(hostname)" \
                "Failed services: ${still_failed[*]}\n\nSee logs:\n${LOG_FILE}\n${ERROR_LOG}"
            return 1
        fi
    fi
    
    log "✅ All services verified"
    return 0
}

# Check if reboot is required
check_reboot_required() {
    if [ -f /var/run/reboot-required ]; then
        log "⚠️  System reboot required"
        log "Packages requiring reboot:"
        cat /var/run/reboot-required.pkgs | tee -a "${LOG_FILE}"
        
        send_alert "⚠️  Reboot Required: $(hostname)" \
            "The following updates require a system reboot:\n\n$(cat /var/run/reboot-required.pkgs)\n\nPlease schedule a maintenance window."
        
        return 1
    fi
    return 0
}

# Generate update summary
generate_summary() {
    log "========== Update Summary =========="
    log "Hostname: $(hostname)"
    log "Date: $(date)"
    log "Kernel: $(uname -r)"
    log "Uptime: $(uptime -p)"
    
    # Check for pending updates
    local updates_available=$(apt list --upgradable 2>/dev/null | grep -c "upgradable")
    log "Remaining updates: ${updates_available}"
    
    # Check security updates
    local security_updates=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)
    if [ ${security_updates} -gt 0 ]; then
        log "⚠️  Security updates available: ${security_updates}"
    fi
    
    log "===================================="
}

# ========== Main Execution ==========

main() {
    log "========== Auto-Update Script Started =========="
    
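    # Acquire the lock before installing the trap: with the trap set first, an
    # instance that finds the lock held would delete it on its way out
    acquire_lock
    
    # Release the lock on exit
    trap release_lock EXIT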
    
    # Rotate logs if needed
    rotate_logs
    
    # Create pre-update snapshot
    if ! create_snapshot; then
        error_log "Snapshot creation failed. Aborting update."
        exit 1
    fi
    
    # Update package lists
    if ! update_package_lists; then
        error_log "Failed to update package lists. Aborting."
        exit 1
    fi
    
    # Perform updates (choose one strategy)
    # Strategy 1: Security updates only (conservative)
    if ! upgrade_security_only; then
        error_log "Security update failed"
        exit 1
    fi
    
    # Strategy 2: All safe packages (more aggressive)
    # if ! upgrade_safe_packages; then
    #     error_log "Package upgrade failed"
    #     exit 1
    # fi
    
    # Cleanup
    cleanup_packages
    
    # Verify services
    if ! verify_services; then
        error_log "Service verification failed"
        # Don't exit - services may have been restarted
    fi
    
    # Check reboot requirement
    check_reboot_required
    
    # Generate summary
    generate_summary
    
    log "========== Auto-Update Script Completed =========="
}

# Execute main function
main "$@"
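The PID-file lock in the script above checks for the lock and then writes it, which leaves a small race window. It also needs stale-lock cleanup. As an alternative, the sketch below uses `flock(1)` from util-linux: the kernel drops the lock when the holding process exits. `run_locked` is a hypothetical wrapper name. Exit code 75 (EX_TEMPFAIL) is an arbitrary choice meaning "another run is in progress".

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command only if no other holder of the lock file is running.
# Usage: run_locked <lock-file> <command> [args...]
run_locked() {
    local lockfile="$1"; shift
    (
        # Non-blocking: fail immediately instead of queueing behind the running instance
        flock -n 9 || { echo "another run holds ${lockfile}" >&2; exit 75; }
        "$@"
    ) 9>"$lockfile"
}
```

Usage: the same effect is available directly in a crontab entry with the flock CLI: `0 2 * * * flock -n /run/auto-update.lock /usr/local/sbin/auto-update.sh`. With that in place, `acquire_lock` and `release_lock` become unnecessary.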

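Cron Configuration

# Edit the root crontab
sudo crontab -e

# Option 1: daily at 02:00 (recommended)
0 2 * * * /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 2: every Sunday at 03:00 (conservative)
0 3 * * 0 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 3: first Sunday of each month at 02:00 (very conservative).
# When both day-of-month and day-of-week are restricted, cron runs the job if
# EITHER matches, so "1-7 * 0" would also fire every Sunday. Check the weekday
# inside the job instead (% must be escaped as \% in a crontab)
0 2 1-7 * * [ "$(date +\%u)" -eq 7 ] && /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 4: weekdays at 02:00 (skips weekends)
0 2 * * 1-5 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1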
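The first-Sunday rule is easy to get wrong, so it helps to keep the weekday guard in a testable function. In the sketch below, `is_first_sunday` is a hypothetical helper. It assumes GNU `date` (for `-d` and `%-d`), which is what Ubuntu ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only on the first Sunday of a month.
# Usage: is_first_sunday [date]   (defaults to today; GNU date assumed)
is_first_sunday() {
    local d="${1:-today}"
    local dom dow
    dom="$(date -d "$d" +%-d)"   # day of month without zero padding
    dow="$(date -d "$d" +%u)"    # 1 = Monday ... 7 = Sunday
    (( dow == 7 && dom <= 7 ))
}
```

Usage: a wrapper script can start with `is_first_sunday || exit 0` and then be scheduled as `0 2 * * 0`. This keeps the date logic out of the crontab and avoids the `\%` escaping entirely.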

Deploying the Script

# Make the script executable
sudo chmod +x /usr/local/sbin/auto-update.sh

# Create the log files
sudo touch /var/log/auto-update.log
sudo touch /var/log/auto-update-error.log
sudo chmod 640 /var/log/auto-update*.log

# Run a manual test
sudo /usr/local/sbin/auto-update.sh

# Follow the log
sudo tail -f /var/log/auto-update.log

Monitoring Cron Runs

# Cron execution history
sudo grep "auto-update" /var/log/syslog

# Output of the most recent runs
sudo tail -100 /var/log/auto-update.log

# Error log
sudo cat /var/log/auto-update-error.log

# Cron logs via journalctl
sudo journalctl -u cron | grep auto-update

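My Recommendation: Choose by Scenario

Scenario | Recommended approach | Rationale
Standard enterprise production | systemd + unattended-upgrades | Stable, reliable, officially supported; suits large-scale deployments
Highly customized requirements | Shell + Cron | Full control over the update flow; integrates with existing systems
Small environments (< 10 servers) | Shell + Cron | Simple to understand, quick to deploy
Critical services | systemd + unattended-upgrades | More thorough error handling, lower risk
Test/development environments | Shell + Cron | Flexible, easy to experiment with and adjust
Container hosts | Either | Choose based on what the team knows best

Final Recommendations

  • Prefer systemd + unattended-upgrades in production: well tested, stable, and reliable
  • Use Shell + Cron as a supplement: for special requirements, or where unattended-upgrades cannot be used
  • Combine both: let unattended-upgrades handle security updates and use shell scripts for custom logic (such as snapshots and health checks)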
