Ubuntu Server Automatic Updates, a Complete Guide: Enterprise Strategy, Risk Control, and Failure Recovery

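In enterprise production environments, the update strategy for Ubuntu Server is one of the core challenges a system administrator faces. Automatic updates keep systems patched, but a poor configuration can cause service outages, compatibility problems, or even an unbootable system. This article looks at how to design a robust automatic-update strategy from the perspective of an enterprise production (PRD) environment, covering the technical implementation, risk control, and failure recovery.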

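Core question: how far should automatic updates go?

The enterprise trilemma

Requirement	Tension	Risk
Security	Security updates must be applied promptly	Unpatched vulnerabilities can be exploited
Stability	Updates can break existing functionality	Service outages, compatibility problems
Control	Automation vs. manual review	Runaway updates or delayed patching

Decision framework: tiered updates

Experienced system administrators generally agree that not every update should be fully automated. A tiered scheme is recommended:

Update type	Automation level	Rationale	Implementation
Security Updates	✅ Fully automatic	Critical vulnerabilities need prompt patching	unattended-upgrades
Important Updates	⚠️ Conditionally automatic	Important but not urgent	Automatic after testing, or manual review
Standard Updates	⚠️ Manual or scheduled	Higher risk, not urgent	Run in a maintenance window
Kernel Updates	❌ Manual	Requires a reboot; highest risk	Apply after test validation
Major Version Upgrade	❌ Strictly manual	Can break the whole system	Full testing and backups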
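
The tiering above can be expressed as a small lookup. The following is only a sketch: the function name `classify_update`, the tier labels, and the matching rules are assumptions made for illustration, not part of apt or unattended-upgrades.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a package name and its archive pocket to a tier
# from the table above. Tier names and rules are illustrative assumptions.
classify_update() {
    local pkg="$1" pocket="$2"

    # Kernel packages are always handled manually, whatever the pocket.
    case "$pkg" in
        linux-image-*|linux-headers-*|linux-modules-*)
            echo "manual"
            return
            ;;
    esac

    case "$pocket" in
        *-security) echo "automatic" ;;   # e.g. jammy-security
        *-updates)  echo "scheduled" ;;   # e.g. jammy-updates
        *)          echo "manual" ;;
    esac
}
```

For example, `classify_update openssl jammy-security` prints `automatic`, while any `linux-image-` package prints `manual` even when it comes from the security pocket.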

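Technical implementation: a complete unattended-upgrades configuration

Installation and basic setup

# Installed by default on Ubuntu 22.04; check the version
apt-cache policy unattended-upgrades

# If it is not installed
sudo apt update
sudo apt install unattended-upgrades apt-listchanges

# Enable automatic updates
sudo dpkg-reconfigure -plow unattended-upgrades

Main configuration file: /etc/apt/apt.conf.d/50unattended-upgrades

The following is a full example configuration for an enterprise production environment, covering the key settings: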

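// Enterprise automatic-update configuration for Ubuntu 22.04 LTS
// Goal: install security updates automatically while keeping the system stable

Unattended-Upgrade::Allowed-Origins {
    // Official security updates only
    "${distro_id}:${distro_codename}-security";

    // Optional: the -updates pocket (assess the risk first)
    // "${distro_id}:${distro_codename}-updates";

    // ESM updates (Ubuntu Pro subscription)
    // "${distro_id}ESMApps:${distro_codename}-apps-security";
    // "${distro_id}ESM:${distro_codename}-infra-security";
};

// Blacklist: packages that are never upgraded automatically.
// Entries are Python regular expressions matched from the start of the
// package name, not shell globs.
Unattended-Upgrade::Package-Blacklist {
    // Kernel packages (manual control)
    "linux-image-*";
    "linux-headers-*";
    "linux-modules-*";

    // Key services (need test validation)
    "nginx";
    "apache2";
    "mysql-server*";
    "postgresql*";
    "docker*";
    "kubernetes*";

    // Custom applications
    // "myapp-*";
};

// Whitelist: when set, only packages matching these patterns are upgraded.
// It does not override the blacklist; blacklisted packages stay excluded.
// Unattended-Upgrade::Package-Whitelist {
//     "openssl";
//     "libssl*";
// };

// Remove dependencies that are no longer needed
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";

// Commands to run before and after an upgrade run:
// unattended-upgrades has no Pre-Install-Exec / Post-Install-Exec options.
// One approach is a systemd drop-in for apt-daily-upgrade.service
// (sudo systemctl edit apt-daily-upgrade.service):
//   [Service]
//   ExecStartPre=/usr/local/bin/pre-update-snapshot.sh
//   ExecStartPost=/usr/local/bin/post-update-verify.sh
// Note that these run on every scheduled run, including runs with nothing to upgrade.

// Automatic reboot (a key setting)
Unattended-Upgrade::Automatic-Reboot "false";
// If automatic reboots are required, restrict them to a time window
// Unattended-Upgrade::Automatic-Reboot-Time "03:00";
// Unattended-Upgrade::Automatic-Reboot-WithUsers "false";

// Mail notifications
Unattended-Upgrade::Mail "sysadmin@example.com";
Unattended-Upgrade::MailReport "on-change";
// Options: always, only-on-error, on-change

// Download bandwidth limit (this is an apt option, not an Unattended-Upgrade:: one)
Acquire::http::Dl-Limit "70"; // KB/s

// Keep downloaded .deb files after installation (useful for later rollbacks)
Unattended-Upgrade::Keep-Debs-After-Install "true";

// Logging detail
Unattended-Upgrade::Verbose "true";
Unattended-Upgrade::Debug "false";

// Power and network conditions
Unattended-Upgrade::OnlyOnACPower "false";
Unattended-Upgrade::Skip-Updates-On-Metered-Connections "true";

// dpkg options: how configuration file conflicts are handled
Dpkg::Options {
    "--force-confdef";  // take the default action when one exists
    "--force-confold";  // otherwise keep the existing configuration file
};

// Install in the background rather than at shutdown
Unattended-Upgrade::InstallOnShutdown "false";

// Syslog output (SyslogEnable / SyslogFacility) is optional; the files under
// /var/log/unattended-upgrades/ and the systemd journal are usually sufficient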

Update schedule: /etc/apt/apt.conf.d/20auto-upgrades

// Settings read by the jobs that apt-daily.timer and apt-daily-upgrade.timer start

// Refresh package lists (value is an interval in days; 1 = every day)
APT::Periodic::Update-Package-Lists "1";

// Download upgradable packages every day (without installing them)
APT::Periodic::Download-Upgradeable-Packages "1";

// Run unattended-upgrade (1 = every day)
APT::Periodic::Unattended-Upgrade "1";

// Clean the package cache (interval in days)
APT::Periodic::AutocleanInterval "7";

// Verbose logging
APT::Periodic::Verbose "2";

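systemd timer scheduling

The default systemd timers may fire during business peak hours, so they need adjusting:

# Show the current schedule
systemctl status apt-daily.timer
systemctl status apt-daily-upgrade.timer

# Show the next run time (quote the glob so the shell does not expand it)
systemctl list-timers 'apt-daily*'

# Customize the run time (creates an override)
sudo systemctl edit apt-daily.timer

Add the following in the editor:

[Timer]
# Clear the default schedule
OnCalendar=
# Run every day at 02:00 (outside business peak hours)
OnCalendar=02:00
# Random delay of 0-30 minutes (so that many servers do not update at once)
RandomizedDelaySec=30min

Adjust the upgrade timer the same way:

sudo systemctl edit apt-daily-upgrade.timer

[Timer]
OnCalendar=
OnCalendar=03:00
RandomizedDelaySec=30min

# Reload and verify
sudo systemctl daemon-reload
systemctl list-timers 'apt-daily*'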

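Layered recovery from failed upgrades

This is the most important part for an enterprise environment. A failed upgrade can lead to:

An unbootable system (kernel panic, initramfs problems)
Services that no longer work (dependency conflicts, incompatible configuration)
Degraded performance (bugs in the new version, higher resource usage)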

Layer 1: automatic snapshot before updating

LVM snapshot (recommended)

If the system uses LVM, a snapshot can be created automatically before each update:

#!/bin/bash
# /usr/local/bin/pre-update-snapshot.sh

SNAPSHOT_NAME="pre-update-$(date +%Y%m%d-%H%M%S)"
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
# The volume group needs at least this much free space
SNAPSHOT_SIZE="10G"

# Create the LVM snapshot
if lvcreate -L "${SNAPSHOT_SIZE}" -s -n "${SNAPSHOT_NAME}" "/dev/${VG_NAME}/${LV_NAME}"; then
    echo "✅ LVM snapshot created: ${SNAPSHOT_NAME}" | logger -t pre-update
    # Record the snapshot name
    echo "${SNAPSHOT_NAME}" > /var/log/last-update-snapshot.txt
else
    echo "❌ Failed to create LVM snapshot" | logger -t pre-update
    exit 1
fi

# Keep the 3 most recent snapshots and delete older ones
SNAPSHOTS=$(lvs --noheadings -o lv_name "${VG_NAME}" | grep "pre-update-" | sort -r | tail -n +4)
for snap in ${SNAPSHOTS}; do
    lvremove -f "/dev/${VG_NAME}/${snap}"
    echo "🗑️  Removed old snapshot: ${snap}" | logger -t pre-update
done

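# Make the script executable
sudo chmod +x /usr/local/bin/pre-update-snapshot.sh

# Run it before each unattended upgrade. unattended-upgrades itself has no
# pre-install hook option; one approach is a drop-in for apt-daily-upgrade.service:
#   sudo systemctl edit apt-daily-upgrade.service
#   [Service]
#   ExecStartPre=/usr/local/bin/pre-update-snapshot.sh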
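
A snapshot that fills up completely becomes invalid and can no longer be merged, so it is worth watching how full the `pre-update-` snapshots are. The sketch below filters `lvs` output for snapshots above a threshold; the function name `full_snapshots` and the 80% default are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Read "lv_name data_percent" pairs from stdin, as printed by
#   LC_ALL=C lvs --noheadings -o lv_name,data_percent ubuntu-vg
# and print the pre-update snapshots whose usage is at or above a threshold.
# Volumes that are not snapshots have an empty data_percent and are skipped.
full_snapshots() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        $1 ~ /^pre-update-/ && $2 != "" && ($2 + 0) >= limit { print $1 }
    '
}
```

A possible use is `sudo LC_ALL=C lvs --noheadings -o lv_name,data_percent ubuntu-vg | full_snapshots 80`, with any output treated as a reason to extend or remove the listed snapshots.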

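Snapshot rollback procedure

If the system misbehaves after an update, roll back from recovery mode:

# 1. Reboot into the GRUB menu and choose "Advanced options" > "Recovery mode"

# 2. Choose "root - Drop to root shell prompt"

# 3. Remount the root filesystem read-write
mount -o remount,rw /

# 4. List the available snapshots
lvs ubuntu-vg

# 5. Merge the snapshot (roll back). Because the origin volume is in use,
#    the merge starts the next time the volume is activated, i.e. on reboot.
lvconvert --merge /dev/ubuntu-vg/pre-update-20250120-020000

# 6. Reboot
reboot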

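Layer 2: package-level rollback

Keeping package files

# Already set in 50unattended-upgrades
Unattended-Upgrade::Keep-Debs-After-Install "true";

# Downloaded package files are stored in /var/cache/apt/archives/

# List the cached versions of a package
ls -lh /var/cache/apt/archives/ | grep nginx

Downgrading a specific package

# Show the package install/upgrade history
grep -E " (install|upgrade) " /var/log/dpkg.log | tail -50
grep -E "^(Install|Upgrade):" /var/log/apt/history.log | tail -50

# Show the versions available from the repositories
apt-cache policy nginx

# Downgrade to a specific version
sudo apt install nginx=1.18.0-6ubuntu14.3

# Hold the package so it is not upgraded again
sudo apt-mark hold nginx

# List held packages
apt-mark showhold

# Release the hold
sudo apt-mark unhold nginx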
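
To find which version to downgrade to, the previous version can be read from dpkg.log, where upgrade lines have the form `<date> <time> upgrade <package>:<arch> <old-version> <new-version>`. The following is a minimal sketch; `previous_version` is a hypothetical helper name, and the log path is a parameter so it can be tried against a copy of the log.

```shell
#!/usr/bin/env bash
# Print the version a package had before its most recent upgrade,
# based on the "upgrade" lines in a dpkg log file.
previous_version() {
    local pkg="$1" logfile="${2:-/var/log/dpkg.log}"
    awk -v pkg="$pkg" '
        $3 == "upgrade" {
            split($4, name, ":")            # strip the ":amd64" / ":all" suffix
            if (name[1] == pkg) old = $5    # remember the latest old version
        }
        END { if (old != "") print old }
    ' "$logfile"
}
```

The output can then be passed to `sudo apt install <package>=<version>`, provided that version is still available from a repository or in the local package cache.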

Layer 3: service health checks

Verify the key services automatically after an update:

#!/bin/bash
# /usr/local/bin/post-update-verify.sh

LOG_FILE="/var/log/post-update-verify.log"
ALERT_EMAIL="sysadmin@example.com"

echo "========== Post-Update Verification: $(date) ==========" >> "${LOG_FILE}"

# Check the state of the key services
SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")
FAILED_SERVICES=()

for service in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "${service}"; then
        echo "✅ ${service}: active" >> "${LOG_FILE}"
    else
        echo "❌ ${service}: FAILED" >> "${LOG_FILE}"
        FAILED_SERVICES+=("${service}")
    fi
done

# Check disk space
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "${DISK_USAGE}" -gt 90 ]; then
    echo "⚠️  Disk usage: ${DISK_USAGE}% (WARNING)" >> "${LOG_FILE}"
fi

# Check available memory
MEM_AVAILABLE=$(free -m | awk 'NR==2 {print $7}')
if [ "${MEM_AVAILABLE}" -lt 500 ]; then
    echo "⚠️  Available memory: ${MEM_AVAILABLE}MB (LOW)" >> "${LOG_FILE}"
fi

# Check for kernel panics or errors (-E is needed for the | alternation)
KERNEL_ERRORS=$(dmesg | grep -ciE "error|fail|panic")
if [ "${KERNEL_ERRORS}" -gt 0 ]; then
    echo "⚠️  Kernel errors detected: ${KERNEL_ERRORS}" >> "${LOG_FILE}"
    dmesg | grep -iE "error|fail|panic" | tail -10 >> "${LOG_FILE}"
fi

# Send an alert if any service failed
if [ ${#FAILED_SERVICES[@]} -gt 0 ]; then
    SUBJECT="🚨 Post-Update Alert: Services Failed on $(hostname)"
    BODY="Failed services: ${FAILED_SERVICES[*]}\n\nSee log: ${LOG_FILE}"
    echo -e "${BODY}" | mail -s "${SUBJECT}" "${ALERT_EMAIL}"

    # Optional: try to restart the failed services
    for service in "${FAILED_SERVICES[@]}"; do
        systemctl restart "${service}"
        sleep 5
        if systemctl is-active --quiet "${service}"; then
            echo "✅ ${service} restarted successfully" >> "${LOG_FILE}"
        fi
    done
fi

echo "========== Verification Complete ==========" >> "${LOG_FILE}"

# Make the script executable
sudo chmod +x /usr/local/bin/post-update-verify.sh

# Run it after each unattended upgrade. unattended-upgrades itself has no
# post-install hook option; one approach is a drop-in for apt-daily-upgrade.service:
#   sudo systemctl edit apt-daily-upgrade.service
#   [Service]
#   ExecStartPost=/usr/local/bin/post-update-verify.sh
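
A service that is being restarted by a package upgrade can report inactive for a few seconds, so a single `is-active` check may raise false alarms. One option is to retry the check a few times before declaring failure. The helper below is a sketch; `retry_check` is a hypothetical name and is not part of systemd or apt.

```shell
#!/usr/bin/env bash
# Run a check command up to <attempts> times, sleeping <delay> seconds
# between tries. Returns 0 as soon as the check succeeds and 1 if every
# attempt fails.
retry_check() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

In the verification script, `systemctl is-active --quiet "${service}"` could then be replaced with `retry_check 5 3 systemctl is-active --quiet "${service}"`.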

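Layer 4: keeping older kernels in GRUB

# Edit the GRUB settings
sudo nano /etc/default/grub

# Older kernels are kept by default on Ubuntu 22.04
# GRUB_DEFAULT=0  # boot the newest kernel by default
# If the new kernel has problems, choose "Advanced options" and an older kernel at boot

# List the installed kernel versions
dpkg -l | grep linux-image

# Remove an old kernel manually (with care)
# sudo apt remove linux-image-5.15.0-91-generic

# Keep at least 2 kernel versions available for rollback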

Monitoring and alerting

Where to look in the logs

# Main unattended-upgrades logs
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades.log
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades-dpkg.log

# Summary of recent upgrades
sudo grep "Packages that will be upgraded" /var/log/unattended-upgrades/unattended-upgrades.log

# apt history
sudo tail -50 /var/log/apt/history.log

# dpkg history
sudo tail -100 /var/log/dpkg.log

# systemd journal (more detail)
journalctl -u unattended-upgrades -f
journalctl -u apt-daily -f
journalctl -u apt-daily-upgrade -f
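
For reports or change tickets it can help to extract just the package names from the most recent run. The sketch below assumes that the package names follow the phrase "Packages that will be upgraded:" on the same log line, separated by spaces; `last_upgraded_packages` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the packages from the most recent "Packages that will be upgraded:"
# line of an unattended-upgrades log, one per line.
last_upgraded_packages() {
    local logfile="${1:-/var/log/unattended-upgrades/unattended-upgrades.log}"
    grep "Packages that will be upgraded:" "$logfile" \
        | tail -n 1 \
        | sed 's/.*Packages that will be upgraded: *//' \
        | tr -s ' ' '\n' \
        | sed '/^$/d'
}
```

The function prints nothing when the log contains no such line, which makes it easy to use in a condition such as `[ -n "$(last_upgraded_packages)" ]`.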

Integrating with a monitoring system

Using Prometheus + Node Exporter

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Create the systemd service
sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --collector.systemd \
  --collector.processes \
  --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

[Install]
WantedBy=multi-user.target

# Create the service user
sudo useradd -rs /bin/false node_exporter

# Start the service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

# Verify (apt_ metrics only appear after the custom metrics script has written its .prom file)
curl -s http://localhost:9100/metrics | grep apt_

Custom metrics script

#!/bin/bash
# /usr/local/bin/apt_update_metrics.sh
# Produces update metrics in Prometheus text format

TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
mkdir -p "${TEXTFILE_DIR}"

# Count the packages waiting to be upgraded
UPDATES_AVAILABLE=$(apt list --upgradable 2>/dev/null | grep -c upgradable)
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -ci security)

# Time of the last successful apt update (Unix timestamp)
LAST_UPDATE=$(stat -c %Y /var/lib/apt/periodic/update-success-stamp 2>/dev/null || echo 0)

# Write the metrics to a temporary file, then rename it, so the collector
# never reads a half-written file
cat > "${TEXTFILE_DIR}/apt_updates.prom.tmp" << EOF
# HELP apt_updates_available Number of available updates
# TYPE apt_updates_available gauge
apt_updates_available ${UPDATES_AVAILABLE}

# HELP apt_security_updates_available Number of available security updates
# TYPE apt_security_updates_available gauge
apt_security_updates_available ${SECURITY_UPDATES}

# HELP apt_last_update_timestamp Timestamp of last apt update
# TYPE apt_last_update_timestamp gauge
apt_last_update_timestamp ${LAST_UPDATE}
EOF
mv "${TEXTFILE_DIR}/apt_updates.prom.tmp" "${TEXTFILE_DIR}/apt_updates.prom"

# Schedule it with cron
sudo crontab -e

# Refresh the metrics every hour
0 * * * * /usr/local/bin/apt_update_metrics.sh
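
Since automatic reboots are disabled in this guide, a pending reboot is another value worth exporting. The sketch below writes a gauge based on the presence of /var/run/reboot-required. The metric name `node_reboot_required` and the function name are assumptions, and both paths are parameters so the function can be tried in a scratch directory.

```shell
#!/usr/bin/env bash
# Write a reboot-required gauge for the node_exporter textfile collector.
write_reboot_metric() {
    local flag_file="${1:-/var/run/reboot-required}"
    local out_dir="${2:-/var/lib/node_exporter/textfile_collector}"
    local value=0
    if [ -f "$flag_file" ]; then
        value=1
    fi

    # Write to a temporary file and rename it, so the collector never reads
    # a partial file. The temporary name does not end in .prom, so the
    # collector ignores it.
    local tmp
    tmp="$(mktemp "${out_dir}/reboot.prom.XXXXXX")" || return 1
    {
        echo "# HELP node_reboot_required Whether a reboot is pending (1) or not (0)"
        echo "# TYPE node_reboot_required gauge"
        echo "node_reboot_required ${value}"
    } > "$tmp"
    chmod 644 "$tmp"
    mv "$tmp" "${out_dir}/reboot.prom"
}
```

Calling `write_reboot_metric` with no arguments at the end of the metrics script would publish the value alongside the apt metrics.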

Mail alert configuration

# Install postfix (MTA) and the mail command
sudo apt install postfix mailutils

# Relay through an external SMTP server (Gmail, for example)
sudo nano /etc/postfix/main.cf

relayhost = [smtp.gmail.com]:587
# smtp_tls_security_level replaces the obsolete smtp_use_tls parameter
smtp_tls_security_level = encrypt
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous

# Set the SMTP credentials
sudo nano /etc/postfix/sasl_passwd

[smtp.gmail.com]:587 your-email@gmail.com:your-app-password

# Build the hash database and restrict access to the credentials
sudo postmap /etc/postfix/sasl_passwd
sudo chmod 600 /etc/postfix/sasl_passwd*

# Restart postfix
sudo systemctl restart postfix

# Send a test message
echo "Test email from $(hostname)" | mail -s "Test Subject" sysadmin@example.com

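Strategy recommendations by service type

Scenario 1: Web server (Nginx/Apache)

// Blacklist
Unattended-Upgrade::Package-Blacklist {
    "nginx*";
    "apache2*";
    "php*";
};

// Rationale:
// - Web server updates can change the configuration format
// - PHP version upgrades can break applications
// - Update manually after validating in a test environment

// Recommendation: use blue-green deployment
// 1. Build a new server and update it manually
// 2. Verify that everything works
// 3. Switch the load balancer traffic
// 4. Monitor the error rate
// 5. Update the remaining nodes once no problems appear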

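Scenario 2: Database server (MySQL/PostgreSQL)

// Strict blacklist
Unattended-Upgrade::Package-Blacklist {
    "mysql*";
    "mariadb*";
    "postgresql*";
    "percona*";
};

// Rationale:
// - Database updates may require a schema migration
// - Performance characteristics can change
// - Rollback is complex and risky

// Recommended procedure:
// 1. Take a full backup
// 2. Test the update on a replica
// 3. Verify that replication still works
// 4. Monitor performance metrics (query time, connections)
// 5. Schedule a maintenance window for the primary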

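Scenario 3: Container host (Docker/Kubernetes)

// Selective blacklist
Unattended-Upgrade::Package-Blacklist {
    "docker*";
    "containerd*";
    "kubernetes*";
    "kubelet*";
    "kubeadm*";
};

// Rationale:
// - Container runtime updates can affect running containers
// - Kubernetes has a strict upgrade path between versions
// - CNI, CSI, and other plugin compatibility must be verified

// Allowed to update automatically:
// - Security patches (kernel, glibc)
// - Monitoring tools (node_exporter, cAdvisor)

// Recommendations:
// 1. Use immutable infrastructure
// 2. Rebuild nodes periodically instead of updating in place
// 3. Use the cluster autoscaler for rolling updates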

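Scenario 4: Critical services (minimal updates)

// Extremely conservative configuration
Unattended-Upgrade::Allowed-Origins {
    // Security updates only
    "${distro_id}:${distro_codename}-security";
};

// Blacklist and whitelist entries are regular expressions, so a bare "*" is
// not a valid pattern, and the whitelist does not override the blacklist.
// To allow only a few packages, leave the blacklist out and rely on the
// whitelist: when it is set, only matching packages are upgraded.
Unattended-Upgrade::Package-Whitelist {
    // Only the most critical security fixes
    "openssl";
    "libssl*";
    "openssh*";
    "ca-certificates";
};

Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "critical-alerts@example.com";
Unattended-Upgrade::MailReport "always";

// Suitable for:
// - Financial trading systems
// - Medical device controllers
// - Industrial control systems (ICS/SCADA)
// - Any service with a very demanding SLA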
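
Because blacklist and whitelist entries are regular expressions matched from the start of the package name, a pattern such as `nginx` also covers `nginx-common`. The sketch below approximates that matching so patterns can be tried before deployment. It uses `grep -E` (POSIX extended regular expressions), which behaves like Python regular expressions for simple patterns such as the ones in this article but is not identical; `matches_pattern` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success when <pattern>, anchored at the start of the name,
# matches <package>. This mirrors a prefix-style regular expression match.
matches_pattern() {
    local pkg="$1" pattern="$2"
    printf '%s\n' "$pkg" | grep -Eq "^(${pattern})"
}
```

For example, `matches_pattern nginx-common nginx` succeeds, while `matches_pattern libnginx-mod-http-geoip nginx` does not. To match one package exactly, end the pattern with `$`, as in `nginx$`.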

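Automated deployment with Ansible

Ansible playbook: unified configuration management

# playbook: deploy_unattended_upgrades.yml
---
- name: Configure Unattended Upgrades across Ubuntu servers
  hosts: ubuntu_servers
  become: yes
  vars:
    mail_recipient: "sysadmin@example.com"
    reboot_time: "03:00"
    auto_reboot: false
    # service_type (webserver, database, container, critical) is set per host
    # in the inventory. Defining it here would override those values, because
    # play vars take precedence over inventory host variables.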
    
  tasks:
    - name: Ensure unattended-upgrades is installed
      apt:
        name:
          - unattended-upgrades
          - apt-listchanges
          - mailutils
        state: present
        update_cache: yes

    - name: Deploy 50unattended-upgrades configuration
      template:
        src: templates/50unattended-upgrades.j2
        dest: /etc/apt/apt.conf.d/50unattended-upgrades
        owner: root
        group: root
        mode: '0644'
      notify: restart unattended-upgrades

    - name: Deploy 20auto-upgrades configuration
      copy:
        dest: /etc/apt/apt.conf.d/20auto-upgrades
        content: |
          APT::Periodic::Update-Package-Lists "1";
          APT::Periodic::Download-Upgradeable-Packages "1";
          APT::Periodic::Unattended-Upgrade "1";
          APT::Periodic::AutocleanInterval "7";
          APT::Periodic::Verbose "2";
        owner: root
        group: root
        mode: '0644'

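    # The copy module does not create missing parent directories
    - name: Create the drop-in directory for apt-daily-upgrade.timer
      file:
        path: /etc/systemd/system/apt-daily-upgrade.timer.d
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Configure apt-daily-upgrade.timer schedule
      copy:
        dest: /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
        content: |
          [Timer]
          OnCalendar=
          OnCalendar={{ reboot_time }}
          RandomizedDelaySec=30min
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd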

    - name: Deploy pre-update snapshot script
      template:
        src: templates/pre-update-snapshot.sh.j2
        dest: /usr/local/bin/pre-update-snapshot.sh
        owner: root
        group: root
        mode: '0755'
      when: ansible_facts['lvm'] is defined

    - name: Deploy post-update verification script
      template:
        src: templates/post-update-verify.sh.j2
        dest: /usr/local/bin/post-update-verify.sh
        owner: root
        group: root
        mode: '0755'

    - name: Enable and start unattended-upgrades service
      systemd:
        name: unattended-upgrades
        enabled: yes
        state: started

  handlers:
    - name: restart unattended-upgrades
      systemd:
        name: unattended-upgrades
        state: restarted

    - name: reload systemd
      systemd:
        daemon_reload: yes

Jinja2 template: generating the configuration dynamically

# templates/50unattended-upgrades.j2
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
{% if service_type in ['webserver', 'container'] %}
    // "${distro_id}:${distro_codename}-updates";
{% endif %}
};

Unattended-Upgrade::Package-Blacklist {
    "linux-image-*";
    "linux-headers-*";
{% if service_type == 'webserver' %}
    "nginx*";
    "apache2*";
    "php*";
{% elif service_type == 'database' %}
    "mysql*";
    "postgresql*";
    "mariadb*";
{% elif service_type == 'container' %}
    "docker*";
    "kubernetes*";
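{% endif %}
};

{% if service_type == 'critical' %}
// A bare "*" is not a valid regular expression for the blacklist. For critical
// hosts the whitelist restricts upgrades to the matching packages instead.
Unattended-Upgrade::Package-Whitelist {
    "openssl";
    "libssl*";
    "openssh*";
};
{% endif %}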

Unattended-Upgrade::Automatic-Reboot "{{ auto_reboot | lower }}";
{% if auto_reboot %}
Unattended-Upgrade::Automatic-Reboot-Time "{{ reboot_time }}";
{% endif %}

Unattended-Upgrade::Mail "{{ mail_recipient }}";
Unattended-Upgrade::MailReport "on-change";

Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Keep-Debs-After-Install "true";

Dpkg::Options {
    "--force-confdef";
    "--force-confold";
};

Running the playbook

# Define the inventory
# inventory/hosts.yml
---
all:
  children:
    ubuntu_servers:
      children:
        webservers:
          hosts:
            web01.example.com:
              service_type: webserver
            web02.example.com:
              service_type: webserver
        databases:
          hosts:
            db01.example.com:
              service_type: database
              auto_reboot: false
        containers:
          hosts:
            k8s-node01.example.com:
              service_type: container

# Run the playbook
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml

# Limit the run to one group
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --limit webservers

# Dry run
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --check --diff

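Best practices summary

Golden rules from experienced system administrators

Principle	Description	Implementation focus
1. Tier the updates	Use a different strategy for each type of update	Security automatic, kernel manual, critical services tightly controlled
2. Snapshot first	A rollback mechanism must exist before updating	LVM snapshot, package downgrade, older kernel in GRUB
3. Test first	Validate before reaching production	Staging environment, canary deployment
4. Monitor and verify	Check service status automatically after updates	Post-update hook, health-check script
5. Control timing	Avoid business peak hours	systemd timer, maintenance window
6. Alert promptly	Notify immediately when something goes wrong	Email, Slack, PagerDuty
7. Document everything	Record configuration decisions and change history	Configuration in Git, change log
8. Automate consistently	Manage everything with IaC tooling	Ansible, Terraform, Chef

Checklist (before going live)

☐ Configuration review: 50unattended-upgrades matches the service type
☐ Blacklist check: key services are on the blacklist
☐ Reboot policy: Automatic-Reboot is set correctly (false is recommended)
☐ Time window: systemd timers are scheduled outside peak hours
☐ Snapshots: LVM snapshots or a backup scheme are in place
☐ Verification script: post-update-verify.sh has been tested
☐ Mail alerts: a test message was delivered
☐ Monitoring: Prometheus/Zabbix metrics are being collected
☐ Rollback drill: the team knows the snapshot rollback procedure
☐ Documentation: the runbook records the emergency steps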

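Advanced topics

Ubuntu Pro ESM updates

# Ubuntu Pro (formerly Ubuntu Advantage) provides Expanded Security Maintenance
# It suits older releases that need long-term support
# The client command is "pro"; the older "ua" name is the same tool

# Attach the machine to Ubuntu Pro
sudo pro attach YOUR_TOKEN

# Enable ESM
sudo pro enable esm-infra
sudo pro enable esm-apps

# Allow ESM updates in 50unattended-upgrades
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}ESM:${distro_codename}-infra-security";
    "${distro_id}ESMApps:${distro_codename}-apps-security";
};

Kernel Livepatch (kernel patching without a reboot)

# Canonical Livepatch applies kernel security patches without rebooting
# It suits services that cannot be rebooted often

# Enable Livepatch (requires Ubuntu Pro)
sudo pro enable livepatch

# Or install the client directly with a token (free tier for personal use)
sudo snap install canonical-livepatch
sudo canonical-livepatch enable YOUR_TOKEN

# Check the status
sudo canonical-livepatch status

# Check whether a reboot is still required (even with Livepatch)
/usr/lib/update-notifier/update-motd-reboot-required
cat /var/run/reboot-required.pkgs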

Rolling updates across multiple servers

# Canary-style deployment with Ansible
# playbook: rolling_update.yml
---
- name: Rolling update with canary
  hosts: webservers
  serial: 1  # update one server at a time
  become: yes
  
  pre_tasks:
    - name: Remove server from load balancer
      command: /usr/local/bin/remove-from-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
      
    - name: Wait for connections to drain
      wait_for:
        timeout: 30
  
  tasks:
    - name: Update packages
      apt:
        upgrade: safe
        update_cache: yes
        
    - name: Verify services
      command: /usr/local/bin/post-update-verify.sh
      
  post_tasks:
    - name: Add server back to load balancer
      command: /usr/local/bin/add-to-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
      
    - name: Wait and monitor error rate
      pause:
        minutes: 5
        
    - name: Check error rate
      command: /usr/local/bin/check-error-rate.sh
      register: error_rate
      failed_when: error_rate.stdout | int > 5

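Conclusion

There is no one-size-fits-all automatic-update strategy for Ubuntu Server. For enterprise production environments, the sound approach is:

Assess risk tolerance: decide the level of automation by service type
Tier the updates: security automatic, kernel manual, critical services tightly controlled
Build layered protection: snapshots + package downgrade + service verification + monitoring and alerting
Keep testing and drilling: validate the rollback procedure regularly so the team stays familiar with it
Automate and standardize: manage configuration consistently with tools such as Ansible

Remember that automatic updates exist to improve security, not to replace professional judgment. The value of an experienced system administrator lies in understanding the impact of each update, designing a strategy that fits the business, and restoring the system quickly when something goes wrong.

With the configurations, script examples, and best practices in this article, you can build a robust automatic-update mechanism that balances security, stability, and control.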

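Alternative: Shell + Cron vs systemd + unattended-upgrades

Comparing the two approaches

Aspect	Shell + Cron	systemd + unattended-upgrades
Advantages	• Fully customizable and flexible • Fits special requirements • No extra packages needed • Easy to understand and debug	• Officially supported by Ubuntu • Handles dependencies and conflicts automatically • Thorough error handling • Well integrated with systemd
Disadvantages	• Error handling is up to you • Edge cases are easy to miss • Higher maintenance cost • No official support	• Less flexible • Complex configuration • Harder to debug • Configuration syntax must be learned
When to use	• Heavily customized requirements • Integration with existing scripts • Simple update flows • Test/development environments	• Standard enterprise environments (recommended) • Stability and reliability required • Large-scale deployment • Production environments

Shell + Cron: a complete implementation example

Main update script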

#!/bin/bash
# /usr/local/sbin/auto-update.sh
# Enterprise-grade automated update script for Ubuntu 22.04

# ========== Configuration ==========
LOCK_FILE="/var/run/auto-update.lock"
LOG_FILE="/var/log/auto-update.log"
ERROR_LOG="/var/log/auto-update-error.log"
SNAPSHOT_ENABLED=true
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
SNAPSHOT_SIZE="10G"
ALERT_EMAIL="sysadmin@example.com"
MAX_LOG_SIZE=104857600  # 100MB

# Service health check list
CRITICAL_SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")

# Package blacklist (will not be upgraded)
BLACKLIST_PATTERN="linux-image-|linux-headers-|nginx|mysql-server|postgresql|docker"

# ========== Functions ==========

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "${LOG_FILE}"
}

error_log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" | tee -a "${ERROR_LOG}"
}

send_alert() {
    local subject="$1"
    local body="$2"
    echo -e "${body}" | mail -s "${subject}" "${ALERT_EMAIL}"
}

# Lock file mechanism to prevent concurrent execution
acquire_lock() {
    if [ -f "${LOCK_FILE}" ]; then
        PID=$(cat "${LOCK_FILE}")
        if ps -p ${PID} > /dev/null 2>&1; then
            log "Another update process (PID: ${PID}) is running. Exiting."
            exit 0
        else
            log "Stale lock file found. Removing."
            rm -f "${LOCK_FILE}"
        fi
    fi
    echo $$ > "${LOCK_FILE}"
}

release_lock() {
    rm -f "${LOCK_FILE}"
}

# Rotate log files
rotate_logs() {
    for log in "${LOG_FILE}" "${ERROR_LOG}"; do
        if [ -f "${log}" ] && [ $(stat -c%s "${log}") -gt ${MAX_LOG_SIZE} ]; then
            mv "${log}" "${log}.old"
            gzip "${log}.old"
            touch "${log}"
            log "Log rotated: ${log}"
        fi
    done
}

# Create LVM snapshot before update
create_snapshot() {
    if [ "${SNAPSHOT_ENABLED}" != "true" ]; then
        return 0
    fi
    
    local snapshot_name="auto-update-$(date +%Y%m%d-%H%M%S)"
    
    log "Creating LVM snapshot: ${snapshot_name}"
    
    lvcreate -L ${SNAPSHOT_SIZE} -s -n ${snapshot_name} /dev/${VG_NAME}/${LV_NAME} >> "${LOG_FILE}" 2>&1
    
    if [ $? -eq 0 ]; then
        log "✅ Snapshot created successfully: ${snapshot_name}"
        echo "${snapshot_name}" > /var/tmp/last-update-snapshot.txt
        
        # Clean up old snapshots (keep last 3)
        local snapshots=$(lvs --noheadings -o lv_name ${VG_NAME} 2>/dev/null | grep "auto-update-" | sort -r | tail -n +4)
        for snap in ${snapshots}; do
            lvremove -f /dev/${VG_NAME}/${snap} >> "${LOG_FILE}" 2>&1
            log "Removed old snapshot: ${snap}"
        done
    else
        error_log "Failed to create snapshot"
        send_alert "❌ Auto-Update Failed: Snapshot Creation" "Failed to create LVM snapshot on $(hostname)\n\nSee: ${LOG_FILE}"
        return 1
    fi
}

# Update package lists
update_package_lists() {
    log "Updating package lists..."
    
    apt-get update >> "${LOG_FILE}" 2>&1
    
    if [ $? -ne 0 ]; then
        error_log "Failed to update package lists"
        return 1
    fi
    
    log "✅ Package lists updated"
    return 0
}

# Get list of upgradeable packages (excluding blacklist)
get_upgradeable_packages() {
    apt list --upgradable 2>/dev/null | grep -v "^Listing" | grep -vE "${BLACKLIST_PATTERN}" | awk -F/ '{print $1}'
}

# Upgrade security packages only
upgrade_security_only() {
    log "Checking for security updates..."
    
    # Install unattended-upgrades if not present
    dpkg -l | grep -q unattended-upgrades || apt-get install -y unattended-upgrades >> "${LOG_FILE}" 2>&1
    
    # Run unattended-upgrades in dry-run mode first
    unattended-upgrade --dry-run -v >> "${LOG_FILE}" 2>&1
    
    # Actually perform upgrades
    unattended-upgrade -v >> "${LOG_FILE}" 2>&1
    
    if [ $? -eq 0 ]; then
        log "✅ Security updates applied successfully"
        return 0
    else
        error_log "Failed to apply security updates"
        return 1
    fi
}

# Upgrade all safe packages (excluding blacklist)
upgrade_safe_packages() {
    local packages
    packages=$(get_upgradeable_packages)
    
    if [ -z "${packages}" ]; then
        log "No packages to upgrade"
        return 0
    fi
    
    log "Upgradeable packages (excluding blacklist):"
    echo "${packages}" | tee -a "${LOG_FILE}"
    
    log "Performing safe upgrade..."
    
    # Upgrade only the filtered packages; a plain "apt-get upgrade" would
    # also upgrade the blacklisted ones. ${packages} is left unquoted on
    # purpose so that each name becomes its own argument.
    DEBIAN_FRONTEND=noninteractive apt-get install --only-upgrade -y \
        -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" \
        ${packages} \
        >> "${LOG_FILE}" 2>&1
    
    if [ $? -eq 0 ]; then
        log "✅ Packages upgraded successfully"
        return 0
    else
        error_log "Package upgrade failed"
        return 1
    fi
}

# Clean up unused packages
cleanup_packages() {
    log "Cleaning up unused packages..."
    
    apt-get autoremove -y >> "${LOG_FILE}" 2>&1
    apt-get autoclean >> "${LOG_FILE}" 2>&1
    
    log "✅ Cleanup completed"
}

# Verify critical services after update
verify_services() {
    log "Verifying critical services..."
    
    local failed_services=()
    
    for service in "${CRITICAL_SERVICES[@]}"; do
        if systemctl is-active --quiet ${service} 2>/dev/null; then
            log "✅ ${service}: active"
        else
            if systemctl list-unit-files | grep -q "^${service}.service"; then
                error_log "${service}: FAILED or inactive"
                failed_services+=("${service}")
            fi
        fi
    done
    
    # Check system resources
    local disk_usage=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
    if [ ${disk_usage} -gt 90 ]; then
        error_log "Disk usage critical: ${disk_usage}%"
    fi
    
    local mem_available=$(free -m | awk 'NR==2 {print $7}')
    if [ ${mem_available} -lt 500 ]; then
        error_log "Low memory: ${mem_available}MB available"
    fi
    
    # If services failed, attempt restart
    if [ ${#failed_services[@]} -gt 0 ]; then
        log "Attempting to restart failed services..."
        for service in "${failed_services[@]}"; do
            systemctl restart ${service} >> "${LOG_FILE}" 2>&1
            sleep 3
            if systemctl is-active --quiet ${service}; then
                log "✅ ${service} restarted successfully"
            else
                error_log "${service} restart failed"
            fi
        done
        
        # Send alert if still failing
        local still_failed=()
        for service in "${failed_services[@]}"; do
            if ! systemctl is-active --quiet ${service}; then
                still_failed+=("${service}")
            fi
        done
        
        if [ ${#still_failed[@]} -gt 0 ]; then
            send_alert "🚨 Post-Update Alert: Services Failed on $(hostname)" \
                "Failed services: ${still_failed[*]}\n\nSee logs:\n${LOG_FILE}\n${ERROR_LOG}"
            return 1
        fi
    fi
    
    log "✅ All services verified"
    return 0
}

# Check if reboot is required
check_reboot_required() {
    if [ -f /var/run/reboot-required ]; then
        log "⚠️  System reboot required"
        log "Packages requiring reboot:"
        cat /var/run/reboot-required.pkgs | tee -a "${LOG_FILE}"
        
        send_alert "⚠️  Reboot Required: $(hostname)" \
            "The following updates require a system reboot:\n\n$(cat /var/run/reboot-required.pkgs)\n\nPlease schedule a maintenance window."
        
        return 1
    fi
    return 0
}

# Generate update summary
generate_summary() {
    log "========== Update Summary =========="
    log "Hostname: $(hostname)"
    log "Date: $(date)"
    log "Kernel: $(uname -r)"
    log "Uptime: $(uptime -p)"
    
    # Check for pending updates
    local updates_available=$(apt list --upgradable 2>/dev/null | grep -c "upgradable")
    log "Remaining updates: ${updates_available}"
    
    # Check security updates
    local security_updates=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)
    if [ ${security_updates} -gt 0 ]; then
        log "⚠️  Security updates available: ${security_updates}"
    fi
    
    log "===================================="
}

# ========== Main Execution ==========

main() {
    log "========== Auto-Update Script Started =========="
    
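    # Acquire the lock first, then install the trap. If the trap were set
    # before acquire_lock, an early exit caused by another running instance
    # would delete that instance's lock file.
    acquire_lock
    trap release_lock EXIT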
    
    # Rotate logs if needed
    rotate_logs
    
    # Create pre-update snapshot
    if ! create_snapshot; then
        error_log "Snapshot creation failed. Aborting update."
        exit 1
    fi
    
    # Update package lists
    if ! update_package_lists; then
        error_log "Failed to update package lists. Aborting."
        exit 1
    fi
    
    # Perform updates (choose one strategy)
    # Strategy 1: Security updates only (conservative)
    if ! upgrade_security_only; then
        error_log "Security update failed"
        exit 1
    fi
    
    # Strategy 2: All safe packages (more aggressive)
    # if ! upgrade_safe_packages; then
    #     error_log "Package upgrade failed"
    #     exit 1
    # fi
    
    # Cleanup
    cleanup_packages
    
    # Verify services
    if ! verify_services; then
        error_log "Service verification failed"
        # Don't exit - services may have been restarted
    fi
    
    # Check reboot requirement
    check_reboot_required
    
    # Generate summary
    generate_summary
    
    log "========== Auto-Update Script Completed =========="
}

# Execute main function
main "$@"
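
The PID-file lock in the script has a small race window between checking and writing the file. An alternative sketch uses `flock(1)` from util-linux, where the kernel releases the lock when the process exits, so no stale-lock cleanup is needed. The function name `with_lock` and the exit code 75 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Run a command while holding an exclusive lock on <lock_file>.
# If the lock is already held, print a message and return 75 without waiting.
with_lock() {
    local lock_file="$1"
    shift
    (
        # File descriptor 9 is opened on the lock file by the redirection
        # below; flock -n fails immediately if another process holds it.
        flock -n 9 || { echo "Another run holds ${lock_file}" >&2; exit 75; }
        "$@"
    ) 9>"$lock_file"
}
```

With this helper, the last line of the script could become `with_lock /var/run/auto-update.lock main "$@"`, and the `acquire_lock` / `release_lock` functions would no longer be needed.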

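Cron setup

# Edit root's crontab
sudo crontab -e

# Pick ONE of the following schedules.

# Option 1: every day at 02:00 (recommended)
0 2 * * * /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 2: every Sunday at 03:00 (conservative)
0 3 * * 0 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 3: first Sunday of each month at 02:00 (very conservative)
# When both day-of-month and day-of-week are restricted, cron runs the job if
# EITHER matches, so the Sunday test is done in the command instead.
0 2 1-7 * * [ "$(date +\%u)" -eq 7 ] && /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1

# Option 4: weekdays at 02:00 (skips weekends)
0 2 * * 1-5 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1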
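
The "first Sunday of the month" rule is easy to get wrong in cron, because a crontab entry that restricts both the day-of-month and day-of-week fields fires when either one matches. The check can also live in the script itself. The sketch below relies on GNU `date` (the `-d` option); `is_first_sunday` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success only when the given date (default: today) is the first
# Sunday of its month, i.e. a Sunday whose day-of-month is between 1 and 7.
is_first_sunday() {
    local when="${1:-today}"
    local weekday day
    weekday="$(date -d "$when" +%u)" || return 2   # 1 = Monday ... 7 = Sunday
    day="$(date -d "$when" +%d)" || return 2
    # 10# forces base 10 so that "08" and "09" are not read as octal.
    [ "$weekday" -eq 7 ] && [ "$((10#$day))" -le 7 ]
}
```

A daily cron entry could then start the update only when `is_first_sunday` succeeds, for example by placing `is_first_sunday || exit 0` at the top of `main`.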

Deploying the script

# Make the script executable
sudo chmod +x /usr/local/sbin/auto-update.sh

# Create the log files
sudo touch /var/log/auto-update.log
sudo touch /var/log/auto-update-error.log
sudo chmod 640 /var/log/auto-update*.log

# Run it manually as a test
sudo /usr/local/sbin/auto-update.sh

# Check the log
sudo tail -f /var/log/auto-update.log

Monitoring cron runs

# Show the cron execution history
sudo grep "auto-update" /var/log/syslog

# Show the most recent results
sudo tail -100 /var/log/auto-update.log

# Check the error log
sudo cat /var/log/auto-update-error.log

# Show the cron log with journalctl
sudo journalctl -u cron | grep auto-update

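My recommendation: choose by scenario

Scenario	Recommended approach	Rationale
Standard enterprise production	systemd + unattended-upgrades	Stable, reliable, officially supported, suited to large deployments
Heavily customized requirements	Shell + Cron	Full control over the update flow, integrates with existing systems
Small environments (< 10 servers)	Shell + Cron	Simple to understand, quick to deploy
Critical services	systemd + unattended-upgrades	More thorough error handling, lower risk
Test/development environments	Shell + Cron	Flexible, easy to test and adjust
Container host	Either	Choose by team familiarity

Final recommendations:

✅ Prefer systemd + unattended-upgrades in production: well tested, stable, and reliable
✅ Use Shell + Cron as a supplement: for special requirements or where unattended-upgrades cannot be used
✅ Combine the two: let unattended-upgrades handle security updates and use shell scripts for special logic (snapshots, health checks)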
