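Ubuntu Server Automatic Updates: Enterprise Strategy, Risk Control, and Failure Recovery
In enterprise production environments, the update strategy for Ubuntu Server is one of the core challenges a system administrator faces. Automatic updates keep systems patched, but a careless configuration can cause service outages, compatibility problems, or even an unbootable system. This article looks at how to design a robust automatic update strategy for production (PRD) environments, covering the technical implementation, risk controls, and failure recovery mechanisms.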
核心問題:自動更新應該做到什麼程度?
企業環境的三難困境
| 需求 | 衝突點 | 風險 |
|---|---|---|
| 安全性 | 需要即時套用安全性更新 | 未修補的漏洞可能被攻擊 |
| 穩定性 | 更新可能破壞現有功能 | 服務中斷、相容性問題 |
| 可控性 | 自動化 vs 人工審查 | 失控的更新或延遲修補 |
決策框架:更新分級制度
Experienced system administrators generally agree that not every update should be fully automated. A tiered scheme is recommended:
| 更新類型 | 自動化程度 | 理由 | 實作策略 |
|---|---|---|---|
| Security Updates | ✅ 完全自動 | 關鍵漏洞需即時修補 | unattended-upgrades |
| Important Updates | ⚠️ 條件自動 | 重要但非緊急 | 測試後自動/人工審核 |
| Standard Updates | ⚠️ 手動或排程 | 風險較高、非緊急 | 維護窗口執行 |
| Kernel Updates | ❌ 手動控制 | 需要重啟、風險最高 | 測試驗證後執行 |
| Major Version Upgrade | ❌ 嚴格手動 | 可能破壞整個系統 | 完整測試與備份 |
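The tiers in the table above can be expressed as a small decision function. The following is a minimal sketch, not part of unattended-upgrades: `classify_update` and its tier labels (`auto`, `conditional`, `manual`, `manual-kernel`) are hypothetical names chosen for illustration, and the suite names assume the standard Ubuntu archive layout (`jammy-security`, `jammy-updates`).

```shell
#!/bin/sh
# Hypothetical helper: map a package name and archive suite to an update tier.
# Kernel packages are always routed to manual handling, whatever the suite.
classify_update() {
    local pkg="$1" suite="$2"
    case "$pkg" in
        linux-image-*|linux-headers-*|linux-modules-*)
            echo "manual-kernel"
            return 0
            ;;
    esac
    case "$suite" in
        *-security) echo "auto" ;;         # security pocket: fully automatic
        *-updates)  echo "conditional" ;;  # regular updates: test first
        *)          echo "manual" ;;       # anything else: maintenance window
    esac
}

# Example call
classify_update openssl jammy-security
```

A function like this could feed a report of pending updates grouped by tier before a maintenance window.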
技術實作:unattended-upgrades 完整配置
安裝與基礎設定
# Ubuntu 22.04 預設已安裝,確認版本
apt-cache policy unattended-upgrades
# 若未安裝
sudo apt update
sudo apt install unattended-upgrades apt-listchanges
# 啟用自動更新
sudo dpkg-reconfigure -plow unattended-upgrades
核心配置檔:/etc/apt/apt.conf.d/50unattended-upgrades
The following configuration example targets production environments and covers the key settings:
// Ubuntu 22.04 LTS 企業級自動更新配置
// 目標:自動安裝安全性更新,保持系統穩定性
Unattended-Upgrade::Allowed-Origins {
// 僅允許官方安全性更新
"${distro_id}:${distro_codename}-security";
// 可選:包含重要更新(需評估風險)
// "${distro_id}:${distro_codename}-updates";
// ESM 更新(Ubuntu Pro 訂閱)
// "${distro_id}ESMApps:${distro_codename}-apps-security";
// "${distro_id}ESM:${distro_codename}-infra-security";
};
// 黑名單:永不自動更新的套件
Unattended-Upgrade::Package-Blacklist {
// 核心套件(需人工控制)
"linux-image-*";
"linux-headers-*";
"linux-modules-*";
// 關鍵服務(需測試驗證)
"nginx";
"apache2";
"mysql-server*";
"postgresql*";
"docker*";
"kubernetes*";
// 自訂應用程式
// "myapp-*";
};
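// Whitelist: when set, ONLY packages matching these patterns are upgraded.
// It does not override the blacklist; a package matched by both is still skipped.
// Unattended-Upgrade::Package-Whitelist {
// "openssl";
// "libssl*";
// };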
// 自動移除不再需要的依賴
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
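// Commands to run before and after package installation.
// unattended-upgrades has no pre/post hook option of its own; APT's dpkg hooks
// can be used instead. They fire on every dpkg invocation, including manual apt runs.
// DPkg::Pre-Invoke {
// "/usr/local/bin/pre-update-snapshot.sh";
// };
// DPkg::Post-Invoke {
// "/usr/local/bin/post-update-verify.sh";
// };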
// 自動重啟設定(關鍵配置)
Unattended-Upgrade::Automatic-Reboot "false";
// 若必須自動重啟,設定時間窗口
// Unattended-Upgrade::Automatic-Reboot-Time "03:00";
// Unattended-Upgrade::Automatic-Reboot-WithUsers "false";
// 郵件通知設定
Unattended-Upgrade::Mail "sysadmin@example.com";
Unattended-Upgrade::MailReport "on-change";
// 選項:always, only-on-error, on-change
// Download bandwidth limit in KB/s (an APT-wide setting)
Acquire::http::Dl-Limit "70";
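// Keep downloaded .deb files in /var/cache/apt/archives after installation
// (over time this builds a local cache of earlier versions for downgrades)
Unattended-Upgrade::Keep-Debs-After-Install "true";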
// 詳細日誌記錄
Unattended-Upgrade::Verbose "true";
Unattended-Upgrade::Debug "false";
// Power and network conditions under which updates may run
Unattended-Upgrade::OnlyOnACPower "false";
Unattended-Upgrade::Skip-Updates-On-Metered-Connections "true";
// dpkg 選項:衝突處理策略
Dpkg::Options {
"--force-confdef"; // 使用預設選項
"--force-confold"; // 保留舊配置檔
};
// Do not defer installation to shutdown; install while the system is running
Unattended-Upgrade::InstallOnShutdown "false";
// Optional: additionally log to syslog
// Unattended-Upgrade::SyslogEnable "true";
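Entries in Package-Blacklist are read by unattended-upgrades as Python regular expressions rather than shell globs, so a pattern such as "docker*" means "docke" followed by zero or more "r". The sketch below approximates that matching with `grep -E`, anchored at the start of the package name (the anchoring is an assumption about the matching behaviour). `is_blacklisted` is a hypothetical helper for trying patterns out, not part of unattended-upgrades.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the package name matches any pattern.
# Patterns are treated as extended regular expressions anchored at the
# start of the name, approximating how the blacklist is assumed to match.
is_blacklisted() {
    local pkg="$1" pat
    shift
    for pat in "$@"; do
        if printf '%s\n' "$pkg" | grep -Eq "^(${pat})"; then
            return 0
        fi
    done
    return 1
}

# A glob-style pattern can match more than intended:
# "docker*" also matches a name that merely starts with "docke".
is_blacklisted docketool 'docker*' && echo "docketool would be skipped"
```

Writing patterns as explicit regular expressions (for example "docker" or "linux-image-") avoids this kind of surprise.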
自動更新排程:/etc/apt/apt.conf.d/20auto-upgrades
// 控制 apt-daily.timer 與 apt-daily-upgrade.timer 的行為
// 每天更新套件清單(1 = 每天執行)
APT::Periodic::Update-Package-Lists "1";
// 每天下載可升級的套件(不安裝)
APT::Periodic::Download-Upgradeable-Packages "1";
// 執行 unattended-upgrade(1 = 每天執行)
APT::Periodic::Unattended-Upgrade "1";
// 自動清理(單位:天)
APT::Periodic::AutocleanInterval "7";
// 詳細日誌
APT::Periodic::Verbose "2";
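After editing 20auto-upgrades it is worth confirming which values are actually set. On a live system, `apt-config dump | grep '^APT::Periodic'` shows the merged configuration. The sketch below does a simpler check on a single file; `periodic_value` is a hypothetical helper, and it ignores lines commented out with `//`.

```shell
#!/bin/sh
# Hypothetical helper: print the value of an APT::Periodic key in a file.
# Commented-out lines are ignored; the last active setting wins.
periodic_value() {
    local file="$1" key="$2"
    sed -n "s|^[[:space:]]*APT::Periodic::${key}[[:space:]]*\"\([^\"]*\)\";.*|\1|p" "$file" \
        | tail -n 1
}

# Example usage (prints nothing if the file or key is absent):
# periodic_value /etc/apt/apt.conf.d/20auto-upgrades Unattended-Upgrade
```

An empty result or "0" for Unattended-Upgrade means the daily unattended run is not enabled by that file.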
systemd Timer 時間控制
The default systemd timers may fire during business peak hours, so their schedule should be adjusted:
# 查看當前排程
systemctl status apt-daily.timer
systemctl status apt-daily-upgrade.timer
# Show the next scheduled run (quote the pattern so the shell does not expand it)
systemctl list-timers 'apt-daily*'
# 客製化執行時間(建立 override)
sudo systemctl edit apt-daily.timer
在編輯器中加入:
[Timer]
# 清除預設時間
OnCalendar=
# 設定為每天凌晨 2:00 執行(避開業務高峰)
OnCalendar=02:00
# 隨機延遲 0-30 分鐘(避免多台伺服器同時更新)
RandomizedDelaySec=30min
同樣調整 upgrade timer:
sudo systemctl edit apt-daily-upgrade.timer
[Timer]
OnCalendar=
OnCalendar=03:00
RandomizedDelaySec=30min
# Reload and verify
sudo systemctl daemon-reload
systemctl list-timers 'apt-daily*'
升級失敗的多層次恢復機制
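This is the most critical part for enterprise environments. A failed upgrade can lead to:
- An unbootable system (kernel panic, initramfs problems)
- Services that no longer run (dependency conflicts, incompatible configuration)
- Degraded performance (bugs in the new version, higher resource consumption)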
第一層防護:更新前自動快照
LVM 快照(推薦)
若系統使用 LVM,可在更新前自動建立快照:
#!/bin/bash
# /usr/local/bin/pre-update-snapshot.sh
SNAPSHOT_NAME="pre-update-$(date +%Y%m%d-%H%M%S)"
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
SNAPSHOT_SIZE="10G"
# 建立 LVM 快照
lvcreate -L ${SNAPSHOT_SIZE} -s -n ${SNAPSHOT_NAME} /dev/${VG_NAME}/${LV_NAME}
if [ $? -eq 0 ]; then
echo "✅ LVM snapshot created: ${SNAPSHOT_NAME}" | logger -t pre-update
# 記錄快照資訊
echo "${SNAPSHOT_NAME}" > /var/log/last-update-snapshot.txt
else
echo "❌ Failed to create LVM snapshot" | logger -t pre-update
exit 1
fi
# 保留最近 3 個快照,刪除舊的
SNAPSHOTS=$(lvs --noheadings -o lv_name ${VG_NAME} | grep "pre-update-" | sort -r | tail -n +4)
for snap in ${SNAPSHOTS}; do
lvremove -f /dev/${VG_NAME}/${snap}
echo "🗑️ Removed old snapshot: ${snap}" | logger -t pre-update
done
# 設定執行權限
sudo chmod +x /usr/local/bin/pre-update-snapshot.sh
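# Enable it through APT's dpkg hook in 50unattended-upgrades
# (unattended-upgrades itself has no pre-install hook option).
# The hook can fire more than once per run, so each firing creates a snapshot.
# DPkg::Pre-Invoke {
# "/usr/local/bin/pre-update-snapshot.sh";
# };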
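The retention step in the snapshot script relies on `sort -r | tail -n +4` to select everything older than the three newest snapshots. The sketch below isolates that logic so it can be checked without touching LVM. `snapshots_to_prune` is a hypothetical helper that reads snapshot names on standard input (for example from `lvs --noheadings -o lv_name`) and assumes the `pre-update-YYYYMMDD-HHMMSS` naming used above, which sorts chronologically.

```shell
#!/bin/sh
# Hypothetical helper: read LV names on stdin, print the pre-update
# snapshots that fall outside the newest N (N is the first argument).
snapshots_to_prune() {
    local keep="$1"
    tr -d ' ' \
        | grep '^pre-update-' \
        | sort -r \
        | tail -n +"$((keep + 1))"
}

# Example usage on a live system (prints candidates, removes nothing):
# lvs --noheadings -o lv_name ubuntu-vg | snapshots_to_prune 3
```

Printing the candidates first and removing them in a separate step makes the retention rule easy to review before it deletes anything.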
快照回滾流程
若更新後系統異常,從救援模式回滾:
# 1. 重啟進入 GRUB 選單,選擇 "Advanced options" > "Recovery mode"
# 2. 選擇 "root - Drop to root shell prompt"
# 3. 重新掛載根目錄為讀寫
mount -o remount,rw /
# 4. 查看可用快照
lvs ubuntu-vg
# 5. 合併快照(回滾)
lvconvert --merge /dev/ubuntu-vg/pre-update-20250120-020000
# 6. 重新啟動
reboot
第二層防護:套件層級回滾
保留舊版本套件
# Already set in 50unattended-upgrades
Unattended-Upgrade::Keep-Debs-After-Install "true";
# Downloaded .deb files are kept in /var/cache/apt/archives/
# List the cached versions of a package
ls -lh /var/cache/apt/archives/ | grep nginx
降級特定套件
# Show package install/upgrade history (-E is needed for the | alternation)
grep -E "install|upgrade" /var/log/dpkg.log | tail -50
grep -Ei "install|upgrade" /var/log/apt/history.log | tail -50
# 查看可用的舊版本
apt-cache policy nginx
# 降級至特定版本
sudo apt install nginx=1.18.0-6ubuntu14.3
# 鎖定套件版本,防止再次升級
sudo apt-mark hold nginx
# 查看鎖定的套件
apt-mark showhold
# 解除鎖定
sudo apt-mark unhold nginx
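Before downgrading, it helps to know which versions are still available locally. The sketch below lists the versions of a package found in the APT cache directory by parsing the `name_version_arch.deb` file names. `cached_versions` is a hypothetical helper; it assumes GNU `sort -V` and that an epoch colon is stored as `%3a` in the file name.

```shell
#!/bin/sh
# Hypothetical helper: list cached versions of a package, oldest first.
# Usage: cached_versions PACKAGE [CACHE_DIR]
cached_versions() {
    local pkg="$1" dir="${2:-/var/cache/apt/archives}"
    local f base
    for f in "$dir/${pkg}"_*.deb; do
        [ -e "$f" ] || continue      # glob matched nothing
        base="${f##*/}"              # strip directory
        base="${base#${pkg}_}"       # strip "name_"
        echo "${base%_*}"            # strip "_arch.deb"
    done | sed 's/%3a/:/' | sort -V
}

# Example usage:
# cached_versions nginx
```

A version printed here can be installed directly from the cached file with `sudo apt install /var/cache/apt/archives/<file>.deb`, which avoids depending on the archive still carrying that version.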
第三層防護:服務健康檢查
更新後自動驗證關鍵服務:
#!/bin/bash
# /usr/local/bin/post-update-verify.sh
LOG_FILE="/var/log/post-update-verify.log"
ALERT_EMAIL="sysadmin@example.com"
echo "========== Post-Update Verification: $(date) ==========" >> ${LOG_FILE}
# 檢查關鍵服務狀態
SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")
FAILED_SERVICES=()
for service in "${SERVICES[@]}"; do
if systemctl is-active --quiet ${service}; then
echo "✅ ${service}: active" >> ${LOG_FILE}
else
echo "❌ ${service}: FAILED" >> ${LOG_FILE}
FAILED_SERVICES+=("${service}")
fi
done
# 檢查磁碟空間
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ ${DISK_USAGE} -gt 90 ]; then
echo "⚠️ Disk usage: ${DISK_USAGE}% (WARNING)" >> ${LOG_FILE}
fi
# 檢查記憶體
MEM_AVAILABLE=$(free -m | awk 'NR==2 {print $7}')
if [ ${MEM_AVAILABLE} -lt 500 ]; then
echo "⚠️ Available memory: ${MEM_AVAILABLE}MB (LOW)" >> ${LOG_FILE}
fi
# Check the kernel log for errors or panics (-E is needed for the | alternation)
KERNEL_ERRORS=$(dmesg | grep -Eic "error|fail|panic")
if [ ${KERNEL_ERRORS} -gt 0 ]; then
echo "⚠️ Kernel errors detected: ${KERNEL_ERRORS}" >> ${LOG_FILE}
dmesg | grep -Ei "error|fail|panic" | tail -10 >> ${LOG_FILE}
fi
# 若有服務失敗,發送告警
if [ ${#FAILED_SERVICES[@]} -gt 0 ]; then
SUBJECT="🚨 Post-Update Alert: Services Failed on $(hostname)"
BODY="Failed services: ${FAILED_SERVICES[*]}\n\nSee log: ${LOG_FILE}"
echo -e "${BODY}" | mail -s "${SUBJECT}" ${ALERT_EMAIL}
# 可選:自動嘗試重啟失敗的服務
for service in "${FAILED_SERVICES[@]}"; do
systemctl restart ${service}
sleep 5
if systemctl is-active --quiet ${service}; then
echo "✅ ${service} restarted successfully" >> ${LOG_FILE}
fi
done
fi
echo "========== Verification Complete ==========" >> ${LOG_FILE}
# 設定執行權限
sudo chmod +x /usr/local/bin/post-update-verify.sh
# Enable it through APT's dpkg hook in 50unattended-upgrades
# (unattended-upgrades itself has no post-install hook option)
# DPkg::Post-Invoke {
# "/usr/local/bin/post-update-verify.sh";
# };
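The verification script restarts a failed service and then checks it once after a fixed `sleep 5`. Services that take longer to come up may be reported as failed even though they recover. A bounded retry loop is one alternative; the sketch below is an illustration only, and `wait_until` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 as soon as the command succeeds.
# Usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example usage: wait up to 5 x 3 seconds for nginx to become active
# wait_until 5 3 systemctl is-active --quiet nginx
```

The total wait is bounded by attempts times delay, so the hook cannot hang indefinitely.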
第四層防護:GRUB 舊核心保留
# 編輯 GRUB 設定
sudo nano /etc/default/grub
# 設定保留舊核心(預設 Ubuntu 22.04 已保留)
# GRUB_DEFAULT=0 # 預設啟動最新核心
# 若新核心有問題,重啟時選擇 "Advanced options" → 舊核心
# 查看已安裝的核心版本
dpkg -l | grep linux-image
# 手動移除舊核心(小心操作)
# sudo apt remove linux-image-5.15.0-91-generic
# 保留至少 2 個核心版本以供回滾
監控與告警機制
日誌檢查要點
# unattended-upgrades 主要日誌
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades.log
sudo tail -f /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
# 查看最近的更新摘要
sudo cat /var/log/unattended-upgrades/unattended-upgrades.log | grep "Packages that will be upgraded"
# apt 操作歷史
sudo tail -50 /var/log/apt/history.log
# dpkg 操作歷史
sudo tail -100 /var/log/dpkg.log
# systemd journal(更詳細)
journalctl -u unattended-upgrades -f
journalctl -u apt-daily -f
journalctl -u apt-daily-upgrade -f
整合監控系統
使用 Prometheus + Node Exporter
# 安裝 node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# 建立 systemd service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes \
    --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
[Install]
WantedBy=multi-user.target
# 建立使用者
sudo useradd -rs /bin/false node_exporter
# 啟動服務
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
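# Verify the exporter responds; apt_* metrics only appear once the textfile
# collector is enabled and the custom metrics script below has run
curl -s http://localhost:9100/metrics | head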
自訂指標腳本
#!/bin/bash
# /usr/local/bin/apt_update_metrics.sh
# 產生 Prometheus 格式的更新指標
TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
mkdir -p ${TEXTFILE_DIR}
# 計算待更新的套件數量
UPDATES_AVAILABLE=$(apt list --upgradable 2>/dev/null | grep -c upgradable)
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)
# 最後更新時間(timestamp)
LAST_UPDATE=$(stat -c %Y /var/lib/apt/periodic/update-success-stamp 2>/dev/null || echo 0)
# 輸出 Prometheus 格式
cat > ${TEXTFILE_DIR}/apt_updates.prom << EOF
# HELP apt_updates_available Number of available updates
# TYPE apt_updates_available gauge
apt_updates_available ${UPDATES_AVAILABLE}
# HELP apt_security_updates_available Number of available security updates
# TYPE apt_security_updates_available gauge
apt_security_updates_available ${SECURITY_UPDATES}
# HELP apt_last_update_timestamp Timestamp of last apt update
# TYPE apt_last_update_timestamp gauge
apt_last_update_timestamp ${LAST_UPDATE}
EOF
# 設定 cron 定時執行
sudo crontab -e
# 每小時更新一次指標
0 * * * * /usr/local/bin/apt_update_metrics.sh
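The metrics script writes straight into apt_updates.prom, so node_exporter could scrape the file while it is half written. A common remedy is to write to a temporary file in the same directory and rename it into place, since a rename within one filesystem is atomic. The sketch below shows that pattern; `write_prom_atomically` is a hypothetical helper that reads the metrics text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: write stdin to TARGET atomically.
# The temp file lives next to the target (same filesystem) and does not
# end in .prom, so the textfile collector ignores it until the rename.
write_prom_atomically() {
    local target="$1" tmp
    tmp="$(mktemp "${target}.XXXXXX")" || return 1
    if cat > "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Example usage inside the metrics script:
# printf 'apt_updates_available %s\n' "$UPDATES_AVAILABLE" \
#     | write_prom_atomically "$TEXTFILE_DIR/apt_updates.prom"
```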
郵件告警配置
# 安裝 postfix(輕量 MTA)
sudo apt install postfix mailutils
# 配置使用外部 SMTP(如 Gmail)
sudo nano /etc/postfix/main.cf
relayhost = [smtp.gmail.com]:587
smtp_tls_security_level = encrypt
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
# 設定 SMTP 認證
sudo nano /etc/postfix/sasl_passwd
[smtp.gmail.com]:587 your-email@gmail.com:your-app-password
# 建立 hash 資料庫
sudo postmap /etc/postfix/sasl_passwd
sudo chmod 600 /etc/postfix/sasl_passwd*
# 重啟 postfix
sudo systemctl restart postfix
# 測試郵件
echo "Test email from $(hostname)" | mail -s "Test Subject" sysadmin@example.com
不同服務場景的策略建議
場景 1:Web Server(Nginx/Apache)
# 黑名單設定
Unattended-Upgrade::Package-Blacklist {
"nginx*";
"apache2*";
"php*";
};
# 理由:
# - Web server 更新可能改變配置格式
# - PHP 版本升級可能破壞應用程式
# - 需在測試環境驗證後手動更新
# 建議:使用 Blue-Green 部署
# 1. 建立新伺服器並手動更新
# 2. 驗證功能正常
# 3. 切換 Load Balancer 流量
# 4. 監控錯誤率
# 5. 確認無問題後更新其他節點
場景 2:Database Server(MySQL/PostgreSQL)
# 嚴格的黑名單
Unattended-Upgrade::Package-Blacklist {
"mysql*";
"mariadb*";
"postgresql*";
"percona*";
};
# 理由:
# - 資料庫更新可能需要 schema migration
# - 效能特性可能改變
# - 回滾複雜且風險高
# 建議流程:
# 1. 建立完整備份
# 2. 在 replica 上測試更新
# 3. 驗證複寫正常運作
# 4. 監控效能指標(query time、connections)
# 5. 規劃維護窗口執行主庫更新
場景 3:Container Host(Docker/Kubernetes)
# 選擇性黑名單
Unattended-Upgrade::Package-Blacklist {
"docker*";
"containerd*";
"kubernetes*";
"kubelet*";
"kubeadm*";
};
# 理由:
# - Container runtime 更新可能影響正在運行的容器
# - Kubernetes 版本有嚴格的升級路徑
# - 需要驗證 CNI、CSI 等外掛相容性
# 允許自動更新:
# - 安全性修補(kernel、glibc)
# - 監控工具(node_exporter、cAdvisor)
# 建議:
# 1. 使用 Immutable Infrastructure
# 2. 定期重建節點而非 in-place 更新
# 3. 使用 cluster autoscaler 滾動更新
場景 4:Critical Services(最小化更新)
# 極度保守的配置
Unattended-Upgrade::Allowed-Origins {
// 僅限關鍵安全性更新
"${distro_id}:${distro_codename}-security";
};
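// Do not add "*" to Package-Blacklist here: the blacklist takes precedence,
// so nothing would ever be upgraded. A whitelist alone already limits
// upgrades to the listed packages.
Unattended-Upgrade::Package-Whitelist {
// Only the most critical security fixes
"openssl";
"libssl*";
"openssh*";
"ca-certificates";
};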
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "critical-alerts@example.com";
Unattended-Upgrade::MailReport "always";
# 適用場景:
# - 金融交易系統
# - 醫療設備控制器
# - 工業控制系統(ICS/SCADA)
# - 任何 SLA 要求極高的服務
Ansible 自動化部署
Ansible Playbook:統一配置管理
# playbook: deploy_unattended_upgrades.yml
---
- name: Configure Unattended Upgrades across Ubuntu servers
  hosts: ubuntu_servers
  become: yes
  # Note: play-level vars take precedence over inventory host vars.
  # To set service_type or auto_reboot per host, remove them here and
  # define them in the inventory or group_vars instead.
  vars:
    mail_recipient: "sysadmin@example.com"
    reboot_time: "03:00"
    auto_reboot: false
    service_type: "webserver"  # webserver, database, container, critical
  tasks:
    - name: Ensure unattended-upgrades is installed
      apt:
        name:
          - unattended-upgrades
          - apt-listchanges
          - mailutils
        state: present
        update_cache: yes
    - name: Deploy 50unattended-upgrades configuration
      template:
        src: templates/50unattended-upgrades.j2
        dest: /etc/apt/apt.conf.d/50unattended-upgrades
        owner: root
        group: root
        mode: '0644'
      notify: restart unattended-upgrades
    - name: Deploy 20auto-upgrades configuration
      copy:
        dest: /etc/apt/apt.conf.d/20auto-upgrades
        content: |
          APT::Periodic::Update-Package-Lists "1";
          APT::Periodic::Download-Upgradeable-Packages "1";
          APT::Periodic::Unattended-Upgrade "1";
          APT::Periodic::AutocleanInterval "7";
          APT::Periodic::Verbose "2";
        owner: root
        group: root
        mode: '0644'
    # The drop-in directory does not exist by default
    - name: Ensure the timer override directory exists
      file:
        path: /etc/systemd/system/apt-daily-upgrade.timer.d
        state: directory
        owner: root
        group: root
        mode: '0755'
    - name: Configure apt-daily-upgrade.timer schedule
      copy:
        dest: /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
        content: |
          [Timer]
          OnCalendar=
          OnCalendar={{ reboot_time }}
          RandomizedDelaySec=30min
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd
    - name: Deploy pre-update snapshot script
      template:
        src: templates/pre-update-snapshot.sh.j2
        dest: /usr/local/bin/pre-update-snapshot.sh
        owner: root
        group: root
        mode: '0755'
      when: ansible_facts['lvm'] is defined
    - name: Deploy post-update verification script
      template:
        src: templates/post-update-verify.sh.j2
        dest: /usr/local/bin/post-update-verify.sh
        owner: root
        group: root
        mode: '0755'
    - name: Enable and start unattended-upgrades service
      systemd:
        name: unattended-upgrades
        enabled: yes
        state: started
  handlers:
    - name: restart unattended-upgrades
      systemd:
        name: unattended-upgrades
        state: restarted
    - name: reload systemd
      systemd:
        daemon_reload: yes
Jinja2 模板:動態生成配置
# templates/50unattended-upgrades.j2
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}-security";
{% if service_type in ['webserver', 'container'] %}
// "${distro_id}:${distro_codename}-updates";
{% endif %}
};
Unattended-Upgrade::Package-Blacklist {
"linux-image-*";
"linux-headers-*";
{% if service_type == 'webserver' %}
"nginx*";
"apache2*";
"php*";
{% elif service_type == 'database' %}
"mysql*";
"postgresql*";
"mariadb*";
{% elif service_type == 'container' %}
"docker*";
"kubernetes*";
{# critical: no "*" blacklist entry; the blacklist would win over the whitelist below and block every upgrade #}
{% endif %}
};
{% if service_type == 'critical' %}
Unattended-Upgrade::Package-Whitelist {
"openssl";
"libssl*";
"openssh*";
};
{% endif %}
Unattended-Upgrade::Automatic-Reboot "{{ auto_reboot | lower }}";
{% if auto_reboot %}
Unattended-Upgrade::Automatic-Reboot-Time "{{ reboot_time }}";
{% endif %}
Unattended-Upgrade::Mail "{{ mail_recipient }}";
Unattended-Upgrade::MailReport "on-change";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Keep-Debs-After-Install "true";
Dpkg::Options {
"--force-confdef";
"--force-confold";
};
執行 Playbook
# Define the inventory
# inventory/hosts.yml
---
all:
  children:
    ubuntu_servers:
      children:
        webservers:
          hosts:
            web01.example.com:
              service_type: webserver
            web02.example.com:
              service_type: webserver
        databases:
          hosts:
            db01.example.com:
              service_type: database
              auto_reboot: false
        containers:
          hosts:
            k8s-node01.example.com:
              service_type: container
# 執行 playbook
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml
# 僅針對特定群組
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --limit webservers
# Dry-run 測試
ansible-playbook -i inventory/hosts.yml deploy_unattended_upgrades.yml --check --diff
最佳實踐總結
資深系統管理員的黃金準則
| 準則 | 說明 | 實作重點 |
|---|---|---|
| 1. 分級而治 | 不同類型的更新採用不同策略 | Security 自動、Kernel 手動、Critical 服務嚴格控制 |
| 2. 快照為本 | 更新前必須有回滾機制 | LVM snapshot、套件降級、GRUB 舊核心 |
| 3. 測試先行 | PRD 環境前必須驗證 | Staging 環境、Canary deployment |
| 4. 監控驗證 | 更新後自動檢查服務狀態 | Post-install hook、健康檢查腳本 |
| 5. 時間控制 | 避開業務高峰時段 | Systemd timer、維護窗口 |
| 6. 告警及時 | 問題發生時立即通知 | Email、Slack、PagerDuty |
| 7. 文件齊全 | 記錄配置決策與變更歷史 | Git 管理配置、變更日誌 |
| 8. 自動化統一 | 使用 IaC 工具統一管理 | Ansible、Terraform、Chef |
檢查清單(上線前必做)
- ☐ 配置審查:50unattended-upgrades 設定符合服務類型
- ☐ 黑名單驗證:關鍵服務已加入黑名單
- ☐ 重啟策略:Automatic-Reboot 設定正確(建議 false)
- ☐ 時間窗口:systemd timer 排程避開高峰時段
- ☐ 快照機制:LVM snapshot 或備份方案已部署
- ☐ 驗證腳本:post-update-verify.sh 已測試
- ☐ 郵件告警:測試郵件發送成功
- ☐ 監控整合:Prometheus/Zabbix 指標正常採集
- ☐ 回滾演練:團隊熟悉快照回滾流程
- ☐ 文件更新:Runbook 記錄緊急處理步驟
進階議題
Ubuntu Pro ESM 更新
# Ubuntu Pro(原 Ubuntu Advantage)提供延長安全維護
# 適用於需要長期支援的舊版系統
# Attach the machine to Ubuntu Pro ("pro" is the current client; "ua" is the older alias)
sudo pro attach YOUR_TOKEN
# Enable ESM
sudo pro enable esm-infra
sudo pro enable esm-apps
# 在 50unattended-upgrades 中允許 ESM 更新
Unattended-Upgrade::Allowed-Origins {
"${distro_id}ESM:${distro_codename}-infra-security";
"${distro_id}ESMApps:${distro_codename}-apps-security";
};
Kernel Livepatch(無需重啟的核心修補)
# Canonical Livepatch 允許套用 kernel 安全性修補而不重啟
# 適合不能頻繁重啟的服務
# Enable Livepatch (requires Ubuntu Pro)
sudo pro enable livepatch
# 或使用免費版本(個人使用)
sudo snap install canonical-livepatch
sudo canonical-livepatch enable YOUR_TOKEN
# 查看狀態
sudo canonical-livepatch status
# 檢查是否需要重啟(即使使用 livepatch)
/usr/lib/update-notifier/update-motd-reboot-required
cat /var/run/reboot-required.pkgs
多伺服器環境的滾動更新策略
# Canary-style rolling update with Ansible
# playbook: rolling_update.yml
---
- name: Rolling update with canary
  hosts: webservers
  serial: 1  # update one host at a time
  become: yes
  pre_tasks:
    - name: Remove server from load balancer
      command: /usr/local/bin/remove-from-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
    - name: Wait for connections to drain
      wait_for:
        timeout: 30
  tasks:
    - name: Update packages
      apt:
        upgrade: safe
        update_cache: yes
    - name: Verify services
      command: /usr/local/bin/post-update-verify.sh
  post_tasks:
    - name: Add server back to load balancer
      command: /usr/local/bin/add-to-lb.sh {{ inventory_hostname }}
      delegate_to: loadbalancer
    - name: Wait and monitor error rate
      pause:
        minutes: 5
    - name: Check error rate
      command: /usr/local/bin/check-error-rate.sh
      register: error_rate
      failed_when: error_rate.stdout | int > 5
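The playbook above calls /usr/local/bin/check-error-rate.sh, which the article does not show. One possible shape is sketched below, as an assumption rather than the author's script: it computes the percentage of 5xx responses in an access log, assuming the nginx/Apache combined log format where the status code is the ninth whitespace-separated field. `error_rate_percent` is a hypothetical name, and the result is truncated to an integer so that `int > 5` in the playbook can compare it.

```shell
#!/bin/sh
# Hypothetical helper: print the integer percentage of 5xx responses.
# Reads the named log files, or standard input when no file is given.
error_rate_percent() {
    awk '
        { total++; if ($9 ~ /^5[0-9][0-9]$/) errors++ }
        END {
            if (total == 0) print 0
            else printf "%d\n", (errors * 100) / total
        }
    ' "$@"
}

# Example usage over the most recent requests:
# tail -n 1000 /var/log/nginx/access.log | error_rate_percent
```

In practice the rate would more likely come from the monitoring system than from raw logs, but the log-based version keeps the canary check self-contained.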
結論
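There is no one-size-fits-all automatic update strategy for Ubuntu Server. In enterprise production environments, the sound approach is to:
- Assess risk tolerance: decide the degree of automation by service type
- Manage updates in tiers: security updates automatic, kernel updates manual, critical services tightly controlled
- Build layered protection: snapshots, package downgrades, service verification, and monitoring with alerts
- Keep rehearsing: verify the rollback procedure regularly so the team stays familiar with it
- Automate and standardize: manage configuration uniformly with tools such as Ansible
Remember that automatic updates exist to improve security, not to replace professional judgment. The value of an experienced system administrator lies in understanding the blast radius of each update, designing a strategy that fits the business, and restoring the system quickly when something goes wrong.
With the configuration, script examples, and practices in this article, you can build a robust automatic update mechanism that balances security, stability, and control.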
替代方案:Shell + Cron vs systemd + unattended-upgrades
兩種方案的比較
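| Aspect | Shell + Cron | systemd + unattended-upgrades |
|---|---|---|
| Pros | Fully customizable and flexible; suits special requirements; no extra packages needed; easy to understand and debug | Officially supported by Ubuntu; handles dependencies and conflicts automatically; thorough error handling; integrates well with systemd |
| Cons | Error handling is your responsibility; edge cases are easy to miss; higher maintenance cost; no official support | Less flexible; configuration is complex; harder to debug; configuration syntax must be learned |
| When to use | Heavily customized requirements; integration with existing scripts; simple update flows; test/development environments | Standard enterprise environments (recommended); when stability and reliability matter; large-scale deployment; production |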
Shell + Cron 完整實作範例
主要更新腳本
#!/bin/bash
# /usr/local/sbin/auto-update.sh
# Enterprise-grade automated update script for Ubuntu 22.04
# ========== Configuration ==========
LOCK_FILE="/var/run/auto-update.lock"
LOG_FILE="/var/log/auto-update.log"
ERROR_LOG="/var/log/auto-update-error.log"
SNAPSHOT_ENABLED=true
VG_NAME="ubuntu-vg"
LV_NAME="ubuntu-lv"
SNAPSHOT_SIZE="10G"
ALERT_EMAIL="sysadmin@example.com"
MAX_LOG_SIZE=104857600 # 100MB
# Service health check list
CRITICAL_SERVICES=("nginx" "mysql" "postgresql" "docker" "ssh")
# Package blacklist (will not be upgraded)
BLACKLIST_PATTERN="linux-image-|linux-headers-|nginx|mysql-server|postgresql|docker"
# ========== Functions ==========
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "${LOG_FILE}"
}
error_log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" | tee -a "${ERROR_LOG}"
}
send_alert() {
local subject="$1"
local body="$2"
echo -e "${body}" | mail -s "${subject}" "${ALERT_EMAIL}"
}
# Lock file mechanism to prevent concurrent execution
acquire_lock() {
if [ -f "${LOCK_FILE}" ]; then
PID=$(cat "${LOCK_FILE}")
if ps -p ${PID} > /dev/null 2>&1; then
log "Another update process (PID: ${PID}) is running. Exiting."
exit 0
else
log "Stale lock file found. Removing."
rm -f "${LOCK_FILE}"
fi
fi
echo $$ > "${LOCK_FILE}"
}
release_lock() {
rm -f "${LOCK_FILE}"
}
# Rotate log files
rotate_logs() {
for log in "${LOG_FILE}" "${ERROR_LOG}"; do
if [ -f "${log}" ] && [ $(stat -c%s "${log}") -gt ${MAX_LOG_SIZE} ]; then
mv "${log}" "${log}.old"
gzip "${log}.old"
touch "${log}"
log "Log rotated: ${log}"
fi
done
}
# Create LVM snapshot before update
create_snapshot() {
if [ "${SNAPSHOT_ENABLED}" != "true" ]; then
return 0
fi
local snapshot_name="auto-update-$(date +%Y%m%d-%H%M%S)"
log "Creating LVM snapshot: ${snapshot_name}"
lvcreate -L ${SNAPSHOT_SIZE} -s -n ${snapshot_name} /dev/${VG_NAME}/${LV_NAME} >> "${LOG_FILE}" 2>&1
if [ $? -eq 0 ]; then
log "✅ Snapshot created successfully: ${snapshot_name}"
echo "${snapshot_name}" > /var/tmp/last-update-snapshot.txt
# Clean up old snapshots (keep last 3)
local snapshots=$(lvs --noheadings -o lv_name ${VG_NAME} 2>/dev/null | grep "auto-update-" | sort -r | tail -n +4)
for snap in ${snapshots}; do
lvremove -f /dev/${VG_NAME}/${snap} >> "${LOG_FILE}" 2>&1
log "Removed old snapshot: ${snap}"
done
else
error_log "Failed to create snapshot"
send_alert "❌ Auto-Update Failed: Snapshot Creation" "Failed to create LVM snapshot on $(hostname)\n\nSee: ${LOG_FILE}"
return 1
fi
}
# Update package lists
update_package_lists() {
log "Updating package lists..."
apt-get update >> "${LOG_FILE}" 2>&1
if [ $? -ne 0 ]; then
error_log "Failed to update package lists"
return 1
fi
log "✅ Package lists updated"
return 0
}
# Get list of upgradeable packages (excluding blacklist)
get_upgradeable_packages() {
apt list --upgradable 2>/dev/null | grep -v "^Listing" | grep -vE "${BLACKLIST_PATTERN}" | awk -F/ '{print $1}'
}
# Upgrade security packages only
upgrade_security_only() {
log "Checking for security updates..."
# Install unattended-upgrades if not present
dpkg -l | grep -q unattended-upgrades || apt-get install -y unattended-upgrades >> "${LOG_FILE}" 2>&1
# Run unattended-upgrades in dry-run mode first
unattended-upgrade --dry-run -v >> "${LOG_FILE}" 2>&1
# Actually perform upgrades
unattended-upgrade -v >> "${LOG_FILE}" 2>&1
if [ $? -eq 0 ]; then
log "✅ Security updates applied successfully"
return 0
else
error_log "Failed to apply security updates"
return 1
fi
}
# Upgrade all safe packages (excluding blacklist)
upgrade_safe_packages() {
local packages=$(get_upgradeable_packages)
if [ -z "${packages}" ]; then
log "No packages to upgrade"
return 0
fi
log "Upgradeable packages (excluding blacklist):"
echo "${packages}" | tee -a "${LOG_FILE}"
log "Performing safe upgrade..."
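# Upgrade only the filtered packages; a plain "apt-get upgrade" would ignore the blacklist
DEBIAN_FRONTEND=noninteractive apt-get install --only-upgrade -y \
    -o Dpkg::Options::="--force-confdef" \
    -o Dpkg::Options::="--force-confold" \
    ${packages} >> "${LOG_FILE}" 2>&1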
if [ $? -eq 0 ]; then
log "✅ Packages upgraded successfully"
return 0
else
error_log "Package upgrade failed"
return 1
fi
}
# Clean up unused packages
cleanup_packages() {
log "Cleaning up unused packages..."
apt-get autoremove -y >> "${LOG_FILE}" 2>&1
apt-get autoclean >> "${LOG_FILE}" 2>&1
log "✅ Cleanup completed"
}
# Verify critical services after update
verify_services() {
log "Verifying critical services..."
local failed_services=()
for service in "${CRITICAL_SERVICES[@]}"; do
if systemctl is-active --quiet ${service} 2>/dev/null; then
log "✅ ${service}: active"
else
if systemctl list-unit-files | grep -q "^${service}.service"; then
error_log "${service}: FAILED or inactive"
failed_services+=("${service}")
fi
fi
done
# Check system resources
local disk_usage=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ ${disk_usage} -gt 90 ]; then
error_log "Disk usage critical: ${disk_usage}%"
fi
local mem_available=$(free -m | awk 'NR==2 {print $7}')
if [ ${mem_available} -lt 500 ]; then
error_log "Low memory: ${mem_available}MB available"
fi
# If services failed, attempt restart
if [ ${#failed_services[@]} -gt 0 ]; then
log "Attempting to restart failed services..."
for service in "${failed_services[@]}"; do
systemctl restart ${service} >> "${LOG_FILE}" 2>&1
sleep 3
if systemctl is-active --quiet ${service}; then
log "✅ ${service} restarted successfully"
else
error_log "${service} restart failed"
fi
done
# Send alert if still failing
local still_failed=()
for service in "${failed_services[@]}"; do
if ! systemctl is-active --quiet ${service}; then
still_failed+=("${service}")
fi
done
if [ ${#still_failed[@]} -gt 0 ]; then
send_alert "🚨 Post-Update Alert: Services Failed on $(hostname)" \
    "Failed services: ${still_failed[*]}\n\nSee logs:\n${LOG_FILE}\n${ERROR_LOG}"
return 1
fi
fi
log "✅ All services verified"
return 0
}
# Check if reboot is required
check_reboot_required() {
if [ -f /var/run/reboot-required ]; then
log "⚠️ System reboot required"
log "Packages requiring reboot:"
cat /var/run/reboot-required.pkgs | tee -a "${LOG_FILE}"
send_alert "⚠️ Reboot Required: $(hostname)" \
    "The following updates require a system reboot:\n\n$(cat /var/run/reboot-required.pkgs)\n\nPlease schedule a maintenance window."
return 1
fi
return 0
}
# Generate update summary
generate_summary() {
log "========== Update Summary =========="
log "Hostname: $(hostname)"
log "Date: $(date)"
log "Kernel: $(uname -r)"
log "Uptime: $(uptime -p)"
# Check for pending updates
local updates_available=$(apt list --upgradable 2>/dev/null | grep -c "upgradable")
log "Remaining updates: ${updates_available}"
# Check security updates
local security_updates=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)
if [ ${security_updates} -gt 0 ]; then
log "⚠️ Security updates available: ${security_updates}"
fi
log "===================================="
}
# ========== Main Execution ==========
main() {
log "========== Auto-Update Script Started =========="
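# Acquire the lock first and install the cleanup trap only once we own it;
# otherwise exiting because another run is active would delete that run's lock file
acquire_lock
trap release_lock EXIT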
# Rotate logs if needed
rotate_logs
# Create pre-update snapshot
if ! create_snapshot; then
error_log "Snapshot creation failed. Aborting update."
exit 1
fi
# Update package lists
if ! update_package_lists; then
error_log "Failed to update package lists. Aborting."
exit 1
fi
# Perform updates (choose one strategy)
# Strategy 1: Security updates only (conservative)
if ! upgrade_security_only; then
error_log "Security update failed"
exit 1
fi
# Strategy 2: All safe packages (more aggressive)
# if ! upgrade_safe_packages; then
# error_log "Package upgrade failed"
# exit 1
# fi
# Cleanup
cleanup_packages
# Verify services
if ! verify_services; then
error_log "Service verification failed"
# Don't exit - services may have been restarted
fi
# Check reboot requirement
check_reboot_required
# Generate summary
generate_summary
log "========== Auto-Update Script Completed =========="
}
# Execute main function
main "$@"
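The PID-file lock in `acquire_lock` has a small window between checking for the file and writing it, during which two runs could both proceed. `flock` from util-linux avoids that, because the kernel grants the lock atomically and drops it when the process exits. The sketch below is one way to wrap a command in such a lock. `with_lock` is a hypothetical helper, and exit status 75 for "already locked" is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding an exclusive lock.
# Returns 75 without running the command if the lock is already held.
# Usage: with_lock LOCKFILE COMMAND [ARGS...]
with_lock() {
    local lockfile="$1"
    shift
    (
        # File descriptor 9 is opened on the lock file for this subshell;
        # the lock is released automatically when the subshell exits.
        flock -n 9 || exit 75
        "$@"
    ) 9>"$lockfile"
}

# Example usage:
# with_lock /var/run/auto-update.lock /usr/local/sbin/auto-update.sh
```

With this wrapper there is no stale lock file to clean up after a crash, since the lock lives with the open file descriptor rather than with the file's existence.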
Cron 設定
# 編輯 root crontab
sudo crontab -e
# 選項 1:每天凌晨 2:00 執行(推薦)
0 2 * * * /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1
# 選項 2:每週日凌晨 3:00 執行(保守)
0 3 * * 0 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1
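# Option 3: first Sunday of each month at 02:00 (very conservative)
# cron ORs day-of-month and day-of-week when both are restricted, so the weekday is tested in the command
0 2 1-7 * * [ "$(date +\%u)" -eq 7 ] && /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1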
# 選項 4:工作日每天凌晨 2:00(避開週末)
0 2 * * 1-5 /usr/local/sbin/auto-update.sh >> /var/log/auto-update-cron.log 2>&1
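Schedules such as "first Sunday of the month" are easy to get wrong in cron, because a restricted day-of-month field and a restricted day-of-week field are combined with OR rather than AND. An alternative is to schedule the job daily and let a guard decide. The sketch below assumes GNU `date` (for the `-d` option); `is_first_sunday` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed only on the first Sunday of a month.
# Takes an optional date string understood by GNU date; defaults to today.
is_first_sunday() {
    local d="${1:-today}"
    # %u is the ISO weekday (7 = Sunday); %-d is the day of month, unpadded
    [ "$(date -d "$d" +%u)" -eq 7 ] && [ "$(date -d "$d" +%-d)" -le 7 ]
}

# Example usage at the top of the update script:
# is_first_sunday || exit 0
```

Because the guard accepts an explicit date, the schedule can be checked against known calendar dates before it is deployed.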
部署腳本
# 設定執行權限
sudo chmod +x /usr/local/sbin/auto-update.sh
# Create the log files
sudo touch /var/log/auto-update.log
sudo touch /var/log/auto-update-error.log
sudo chmod 640 /var/log/auto-update*.log
# 手動測試執行
sudo /usr/local/sbin/auto-update.sh
# 檢查日誌
sudo tail -f /var/log/auto-update.log
監控 Cron 執行狀態
# 查看 cron 執行歷史
sudo grep "auto-update" /var/log/syslog
# 查看最近的執行結果
sudo tail -100 /var/log/auto-update.log
# 檢查錯誤日誌
sudo cat /var/log/auto-update-error.log
# 使用 journalctl 查看 cron 日誌
sudo journalctl -u cron | grep auto-update
我的建議:根據場景選擇
| 場景 | 推薦方案 | 理由 |
|---|---|---|
| 標準企業生產環境 | systemd + unattended-upgrades | 穩定、可靠、官方支援,適合大規模部署 |
| 高度客製化需求 | Shell + Cron | 完全控制更新流程,整合現有系統 |
| 簡單環境(< 10台) | Shell + Cron | 簡單易懂,快速部署 |
| 關鍵服務 | systemd + unattended-upgrades | 錯誤處理更完善,降低風險 |
| 測試/開發環境 | Shell + Cron | 彈性高,方便測試與調整 |
| Container Host | 兩者皆可 | 依團隊熟悉度選擇 |
最終建議:
- ✅ 生產環境優先使用 systemd + unattended-upgrades:經過充分測試、穩定可靠
- ✅ Shell + Cron 作為補充:用於特殊需求或無法使用 unattended-upgrades 的情況
- ✅ 兩者結合:用 unattended-upgrades 處理安全性更新,用 Shell 腳本處理特殊邏輯(如快照、健康檢查)