diff --git a/CHANGELOG.md b/CHANGELOG.md index d4935ff67..65164cda2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,12 +15,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Breaking Changes +* **playbook:setup_basic**: The `at` role is no longer installed by `setup_basic`, as nothing in the base setup uses `at` anymore. Existing hosts keep it; new installs do not get it. If you rely on `at` / `atd`, install it separately with the `linuxfabrik.lfops.at` role. +* **role:system_update**: Host reboots are now performed at one configurable maintenance window by the new `schedule_reboot` role (see Added). Adjust in your inventory: `system_update__update_time` to `schedule_reboot__reboot_time__group_var` (now a plain time of day, e.g. `'04:00'`), and any `system_update__icinga2_*` reboot-downtime settings to `schedule_reboot__icinga2_*`. Also, in most cases `system_update__update_day` should be used instead of `system_update__notify_and_schedule_on_calendar`. * **role:apache_httpd, role:apache_tomcat, role:mastodon, role:postgresql_server**: Rename tags to the project-wide naming scheme. `apache_httpd:config` becomes `apache_httpd:configure`, and `apache_tomcat:users`, `mastodon:users`, `postgresql_server:users` and `postgresql_server:databases` lose their trailing `s` (`...:user`, `...:database`). Adjust any `--tags` / `--skip-tags` invocations and automation that reference the old tag names. * **role:minio_client, role:objectstore_backup**: Both roles and their playbooks (`playbooks/minio_client.yml`, `playbooks/objectstore_backup.yml`) have been removed, along with the corresponding role blocks in `playbooks/setup_nextcloud.yml` and the `setup_nextcloud__skip_minio_client` / `setup_nextcloud__skip_objectstore_backup` variables. MinIO Server has been archived as no-longer-maintained since February 2026, and we are moving away from using object storage for critical data. 
Users relying on these roles must replace the MinIO-based object-store backup with their own solution (e.g. `rclone`); the `mc` binary, its config under `/etc/mc/`, the `objectstore-backup` systemd timer/service, and `/usr/local/bin/mc-mirror.sh` are no longer managed by lfops and will remain on existing hosts until removed manually ([#241](https://github.com/Linuxfabrik/lfops/issues/241)). * **role:infomaniak_vm**: Always create a managed port for every entry in `infomaniak_vm__networks`, even when no `fixed_ip` is set. Previously only networks with a `fixed_ip` got a managed port; networks without one relied on OpenStack's auto-created port. To avoid creating unused (but billed) managed ports on VMs provisioned under the old behavior, make sure to manually rename the existing port in OpenStack to match the `port_name`. Note that this port will not survive VM deletion / detachment, since it was automatically created and therefore is owned by OpenStack, not the user. ### Added +* **role:schedule_reboot**: New role. Provides a single, windowed reboot mechanism: a request spool (`/run/schedule-reboot/`), an ad-hoc `schedule-reboot` command, and one actor that performs a single reboot for all pending requests at a configurable window (`schedule_reboot__reboot_time__*`), setting an Icinga downtime around it. Other roles request a reboot instead of rebooting themselves; `system_update` uses it. * **testing**: Add a Molecule-based test framework that runs the playbooks (and through them the roles) against throwaway libvirt/KVM VMs or Podman containers. Scenarios live under `extensions/molecule`; see the Testing section in `CONTRIBUTING.md`. * **role:icinga2_master, role:icingadb, role:icingaweb2, role:icingaweb2_module_reporting, role:icingaweb2_module_x509**: Add explicit Ubuntu variable files, making Ubuntu support visible alongside Debian. The Icinga repository, GPG key and package names were verified on Debian 13 and Ubuntu 24.04. 
* **role:nextcloud**: Add `meta/argument_specs.yml` declaring the user-facing variables, so role-entry validation catches type mismatches and missing mandatory variables. @@ -36,7 +39,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * **plugin:platform_select**: New filter plugin for selecting a value from a platform-keyed dictionary by OS family / distribution / version. * **role:alternatives**: Support managing `subcommands` (slaves/followers) and the Red Hat-only `family` grouping. The role now also ensures the alternatives tooling is installed (`chkconfig` on RHEL 8, `alternatives` on RHEL 9/10; bundled with `dpkg` on Debian/Ubuntu), and can be included without variables as a no-op. * **role:redis**: Add template for version 8.8 -* **role:system_update**: Add a security lane for Rocky Linux. A second timer (twice a day by default) installs only Rocky Linux security hot-fixes from the dedicated `security` repository (provided by `repo_baseos`) and reboots the host if needed. The reboot time is steered per host group (for example immediately on test hosts, deferred to the evening on production hosts). Enabled by default; a no-op where the `security` repository is not enabled, and can be turned off with `system_update__security_enabled: false`. This keeps critical security fixes flowing daily while the regular update lane stays on its weekly schedule. +* **role:system_update**: Add a security lane for Rocky Linux. A daily timer installs only Rocky Linux security hot-fixes from the dedicated `security` repository (provided by `repo_baseos`) and requests a reboot if needed, which is performed at the host's maintenance window (`schedule_reboot__reboot_time__*`). Enabled by default; a no-op where the `security` repository is not enabled, and can be turned off with `system_update__security_enabled: false`. This keeps critical security fixes flowing daily while the regular update lane stays on its weekly schedule. 
* **role:mariadb_server**: Add `mariadb_server__cnf_innodb_snapshot_isolation` variable (MariaDB 10.6+), defaulting to `'ON'`. * **role:repo_baseos**: Add the Rocky Linux `security` repository (critical CVE fixes), enabled by default. Opt out per host or group via `repo_baseos__security_repo_enabled__host_var` / `repo_baseos__security_repo_enabled__group_var`. * **role:chromium_headless**: New role. Provides a hardened, socket-activated headless Chromium backend (started on the first request, stopped again after an idle timeout, so it uses no RAM while unused) for tools such as the Icinga Web 2 PDF Export Module. Installs `chromium-headless` from EPEL instead of Google's proprietary repository. @@ -70,6 +73,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed +* **role:mailto_root**: Send the verification mails via `sendmail` (provided by `postfix`) instead of the `mail` (mailx) command, completing the move off `mailx` (see Breaking Changes). The role no longer needs the `mailx` package installed. * **role:icinga2_master, role:icingadb**: Validate the Icinga 2 configuration before restarting the service. A faulty config now fails the playbook run loudly instead of bouncing the daemon into a broken state and leaving Icinga 2 down. * **role:nextcloud**: Automatic app updates are now enabled by default (`nextcloud__timer_app_update_enabled`). The scheduled app update only switches Nextcloud into maintenance mode when an app update is actually pending, so an instance that is already up to date keeps serving requests without interruption. After updating, the recommended database migrations are applied automatically. A failed run no longer leaves the instance stuck in maintenance mode. * **role:clamav**: Now runs on Debian and Ubuntu in addition to Red Hat-family systems, and works on RHEL 10. 
The role seeds the signature database on first install so the scanner starts reliably, and runs an EICAR self-test (also available on its own via the `clamav:test` tag) that confirms detection actually works. diff --git a/COMPATIBILITY.md b/COMPATIBILITY.md index b01dd2190..8c2dd5cad 100644 --- a/COMPATIBILITY.md +++ b/COMPATIBILITY.md @@ -148,6 +148,7 @@ Which Ansible role is proven to run on which OS? | repo_sury | x | x | - | - | | (x) | (x) | (x) | | | rocketchat | | | x | (x) | (x) | | | | Fedora 35 | | rsyslog | | | x | x | x | | | | | +| schedule_reboot | x | x | x | x | x | (x) | (x) | (x) | | | selinux | (x) | (x) | x | x | x | (x) | (x) | (x) | | | shared | | | | | | | | | controller-side helper, target OS irrelevant | | shell | (x) | (x) | x | x | x | (x) | (x) | (x) | | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0df6df32f..b17b9b6fd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -638,7 +638,7 @@ Make sure to use the following format when passing multiple injections to avoid * Do not use `{{ template_run_date }}` inside the template. It is the date that the template was rendered, which is done during every Ansible run. This means that the task will always be changed, even if nothing else changed in the template, therefore breaking idempotency. * Use the target path for the file in the `template` folder, for example: `templates/etc/httpd/sites-available/default.conf.j2`. This makes it clear what the file is for, and avoids name collisions. * Always use the `.j2` file extension for files in the `template` folder. -* If deploying self-written scripts, copy them to `/usr/local/sbin` (due to SELinux). +* If deploying self-written scripts, copy them to `/usr/local/sbin` (due to SELinux). Internal helper scripts that are only ever run by a systemd unit (not invoked by an admin and not exec'd by a confined domain) MAY instead live in `/usr/local/libexec`. 
Files there get the `usr_t` type, and the targeted policy lets a root `oneshot` service (which runs in `init_t`) execute them in place via `execute_no_trans`, so there is no AVC denial on RHEL/Rocky 8, 9 and 10. Keep admin-invokable commands in `/usr/local/sbin`, and never put a script a confined domain must exec under `/usr/local/libexec`. * Keep templates as close to the original file as possible. This makes handling of rpmnew/rpmsave files easier. * Add the following task after deploying a file that might get rpmnew or rpmsave files (or their Debian equivalents): ```yaml diff --git a/extensions/molecule/schedule_reboot/no_reboot/converge.yml b/extensions/molecule/schedule_reboot/no_reboot/converge.yml new file mode 100644 index 000000000..0a9ffa4f2 --- /dev/null +++ b/extensions/molecule/schedule_reboot/no_reboot/converge.yml @@ -0,0 +1,5 @@ +# Import the playbook under test by its FQCN, so the test exercises the real +# playbook (and through it, the roles) instead of a copy that could drift. The +# test inputs live in the scenario inventory, the checks in verify.yml. +- name: 'Converge schedule_reboot playbook' + ansible.builtin.import_playbook: 'linuxfabrik.lfops.schedule_reboot' diff --git a/extensions/molecule/schedule_reboot/no_reboot/inventory/group_vars/systems_under_test.yml b/extensions/molecule/schedule_reboot/no_reboot/inventory/group_vars/systems_under_test.yml new file mode 100644 index 000000000..856056a7b --- /dev/null +++ b/extensions/molecule/schedule_reboot/no_reboot/inventory/group_vars/systems_under_test.yml @@ -0,0 +1,15 @@ +# Inputs for the schedule_reboot playbook (postfix + mailto_root + schedule_reboot). + +# mailto_root is mandatory: it needs a from and a list of recipients. A local +# address is enough; the test never sends real mail. +mailto_root__from: 'root@localhost' +mailto_root__to: + - 'root@localhost' + +# postfix needs a relayhost; an unreachable example host is fine, the test does +# not deliver mail. 
+postfix__relayhost: 'mail.example.com' + +# Reboot window far from the test run, so the timer never fires mid-test. The +# actor is exercised explicitly in verify.yml instead. +schedule_reboot__reboot_time__group_var: '03:00' diff --git a/extensions/molecule/schedule_reboot/no_reboot/inventory/hosts.yml b/extensions/molecule/schedule_reboot/no_reboot/inventory/hosts.yml new file mode 100644 index 000000000..57b1f4103 --- /dev/null +++ b/extensions/molecule/schedule_reboot/no_reboot/inventory/hosts.yml @@ -0,0 +1,18 @@ +# yamllint disable rule:empty-values + +lfops_schedule_reboot: + children: + systems_under_test: + +systems_under_test: + hosts: + debian11-vm: + debian12-vm: + debian13-vm: + rocky8-vm: + rocky9-vm: + rocky10-vm: + ubuntu2004-vm: + ubuntu2204-vm: + ubuntu2404-vm: + ubuntu2604-vm: diff --git a/extensions/molecule/schedule_reboot/no_reboot/molecule.yml b/extensions/molecule/schedule_reboot/no_reboot/molecule.yml new file mode 100644 index 000000000..aac4dd8ca --- /dev/null +++ b/extensions/molecule/schedule_reboot/no_reboot/molecule.yml @@ -0,0 +1,5 @@ +# Non-destructive sub-scenario: deploys the schedule_reboot mechanism and checks +# it is armed (timer, spool, CLI, empty-spool actor no-op) without rebooting. +# Marker only - inherits the VM backend and the full test_sequence (including +# idempotence) from extensions/molecule/config.yml. The reboot sub-scenario does +# the real reboot. diff --git a/extensions/molecule/schedule_reboot/no_reboot/verify.yml b/extensions/molecule/schedule_reboot/no_reboot/verify.yml new file mode 100644 index 000000000..add07057f --- /dev/null +++ b/extensions/molecule/schedule_reboot/no_reboot/verify.yml @@ -0,0 +1,74 @@ +# Verify the schedule_reboot mechanism on the running system: the timer is +# armed, the spool exists, the actor runs as a no-op on an empty spool (which +# also proves /usr/local/libexec/do-reboot executes under systemd and SELinux), +# and the schedule-reboot CLI files a request. 
No real reboot happens here - +# that is the schedule_reboot/reboot scenario. +- name: 'Verify schedule_reboot is armed and functional' + hosts: 'systems_under_test' + gather_facts: false + tasks: + + - name: 'systemctl is-enabled schedule-reboot.timer' # noqa command-instead-of-module (read-only state query) + ansible.builtin.command: 'systemctl is-enabled schedule-reboot.timer' + register: '__molecule__schedule_reboot_timer_enabled_result' + changed_when: false + failed_when: false + + - name: 'systemctl is-active schedule-reboot.timer' # noqa command-instead-of-module (read-only state query) + ansible.builtin.command: 'systemctl is-active schedule-reboot.timer' + register: '__molecule__schedule_reboot_timer_active_result' + changed_when: false + failed_when: false + + - name: 'Assert that schedule-reboot.timer is enabled and active' + ansible.builtin.assert: + that: + - '__molecule__schedule_reboot_timer_enabled_result["stdout"] == "enabled"' + - '__molecule__schedule_reboot_timer_active_result["stdout"] == "active"' + + - name: 'stat /run/schedule-reboot' + ansible.builtin.stat: + path: '/run/schedule-reboot' + register: '__molecule__spool_stat_result' + + - name: 'Assert that the spool directory exists with mode 0755' + ansible.builtin.assert: + that: + - '__molecule__spool_stat_result["stat"]["isdir"]' + - '__molecule__spool_stat_result["stat"]["mode"] == "0755"' + + # Run the actor against the empty post-converge spool. do-reboot must exit 0 + # without rebooting; that it runs at all proves the usr_t libexec script is + # executable under systemd/SELinux (the FHS-placement concern). 
+ - name: 'systemctl start schedule-reboot.service (empty spool, expected no-op)' # noqa command-instead-of-module (need to run the oneshot actor and read its Result) + ansible.builtin.command: 'systemctl start schedule-reboot.service' + changed_when: false + + - name: 'systemctl show --property=Result schedule-reboot.service' # noqa command-instead-of-module (no module reads a oneshot Result) + ansible.builtin.command: 'systemctl show --property=Result --value schedule-reboot.service' + register: '__molecule__do_reboot_result' + changed_when: false + + - name: 'Assert that do-reboot completed successfully without rebooting' + ansible.builtin.assert: + that: '__molecule__do_reboot_result["stdout"] == "success"' + + # Exercise the request CLI, then remove the request so nothing can trigger a + # reboot afterwards (the timer is at 03:00 and will not fire during the run). + - name: 'schedule-reboot ci-verify hello-from-molecule' + ansible.builtin.command: 'schedule-reboot ci-verify hello-from-molecule' + changed_when: false + + - name: 'slurp /run/schedule-reboot/ci-verify' + ansible.builtin.slurp: + src: '/run/schedule-reboot/ci-verify' + register: '__molecule__ci_verify_request_result' + + - name: 'Assert the request file holds the detail text passed to the CLI' + ansible.builtin.assert: + that: '__molecule__ci_verify_request_result["content"] | ansible.builtin.b64decode == "hello-from-molecule\n"' + + - name: 'rm /run/schedule-reboot/ci-verify' + ansible.builtin.file: + path: '/run/schedule-reboot/ci-verify' + state: 'absent' diff --git a/extensions/molecule/schedule_reboot/reboot/converge.yml b/extensions/molecule/schedule_reboot/reboot/converge.yml new file mode 100644 index 000000000..c851f1941 --- /dev/null +++ b/extensions/molecule/schedule_reboot/reboot/converge.yml @@ -0,0 +1,4 @@ +# Same playbook as the no_reboot sub-scenario; only verify.yml differs (it +# performs a real reboot instead of the non-destructive checks). 
+- name: 'Converge schedule_reboot playbook' + ansible.builtin.import_playbook: 'linuxfabrik.lfops.schedule_reboot' diff --git a/extensions/molecule/schedule_reboot/reboot/inventory/group_vars/systems_under_test.yml b/extensions/molecule/schedule_reboot/reboot/inventory/group_vars/systems_under_test.yml new file mode 100644 index 000000000..cd2403541 --- /dev/null +++ b/extensions/molecule/schedule_reboot/reboot/inventory/group_vars/systems_under_test.yml @@ -0,0 +1,13 @@ +# Same inputs as the no_reboot sub-scenario. +mailto_root__from: 'root@localhost' +mailto_root__to: + - 'root@localhost' + +postfix__relayhost: 'mail.example.com' + +schedule_reboot__reboot_time__group_var: '03:00' + +# reboot immediately on --now: this scenario asserts the reboot via a boot_id change +# after wait_for_connection, so the default grace sleep would keep the host up past +# the reconnect and make the check flap. The grace period itself is not under test here. +schedule_reboot__reboot_grace_period: 0 diff --git a/extensions/molecule/schedule_reboot/reboot/inventory/hosts.yml b/extensions/molecule/schedule_reboot/reboot/inventory/hosts.yml new file mode 100644 index 000000000..57b1f4103 --- /dev/null +++ b/extensions/molecule/schedule_reboot/reboot/inventory/hosts.yml @@ -0,0 +1,18 @@ +# yamllint disable rule:empty-values + +lfops_schedule_reboot: + children: + systems_under_test: + +systems_under_test: + hosts: + debian11-vm: + debian12-vm: + debian13-vm: + rocky8-vm: + rocky9-vm: + rocky10-vm: + ubuntu2004-vm: + ubuntu2204-vm: + ubuntu2404-vm: + ubuntu2604-vm: diff --git a/extensions/molecule/schedule_reboot/reboot/molecule.yml b/extensions/molecule/schedule_reboot/reboot/molecule.yml new file mode 100644 index 000000000..8cf6a2207 --- /dev/null +++ b/extensions/molecule/schedule_reboot/reboot/molecule.yml @@ -0,0 +1,12 @@ +# Opt-in, DESTRUCTIVE sub-scenario: it triggers a real reboot to prove the +# do-reboot actor reboots the host and the tmpfs spool clears. 
The reduced +# test_sequence drops the idempotence loop and the trailing verify, so the +# reboot happens exactly once per run. +scenario: + test_sequence: + - 'dependency' + - 'create' + - 'prepare' + - 'converge' + - 'verify' + - 'destroy' diff --git a/extensions/molecule/schedule_reboot/reboot/verify.yml b/extensions/molecule/schedule_reboot/reboot/verify.yml new file mode 100644 index 000000000..6d3ff8d13 --- /dev/null +++ b/extensions/molecule/schedule_reboot/reboot/verify.yml @@ -0,0 +1,44 @@ +# Destructive verify: actually reboot the host through the CLI and confirm it +# came back (boot_id changed) with an empty spool (tmpfs cleared on reboot). +- name: 'Verify schedule_reboot performs a real reboot' + hosts: 'systems_under_test' + gather_facts: false + tasks: + + - name: 'slurp /proc/sys/kernel/random/boot_id (before)' + ansible.builtin.slurp: + src: '/proc/sys/kernel/random/boot_id' + register: '__molecule__boot_id_before_result' + + # Fire and forget: schedule-reboot --now starts schedule-reboot.service, + # which reboots the host and drops this SSH connection. async + poll: 0 lets + # Ansible move on instead of waiting on the dying connection. 
+ - name: 'schedule-reboot --now ci-reboot molecule-reboot-test' + ansible.builtin.command: 'schedule-reboot --now ci-reboot molecule-reboot-test' + async: 1 + poll: 0 # fire and forget: the reboot drops the connection, do not wait on it + changed_when: false + + - name: 'Wait for the host to come back after the reboot' + ansible.builtin.wait_for_connection: + delay: 15 + sleep: 5 + timeout: 300 + + - name: 'slurp /proc/sys/kernel/random/boot_id (after)' + ansible.builtin.slurp: + src: '/proc/sys/kernel/random/boot_id' + register: '__molecule__boot_id_after_result' + + - name: 'Assert the boot_id changed, proving a real reboot happened' + ansible.builtin.assert: + that: '__molecule__boot_id_after_result["content"] != __molecule__boot_id_before_result["content"]' + + - name: 'find /run/schedule-reboot' + ansible.builtin.find: + paths: '/run/schedule-reboot' + register: '__molecule__spool_after_result' + + - name: 'Assert the spool is empty (tmpfs cleared by the reboot)' + ansible.builtin.assert: + that: '__molecule__spool_after_result["matched"] == 0' diff --git a/extensions/molecule/system_update/converge.yml b/extensions/molecule/system_update/converge.yml new file mode 100644 index 000000000..eacdf8624 --- /dev/null +++ b/extensions/molecule/system_update/converge.yml @@ -0,0 +1,5 @@ +# Import the playbook under test by its FQCN. system_update.yml also pulls in +# postfix, mailto_root and schedule_reboot, so the converge exercises the whole +# update + reboot chain. 
+- name: 'Converge system_update playbook' + ansible.builtin.import_playbook: 'linuxfabrik.lfops.system_update' diff --git a/extensions/molecule/system_update/inventory/group_vars/systems_under_test.yml b/extensions/molecule/system_update/inventory/group_vars/systems_under_test.yml new file mode 100644 index 000000000..0182e39bd --- /dev/null +++ b/extensions/molecule/system_update/inventory/group_vars/systems_under_test.yml @@ -0,0 +1,15 @@ +# Inputs for the system_update playbook (yum_utils + postfix + mailto_root + +# schedule_reboot + system_update). + +# mailto_root is mandatory: a from and a list of recipients. A local address is +# enough; the test never sends real mail. +mailto_root__from: 'root@localhost' +mailto_root__to: + - 'root@localhost' + +# postfix needs a relayhost; an unreachable example host is fine. +postfix__relayhost: 'mail.example.com' + +# Reboot window far from the test run, so neither the update nor the reboot +# timer fires mid-test. +schedule_reboot__reboot_time__group_var: '03:00' diff --git a/extensions/molecule/system_update/inventory/hosts.yml b/extensions/molecule/system_update/inventory/hosts.yml new file mode 100644 index 000000000..0d803be7a --- /dev/null +++ b/extensions/molecule/system_update/inventory/hosts.yml @@ -0,0 +1,21 @@ +# yamllint disable rule:empty-values + +# Map the playbook's target group (playbooks/system_update.yml: hosts: +# lfops_system_update) onto the shared systems_under_test host set. Trim this +# (or use LFOPS_TEST_TARGETS at runtime) to test against fewer hosts. 
+lfops_system_update: + children: + systems_under_test: + +systems_under_test: + hosts: + debian11-vm: + debian12-vm: + debian13-vm: + rocky8-vm: + rocky9-vm: + rocky10-vm: + ubuntu2004-vm: + ubuntu2204-vm: + ubuntu2404-vm: + ubuntu2604-vm: diff --git a/extensions/molecule/system_update/molecule.yml b/extensions/molecule/system_update/molecule.yml new file mode 100644 index 000000000..a2a970c2a --- /dev/null +++ b/extensions/molecule/system_update/molecule.yml @@ -0,0 +1,2 @@ +# Molecule scenario marker. Inherits the VM backend and the full test_sequence +# from extensions/molecule/config.yml. diff --git a/extensions/molecule/system_update/verify.yml b/extensions/molecule/system_update/verify.yml new file mode 100644 index 000000000..885998063 --- /dev/null +++ b/extensions/molecule/system_update/verify.yml @@ -0,0 +1,86 @@ +# Verify system_update on the running system: both regular-lane timers and the +# schedule-reboot timer it pulls in are armed; on Rocky the security lane exists +# and is a clean no-op without the security repo; off Rocky the security lane is +# absent. sendmail (the transport the scripts now use) is available. +# +# gather_facts is on here because the security-lane checks branch on +# ansible_facts["distribution"]. 
+- name: 'Verify system_update timers and the security lane' + hosts: 'systems_under_test' + gather_facts: true + tasks: + + - name: 'systemctl is-enabled the regular-lane and reboot timers' # noqa command-instead-of-module (read-only state query) + ansible.builtin.command: 'systemctl is-enabled {{ item }}' + loop: + - 'notify-and-schedule.timer' + - 'update-and-reboot.timer' + - 'schedule-reboot.timer' + register: '__molecule__update_timers_enabled_result' + changed_when: false + failed_when: false + + - name: 'Assert the regular-lane and reboot timers are enabled' + ansible.builtin.assert: + that: 'item["stdout"] == "enabled"' + loop: '{{ __molecule__update_timers_enabled_result["results"] }}' + loop_control: + label: '{{ item["item"] }}' + + - name: 'systemctl is-active security-update.timer (Rocky only)' # noqa command-instead-of-module (read-only state query) + ansible.builtin.command: 'systemctl is-active security-update.timer' + register: '__molecule__security_timer_active_result' + changed_when: false + failed_when: false + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'Assert the security lane is armed on Rocky' + ansible.builtin.assert: + that: '__molecule__security_timer_active_result["stdout"] == "active"' + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'stat /etc/systemd/system/security-update.timer (non-Rocky)' + ansible.builtin.stat: + path: '/etc/systemd/system/security-update.timer' + register: '__molecule__security_timer_stat_result' + when: 'ansible_facts["distribution"] != "Rocky"' + + - name: 'Assert the security lane is absent off Rocky' + ansible.builtin.assert: + that: 'not __molecule__security_timer_stat_result["stat"]["exists"]' + when: 'ansible_facts["distribution"] != "Rocky"' + + # Without the security repo (system_update.yml does not pull in repo_baseos), + # security-update must exit 0 and request no reboot. 
+ - name: 'systemctl start security-update.service (Rocky no-op)' # noqa command-instead-of-module (need to run the oneshot and read its Result) + ansible.builtin.command: 'systemctl start security-update.service' + changed_when: false + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'systemctl show --property=Result security-update.service' # noqa command-instead-of-module (no module reads a oneshot Result) + ansible.builtin.command: 'systemctl show --property=Result --value security-update.service' + register: '__molecule__security_update_result' + changed_when: false + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'stat /run/schedule-reboot/security_update' + ansible.builtin.stat: + path: '/run/schedule-reboot/security_update' + register: '__molecule__security_request_stat_result' + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'Assert the security lane ran clean and requested no reboot' + ansible.builtin.assert: + that: + - '__molecule__security_update_result["stdout"] == "success"' + - 'not __molecule__security_request_stat_result["stat"]["exists"]' + when: 'ansible_facts["distribution"] == "Rocky"' + + - name: 'stat /usr/sbin/sendmail' + ansible.builtin.stat: + path: '/usr/sbin/sendmail' + register: '__molecule__sendmail_stat_result' + + - name: 'Assert sendmail is available for the update notifications' + ansible.builtin.assert: + that: '__molecule__sendmail_stat_result["stat"]["exists"]' diff --git a/playbooks/README.md b/playbooks/README.md index a34ccf51e..b75d871c2 100644 --- a/playbooks/README.md +++ b/playbooks/README.md @@ -989,6 +989,15 @@ Calls the following roles (in order): * [rsyslog](https://github.com/Linuxfabrik/lfops/tree/main/roles/rsyslog) +## schedule_reboot.yml + +Calls the following roles (in order): + +* [postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix): `schedule_reboot__skip_postfix` +* [mailto_root](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailto_root): 
`schedule_reboot__skip_mailto_root` +* [schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot) + + ## selinux.yml Calls the following roles (in order): @@ -1024,7 +1033,6 @@ Calls the following roles (in order): * [glances](https://github.com/Linuxfabrik/lfops/tree/main/roles/glances): `setup_basic__skip_glances` * [tools](https://github.com/Linuxfabrik/lfops/tree/main/roles/tools): `setup_basic__skip_tools` * [tmux](https://github.com/Linuxfabrik/lfops/tree/main/roles/tmux): `setup_basic__skip_tmux` -* [at](https://github.com/Linuxfabrik/lfops/tree/main/roles/at): `setup_basic__skip_at` * [yum_utils](https://github.com/Linuxfabrik/lfops/tree/main/roles/yum_utils): `setup_basic__skip_yum_utils` * [lvm](https://github.com/Linuxfabrik/lfops/tree/main/roles/lvm): `setup_basic__skip_lvm` * [sshd](https://github.com/Linuxfabrik/lfops/tree/main/roles/sshd): `setup_basic__skip_sshd` @@ -1033,6 +1041,7 @@ Calls the following roles (in order): * [mailx](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailx): `setup_basic__skip_mailx` * [postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix): `setup_basic__skip_postfix` * [mailto_root](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailto_root): `setup_basic__skip_mailto_root` +* [schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot): `setup_basic__skip_schedule_reboot` * [system_update](https://github.com/Linuxfabrik/lfops/tree/main/roles/system_update): `setup_basic__skip_system_update` * [python_venv](https://github.com/Linuxfabrik/lfops/tree/main/roles/python_venv): `setup_basic__skip_python_venv` * [duplicity](https://github.com/Linuxfabrik/lfops/tree/main/roles/duplicity): `setup_basic__skip_duplicity` @@ -1319,10 +1328,9 @@ Calls the following roles (in order): Calls the following roles (in order): * [yum_utils](https://github.com/Linuxfabrik/lfops/tree/main/roles/yum_utils): `system_update__skip_yum_utils` -* 
[at](https://github.com/Linuxfabrik/lfops/tree/main/roles/at) -* [mailx](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailx) * [postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix): `system_update__skip_postfix` * [mailto_root](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailto_root): `system_update__skip_mailto_root` +* [schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot) * [system_update](https://github.com/Linuxfabrik/lfops/tree/main/roles/system_update) diff --git a/playbooks/all.yml b/playbooks/all.yml index 757b55240..0cbc51067 100644 --- a/playbooks/all.yml +++ b/playbooks/all.yml @@ -120,6 +120,7 @@ - import_playbook: 'repo_rpmfusion.yml' - import_playbook: 'repo_sury.yml' - import_playbook: 'rsyslog.yml' +- import_playbook: 'schedule_reboot.yml' - import_playbook: 'selinux.yml' - import_playbook: 'setup_basic.yml' - import_playbook: 'setup_grav.yml' diff --git a/playbooks/schedule_reboot.yml b/playbooks/schedule_reboot.yml new file mode 100644 index 000000000..9dcde56da --- /dev/null +++ b/playbooks/schedule_reboot.yml @@ -0,0 +1,43 @@ +- name: 'Playbook linuxfabrik.lfops.schedule_reboot' + hosts: + - 'lfops_schedule_reboot' + + pre_tasks: + - ansible.builtin.import_role: + name: 'shared' + tasks_from: 'log-start.yml' + tags: + - 'always' + + - ansible.builtin.import_role: + name: 'shared' + tasks_from: 'global-variables.yml' + tags: + - 'always' + + + roles: + + - role: 'linuxfabrik.lfops.postfix' + postfix__aliases__dependent_var: '{{ + mailto_root__postfix__aliases__dependent_var + }}' + postfix__sender_canonicals__dependent_var: '{{ + mailto_root__postfix__sender_canonicals__dependent_var + }}' + when: + - 'not schedule_reboot__skip_postfix | d(false)' + + - role: 'linuxfabrik.lfops.mailto_root' + when: + - 'not schedule_reboot__skip_mailto_root | d(false)' + + - role: 'linuxfabrik.lfops.schedule_reboot' + + + post_tasks: + - ansible.builtin.import_role: + name: 'shared' + 
tasks_from: 'log-end.yml' + tags: + - 'always' diff --git a/playbooks/setup_basic.yml b/playbooks/setup_basic.yml index a02fd4622..1d5e78479 100644 --- a/playbooks/setup_basic.yml +++ b/playbooks/setup_basic.yml @@ -124,10 +124,6 @@ when: - 'not setup_basic__skip_tmux | d(false)' - - role: 'linuxfabrik.lfops.at' - when: - - 'not setup_basic__skip_at | d(false)' - - role: 'linuxfabrik.lfops.yum_utils' when: - 'not setup_basic__skip_yum_utils | d(false)' @@ -176,6 +172,10 @@ # === Automatic updates === + - role: 'linuxfabrik.lfops.schedule_reboot' + when: + - 'not setup_basic__skip_schedule_reboot | d(false)' + - role: 'linuxfabrik.lfops.system_update' when: - 'not setup_basic__skip_system_update | d(false)' diff --git a/playbooks/system_update.yml b/playbooks/system_update.yml index 249267579..de78cb5b5 100644 --- a/playbooks/system_update.yml +++ b/playbooks/system_update.yml @@ -23,9 +23,6 @@ - 'ansible_facts["os_family"] == "RedHat"' - 'not system_update__skip_yum_utils | d(false)' - - role: 'linuxfabrik.lfops.at' - - role: 'linuxfabrik.lfops.mailx' - - role: 'linuxfabrik.lfops.postfix' postfix__aliases__dependent_var: '{{ mailto_root__postfix__aliases__dependent_var @@ -40,6 +37,8 @@ when: - 'not system_update__skip_mailto_root | d(false)' + - role: 'linuxfabrik.lfops.schedule_reboot' + - role: 'linuxfabrik.lfops.system_update' diff --git a/roles/mailto_root/README.md b/roles/mailto_root/README.md index 3f2cd5239..5b5c2e899 100644 --- a/roles/mailto_root/README.md +++ b/roles/mailto_root/README.md @@ -10,8 +10,7 @@ This role enables relaying all mail that is sent to the root user (or other serv Any [LFOps playbook](https://github.com/Linuxfabrik/lfops/blob/main/playbooks/README.md) that installs this role runs these for you. Optional ones can be disabled via the playbook's skip variables. -* postfix must be installed and configured (role: [linuxfabrik.lfops.postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix)). 
-* mailx must be installed (role: [linuxfabrik.lfops.mailx](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailx)). +* postfix must be installed and configured; it provides the `sendmail` interface used to send the relayed mail (role: [linuxfabrik.lfops.postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix)). ## Tags diff --git a/roles/mailto_root/tasks/main.yml b/roles/mailto_root/tasks/main.yml index 442eb6026..9ef4a19c9 100644 --- a/roles/mailto_root/tasks/main.yml +++ b/roles/mailto_root/tasks/main.yml @@ -1,15 +1,34 @@ - block: - - name: 'Set platform/version specific variables' - ansible.builtin.import_role: - name: 'shared' - tasks_from: 'platform-variables.yml' + # send via sendmail (provided by postfix), so the role needs no mailx package + # and one invocation works across distros (mailx's -r / -a "From:" split does + # not). -t reads the recipient from the To: header. changed_when: false because + # sending a verification mail does not change managed system state. 
+ - name: 'Send test mail to internal root (should be delivered to {{ mailto_root__to }})' + ansible.builtin.shell: | + set -o pipefail + { + printf 'From: %s\n' '{{ mailto_root__from }}' + printf 'To: root\n' + printf 'Subject: Test from %s to root\n' "$(hostname)" + printf '\nTestmail\n' + } | /usr/sbin/sendmail -oi -t -f '{{ mailto_root__from }}' + args: + executable: '/usr/bin/bash' + changed_when: false - - name: 'Send test mail to internal root (should be delivered to {{ mailto_root__to }})' # noqa no-changed-when - ansible.builtin.shell: 'echo "Testmail" | mail -s "Test from $(hostname) to root" root' - - - name: 'Send test mail to {{ mailto_root__to[0] }}' # noqa no-changed-when - ansible.builtin.shell: 'echo "Testmail" | mail -s "Test from $(hostname) to {{ mailto_root__to[0] }}" {{ mailto_root__from_command }} {{ mailto_root__to[0] }}' + - name: 'Send test mail to {{ mailto_root__to[0] }}' + ansible.builtin.shell: | + set -o pipefail + { + printf 'From: %s\n' '{{ mailto_root__from }}' + printf 'To: %s\n' '{{ mailto_root__to[0] }}' + printf 'Subject: Test from %s to %s\n' "$(hostname)" '{{ mailto_root__to[0] }}' + printf '\nTestmail\n' + } | /usr/sbin/sendmail -oi -t -f '{{ mailto_root__from }}' + args: + executable: '/usr/bin/bash' + changed_when: false tags: - 'mailto_root' diff --git a/roles/mailto_root/vars/Debian.yml b/roles/mailto_root/vars/Debian.yml deleted file mode 100644 index e3f120a9f..000000000 --- a/roles/mailto_root/vars/Debian.yml +++ /dev/null @@ -1 +0,0 @@ -mailto_root__from_command: '-a "From: {{ mailto_root__from }}"' diff --git a/roles/mailto_root/vars/RedHat.yml b/roles/mailto_root/vars/RedHat.yml deleted file mode 100644 index be03f6f52..000000000 --- a/roles/mailto_root/vars/RedHat.yml +++ /dev/null @@ -1 +0,0 @@ -mailto_root__from_command: '-r {{ mailto_root__from }}' diff --git a/roles/schedule_reboot/README.md b/roles/schedule_reboot/README.md new file mode 100644 index 000000000..f2c6b1603 --- /dev/null +++ 
b/roles/schedule_reboot/README.md @@ -0,0 +1,172 @@ +# Ansible Role linuxfabrik.lfops.schedule_reboot + +This role provides a single, windowed reboot mechanism for a host. A reboot is requested by dropping a file into a spool directory (`/run/schedule-reboot/`), and one `do-reboot` actor, triggered by a systemd timer at a configurable maintenance window, performs a single reboot for all pending requests. An ad-hoc `schedule-reboot` command is available to request a reboot after a manual change. + +Other roles reuse this mechanism instead of rebooting themselves: they drop a request file and let this role reboot the host once, at the window. The [system_update](https://github.com/Linuxfabrik/lfops/tree/main/roles/system_update) role works this way. + + +*Available in the next LFOps release.* + + +## How the Role Behaves + +* **One reboot window.** `schedule_reboot__reboot_time__*` sets a single time of day at which the actor runs. Steer it per inventory group, for example an earlier window for infrastructure hosts so they reboot before the rest. +* **Reboots are modelled as state.** A pending reboot is a file in `/run/schedule-reboot/`; the file name is a category and its content is included in the notification mail. Several pending reasons coalesce into a single reboot. The spool lives on tmpfs, so it clears on the reboot itself. +* **Producers order themselves before the actor.** A role that runs work at the same window (such as a system update) declares its unit `Before=schedule-reboot.service` (order-only, no `Wants=`/`Requires=`). systemd then runs the reboot only after that work finishes; on a window where no producer runs, the actor runs alone and reboots only if a request is pending. +* **An Icinga downtime is set around the reboot.** When `schedule_reboot__icinga2_api_user_login` is set, the actor schedules a short host downtime before rebooting, so the reboot does not raise alerts. Without it, the reboot happens without a downtime. 
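The spool behaviour described in the bullets above can be sketched in a few lines of shell. This is a minimal illustration, not the role's `do-reboot` script: the `pending_reasons` function name and the throwaway `mktemp` directory standing in for `/run/schedule-reboot/` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: shows how several pending request files could coalesce into one
# notification subject. SPOOL_DIR is a temporary directory, not /run/schedule-reboot.

# Print the comma-separated pending reasons (the request file names),
# or nothing at all when the spool is empty.
pending_reasons() {
    local spool_dir="$1" reasons
    local -a requests
    shopt -s nullglob            # an empty spool expands to zero elements
    requests=("$spool_dir"/*)
    shopt -u nullglob
    (( ${#requests[@]} )) || return 0
    # strip the directory part from every element, then join with ', '
    reasons=$(printf '%s, ' "${requests[@]##*/}")
    printf '%s\n' "${reasons%, }"
}

SPOOL_DIR=$(mktemp -d)
printf '%s\n' 'kernel updated' > "$SPOOL_DIR/system_update"
printf '%s\n' 'firmware flashed' > "$SPOOL_DIR/my_role"

# Two requests, one reboot: both reasons end up in a single subject line.
echo "Automatic reboot due to $(pending_reasons "$SPOOL_DIR")"
rm -rf "$SPOOL_DIR"
```

Because the file name is the category, a producer that writes its request twice simply overwrites its own file, so the number of pending reasons never exceeds the number of distinct producers.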
+ +To request a reboot from a script or by hand: + +```bash +schedule-reboot <reason> [detail ...] # request a reboot at the next window +schedule-reboot --now <reason> [detail ...] # reboot as soon as any running producer finishes +``` + +To request a reboot from another role's deployed script, write a file directly: + +```bash +printf '%s\n' "kernel updated" > /run/schedule-reboot/my_role +``` + + +## Dependent Roles + +Any [LFOps playbook](https://github.com/Linuxfabrik/lfops/blob/main/playbooks/README.md) that installs this role runs these for you. Optional ones can be disabled via the playbook's skip variables. + +* Optional: the root mail aliases are configured (role: [linuxfabrik.lfops.mailto_root](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailto_root)). +* Optional: postfix provides the `sendmail` interface and mail relay used for the reboot notification (role: [linuxfabrik.lfops.postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix)). + + +## Tags + +`schedule_reboot` + +* Deploys the `schedule-reboot` CLI, the `do-reboot` actor, the systemd timer/service, and the `/run/schedule-reboot` spool directory. +* Triggers: none. + +`schedule_reboot:state` + +* Determines whether the schedule-reboot timer is enabled. +* Triggers: none. + + +## Optional Role Variables + +`schedule_reboot__enabled` + +* Whether the schedule-reboot timer is enabled at boot, analogous to `systemctl enable / disable --now`. Disabling it stops scheduled reboots; `schedule-reboot --now` still works. +* Type: Bool. +* Default: `true` + +`schedule_reboot__icinga2_api_url` + +* The URL of the Icinga2 API (usually on the Icinga2 Master). Used to set a downtime for the host and all its services around the reboot. +* Type: String. +* Default: `'https://{{ icinga2_agent__icinga2_master_host | d("") }}:{{ icinga2_agent__icinga2_master_port | d(5665) }}'` + +`schedule_reboot__icinga2_api_user_login` + +* The Icinga2 API user used to set the downtime around the reboot. 
When unset, no downtime is scheduled. +* Type: Dictionary. +* Default: unset +* Subkeys: + + * `username`: + + * Mandatory. Username. + * Type: String. + + * `password`: + + * Mandatory. Password. + * Type: String. + +`schedule_reboot__icinga2_hostname` + +* The hostname of the Icinga2 host on which the downtime should be set. +* Type: String. +* Default: `'{{ ansible_facts["nodename"] }}'` + +`schedule_reboot__mail_from` + +* The email sender account, used as the "from" address for the reboot notification. +* Type: String. +* Default: `'{{ mailto_root__from }}'` + +`schedule_reboot__mail_recipients` + +* A list of email recipients for the reboot notification. +* Type: List of strings. +* Default: `'{{ mailto_root__to }}'` + +`schedule_reboot__mail_subject_hostname` + +* String used as the hostname in the mail subject. You can use `$()` to call bash code. +* Type: String. +* Default: `'$(hostname --short)'` + +`schedule_reboot__mail_subject_prefix` + +* A prefix shown in front of the hostname in the mail subject. Can be used to separate servers by environment or customer. +* Type: String. +* Default: `''` + +`schedule_reboot__on_calendar` + +* When the reboot actor runs. It reboots only if a request is pending in `/run/schedule-reboot/`. Defaults to daily at the reboot window. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. +* Type: String. +* Default: `'*-*-* {{ schedule_reboot__reboot_time__combined_var }}'` + +`schedule_reboot__reboot_grace_period` + +* The number of seconds to wait after sending the reboot notification before the host actually reboots. This gives the notification mail time to leave the local mail queue (which is persistent, so the mail is never lost, but otherwise the heads-up can arrive after the host is already back). The host keeps serving during the wait. Set to `0` to reboot immediately. +* Type: Integer. 
+* Default: `60` + +`schedule_reboot__reboot_time__host_var` / `schedule_reboot__reboot_time__group_var` + +* The reboot window: the time of day at which the actor (and the producers ordered before it) run. Steer it per inventory group. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. +* Type: String. +* Default: `'04:00'` + +`schedule_reboot__rocketchat_msg_suffix` + +* A suffix to the Rocket.Chat notification. This can be used to mention other users. +* Type: String. +* Default: `''` + +`schedule_reboot__rocketchat_url` + +* The URL to a Rocket.Chat server to send the reboot notification to. +* Type: String. +* Default: unset + +Example: +```yaml +# optional +schedule_reboot__enabled: true +schedule_reboot__icinga2_api_url: 'https://icinga.example.com:5665' +schedule_reboot__icinga2_api_user_login: + username: 'downtime-user' + password: 'linuxfabrik' +schedule_reboot__icinga2_hostname: 'myhost.example.com' +schedule_reboot__mail_from: 'noreply@example.com' +schedule_reboot__mail_recipients: + - 'info@example.com' + - 'support@example.com' +schedule_reboot__mail_subject_prefix: '001-' +schedule_reboot__reboot_grace_period: 60 +schedule_reboot__reboot_time__group_var: '19:00' +schedule_reboot__rocketchat_msg_suffix: '@administrator' +schedule_reboot__rocketchat_url: 'https://chat.example.com/hooks/abcd1234' +``` + + +## License + +[The Unlicense](https://unlicense.org/) + + +## Author Information + +[Linuxfabrik GmbH, Zurich](https://www.linuxfabrik.ch) diff --git a/roles/schedule_reboot/defaults/main.yml b/roles/schedule_reboot/defaults/main.yml new file mode 100644 index 000000000..376d3de5f --- /dev/null +++ b/roles/schedule_reboot/defaults/main.yml @@ -0,0 +1,20 @@ +schedule_reboot__enabled: true +schedule_reboot__icinga2_api_url: 'https://{{ icinga2_agent__icinga2_master_host | d("") }}:{{ icinga2_agent__icinga2_master_port | d(5665) }}' +schedule_reboot__icinga2_hostname: '{{ 
ansible_facts["nodename"] }}' +schedule_reboot__mail_from: '{{ mailto_root__from }}' +schedule_reboot__mail_recipients: '{{ mailto_root__to }}' +schedule_reboot__mail_subject_hostname: '$(hostname --short)' +schedule_reboot__mail_subject_prefix: '' +schedule_reboot__on_calendar: '*-*-* {{ schedule_reboot__reboot_time__combined_var }}' +schedule_reboot__reboot_grace_period: 60 +schedule_reboot__reboot_time__dependent_var: '' +schedule_reboot__reboot_time__group_var: '' +schedule_reboot__reboot_time__host_var: '' +schedule_reboot__reboot_time__role_var: '04:00' +schedule_reboot__reboot_time__combined_var: '{{ + schedule_reboot__reboot_time__host_var if (schedule_reboot__reboot_time__host_var | string | length) else + schedule_reboot__reboot_time__group_var if (schedule_reboot__reboot_time__group_var | string | length) else + schedule_reboot__reboot_time__dependent_var if (schedule_reboot__reboot_time__dependent_var | string | length) else + schedule_reboot__reboot_time__role_var + }}' +schedule_reboot__rocketchat_msg_suffix: '' diff --git a/roles/schedule_reboot/meta/argument_specs.yml b/roles/schedule_reboot/meta/argument_specs.yml new file mode 100644 index 000000000..b642cc680 --- /dev/null +++ b/roles/schedule_reboot/meta/argument_specs.yml @@ -0,0 +1,86 @@ +argument_specs: + main: + options: + + schedule_reboot__enabled: + type: 'bool' + required: false + default: true + description: 'Whether the schedule-reboot timer is enabled at boot.' + + schedule_reboot__icinga2_api_url: + type: 'str' + required: false + description: 'The URL of the Icinga2 API used to set a downtime around the reboot.' + + schedule_reboot__icinga2_api_user_login: + type: 'dict' + required: false + description: 'The Icinga2 API user (username/password) used to set the downtime. Downtime is skipped when unset.' + + schedule_reboot__icinga2_hostname: + type: 'str' + required: false + description: 'The Icinga2 host name on which the downtime is set.' 
+ + schedule_reboot__mail_from: + type: 'str' + required: false + description: 'The "from" address for the reboot notification.' + + schedule_reboot__mail_recipients: + type: 'raw' + required: false + description: 'The recipients of the reboot notification.' + + schedule_reboot__mail_subject_hostname: + type: 'str' + required: false + default: '$(hostname --short)' + description: 'The hostname string used in the mail subject. May contain $() shell code.' + + schedule_reboot__mail_subject_prefix: + type: 'str' + required: false + default: '' + description: 'A prefix shown in front of the hostname in the mail subject.' + + schedule_reboot__on_calendar: + type: 'str' + required: false + description: 'systemd OnCalendar for the reboot actor. Defaults to daily at the reboot window.' + + schedule_reboot__reboot_grace_period: + type: 'int' + required: false + default: 60 + description: 'Seconds to wait after sending the reboot notification before rebooting, so the mail can leave the queue. 0 disables.' + + schedule_reboot__reboot_time__dependent_var: + type: 'str' + required: false + default: '' + description: 'The reboot window (time of day). Dependent-role injection.' + + schedule_reboot__reboot_time__group_var: + type: 'str' + required: false + default: '' + description: 'The reboot window (time of day). Group-level override.' + + schedule_reboot__reboot_time__host_var: + type: 'str' + required: false + default: '' + description: 'The reboot window (time of day). Host-level override.' + + schedule_reboot__rocketchat_msg_suffix: + type: 'str' + required: false + default: '' + description: 'A suffix to the Rocket.Chat notification, for example to mention users.' + + schedule_reboot__rocketchat_url: + type: 'str' + required: false + description: 'The URL of a Rocket.Chat server to send the reboot notification to.' 
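The three `schedule_reboot__reboot_time__*` overrides in the argument specs above resolve by precedence: host, then group, then dependent-role injection, then the role default of `'04:00'` (this is how the role's `defaults/main.yml` combines them). A minimal shell sketch of that first-non-empty-wins rule follows; the function name `resolve_reboot_time` is hypothetical and only illustrates the ordering, it is not part of the role.

```shell
#!/usr/bin/env bash
# Sketch only: first non-empty value wins.
# Argument order mirrors the precedence: host, group, dependent, role default.
resolve_reboot_time() {
    local candidate
    for candidate in "$@"; do
        if [[ -n "$candidate" ]]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # every level was empty
}

# host_var is empty and group_var is set: the group value overrides the role default.
resolve_reboot_time '' '19:00' '' '04:00'
```

An empty string counts as "not set", which is why the override variables default to `''` rather than being left undefined.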
diff --git a/roles/schedule_reboot/tasks/main.yml b/roles/schedule_reboot/tasks/main.yml new file mode 100644 index 000000000..c76fda189 --- /dev/null +++ b/roles/schedule_reboot/tasks/main.yml @@ -0,0 +1,79 @@ +- block: + + - name: 'Deploy /usr/local/libexec/do-reboot' + ansible.builtin.template: + backup: true + src: 'usr/local/libexec/do-reboot.j2' + dest: '/usr/local/libexec/do-reboot' + owner: 'root' + group: 'root' + mode: 0o744 + + - name: 'Deploy /usr/local/sbin/schedule-reboot' + ansible.builtin.template: + backup: true + src: 'usr/local/sbin/schedule-reboot.j2' + dest: '/usr/local/sbin/schedule-reboot' + owner: 'root' + group: 'root' + mode: 0o744 + + - name: 'Deploy /etc/systemd/system/schedule-reboot.service' + ansible.builtin.template: + backup: true + src: 'etc/systemd/system/schedule-reboot.service.j2' + dest: '/etc/systemd/system/schedule-reboot.service' + owner: 'root' + group: 'root' + mode: 0o644 + register: '__schedule_reboot__service_result' + + - name: 'Deploy /etc/systemd/system/schedule-reboot.timer' + ansible.builtin.template: + backup: true + src: 'etc/systemd/system/schedule-reboot.timer.j2' + dest: '/etc/systemd/system/schedule-reboot.timer' + owner: 'root' + group: 'root' + mode: 0o644 + register: '__schedule_reboot__timer_result' + + - name: 'Deploy /etc/tmpfiles.d/schedule-reboot.conf' + ansible.builtin.template: + backup: true + src: 'etc/tmpfiles.d/schedule-reboot.conf.j2' + dest: '/etc/tmpfiles.d/schedule-reboot.conf' + owner: 'root' + group: 'root' + mode: 0o644 + register: '__schedule_reboot__tmpfiles_result' + + - name: 'systemd-tmpfiles --create /etc/tmpfiles.d/schedule-reboot.conf' # noqa no-handler no-changed-when (no easy way to detect changes) + ansible.builtin.command: 'systemd-tmpfiles --create /etc/tmpfiles.d/schedule-reboot.conf' + when: + - '__schedule_reboot__tmpfiles_result is changed' + + tags: + - 'schedule_reboot' + + +- block: + + # run daemon-reload as a regular task (not a handler) before the state task, so the 
+ # enable/start acts on the freshly deployed unit. gated on any unit-file change. + - name: 'systemctl daemon-reload' + ansible.builtin.systemd: + daemon_reload: true + when: + - '__schedule_reboot__service_result is changed or + __schedule_reboot__timer_result is changed' + + - name: 'systemctl {{ schedule_reboot__enabled | bool | ternary("enable", "disable") }} schedule-reboot.timer --now' + ansible.builtin.systemd: + name: 'schedule-reboot.timer' + state: '{{ schedule_reboot__enabled | bool | ternary("started", "stopped") }}' + enabled: '{{ schedule_reboot__enabled }}' + + tags: + - 'schedule_reboot' + - 'schedule_reboot:state' diff --git a/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.service.j2 b/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.service.j2 new file mode 100644 index 000000000..2c9d8231e --- /dev/null +++ b/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.service.j2 @@ -0,0 +1,18 @@ +# {{ ansible_managed }} +# 2026060901 + +[Unit] +Description=schedule-reboot Service +# This is the generic reboot actor. Producers that run at the same window order +# themselves Before=schedule-reboot.service (order-only, no Wants/Requires), so the +# reboot deterministically waits for their work to finish; on a window where no +# producer runs, this runs alone and reboots only if the spool has a pending request. 
+ +[Service] +ExecStart=/usr/local/libexec/do-reboot +Type=oneshot +User=root +KillMode=process + +[Install] +WantedBy=basic.target diff --git a/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.timer.j2 b/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.timer.j2 new file mode 100644 index 000000000..752bda127 --- /dev/null +++ b/roles/schedule_reboot/templates/etc/systemd/system/schedule-reboot.timer.j2 @@ -0,0 +1,12 @@ +# {{ ansible_managed }} +# 2026060401 + +[Unit] +Description=schedule-reboot Timer + +[Timer] +OnCalendar={{ schedule_reboot__on_calendar }} +Unit=schedule-reboot.service + +[Install] +WantedBy=timers.target diff --git a/roles/schedule_reboot/templates/etc/tmpfiles.d/schedule-reboot.conf.j2 b/roles/schedule_reboot/templates/etc/tmpfiles.d/schedule-reboot.conf.j2 new file mode 100644 index 000000000..83865fd2a --- /dev/null +++ b/roles/schedule_reboot/templates/etc/tmpfiles.d/schedule-reboot.conf.j2 @@ -0,0 +1,4 @@ +# {{ ansible_managed }} +# 2026060401 + +d /run/schedule-reboot 0755 root root - diff --git a/roles/schedule_reboot/templates/usr/local/libexec/do-reboot.j2 b/roles/schedule_reboot/templates/usr/local/libexec/do-reboot.j2 new file mode 100644 index 000000000..06c78a22a --- /dev/null +++ b/roles/schedule_reboot/templates/usr/local/libexec/do-reboot.j2 @@ -0,0 +1,92 @@ +#!/usr/bin/env bash +# {{ ansible_managed }} +# 2026061001 + +# The single reboot actor. Run by schedule-reboot.service (a timer at the reboot +# window, ordered After= the update lanes) and by `schedule-reboot --now`. Reboots +# the host if, and only if, at least one reboot request file is present in the spool +# directory. Each request file's name is a category and its content is included in +# the notification mail. Multiple requests coalesce into a single reboot. 
+ +export LC_ALL=C + +SPOOL_DIR='/run/schedule-reboot' +REBOOT=$(which reboot) + +send_msg () { + local subject="$1" body="$2" + # feed a complete message to sendmail (-t reads the recipients from the headers). + # sendmail has a stable CLI across MTAs, unlike mailx's -r / -a "From:" split. + { + printf 'From: %s\n' "$SENDER" + printf 'To: %s\n' "${RECIPIENTS// /, }" + printf 'Subject: %s\n' "$subject" + printf 'Content-Type: text/plain; charset="utf-8"\n' + printf 'Content-Transfer-Encoding: 8bit\n' + printf '\n%s\n' "$body" + } | sendmail -oi -t -f "$SENDER" +{% if schedule_reboot__rocketchat_url is defined and schedule_reboot__rocketchat_url | length %} + /usr/bin/curl --silent --output /dev/null --data-urlencode \ + "text=${subject} + + ${body} + {{ schedule_reboot__rocketchat_msg_suffix }}" \ + --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ + "{{ schedule_reboot__rocketchat_url }}" +{% endif %} +} + +SUBJECT_PREFIX="{{ schedule_reboot__mail_subject_prefix }}{{ schedule_reboot__mail_subject_hostname }}" +SENDER="$SUBJECT_PREFIX <{{ schedule_reboot__mail_from }}>" +RECIPIENTS="{{ schedule_reboot__mail_recipients | join(' ') }}" + +# no reboot requested -> nothing to do (the common case on most days). +# -v tests whether the first element is set, so an empty (or absent) spool exits here. +shopt -s nullglob +requests=("$SPOOL_DIR"/*) +if [[ ! 
-v requests[0] ]]; then + exit 0 +fi + +# build the notification: one section per request, file name as header, content below +MSGBODY='' +for request in "${requests[@]}"; do + MSGBODY+="=== $(basename "$request") ==="$'\n' + MSGBODY+="$(cat "$request")"$'\n\n' +done +# the reboot reasons are the request file names, comma-separated for the subject +reasons=$(printf '%s, ' "${requests[@]##*/}") +send_msg "$SUBJECT_PREFIX - Automatic reboot due to ${reasons%, }" "$MSGBODY" + +{% if schedule_reboot__icinga2_api_user_login is defined and schedule_reboot__icinga2_api_user_login | length %} +# needed for Icinga to set downtime (max. 5 minutes downtime) +START_TIME=$(date +%s) +END_TIME=$(( START_TIME + 300 )) +curl --connect-timeout 5 --insecure --silent --user '{{ schedule_reboot__icinga2_api_user_login["username"] }}:{{ schedule_reboot__icinga2_api_user_login["password"] }}' --header 'Accept: application/json' --request POST '{{ schedule_reboot__icinga2_api_url }}/v1/actions/schedule-downtime' --data-binary @- 1> /dev/null << EOF +{ + "type": "Host", + "filter": "match(\"{{ schedule_reboot__icinga2_hostname }}\", host.name)", + "start_time": "$START_TIME", + "end_time": "$END_TIME", + "author": "{{ schedule_reboot__icinga2_hostname }}", + "comment": "Automatic reboot due to pending reboot requests.", + "all_services": true +} +EOF +{% endif %} + +# give the notification mail time to leave the local postfix queue before we go +# down. the queue is persistent so the mail is never lost, but this makes the +# heads-up arrive before the reboot rather than after the host is back. the host +# keeps serving during the wait; apache is only killed below, right before reboot. 
+sleep {{ schedule_reboot__reboot_grace_period }} + +# save time and don't wait for any apache graceful finishing +{% if ansible_facts['os_family'] == 'Debian' %} +kill -9 $(pidof apache2) 2> /dev/null +{% else %} +kill -9 $(pidof httpd) 2> /dev/null +{% endif %} + +# the spool lives on tmpfs (/run), so the reboot itself clears all request files +"$REBOOT" diff --git a/roles/schedule_reboot/templates/usr/local/sbin/schedule-reboot.j2 b/roles/schedule_reboot/templates/usr/local/sbin/schedule-reboot.j2 new file mode 100644 index 000000000..4a6be2730 --- /dev/null +++ b/roles/schedule_reboot/templates/usr/local/sbin/schedule-reboot.j2 @@ -0,0 +1,39 @@ +#!/usr/bin/env bash +# {{ ansible_managed }} +# 2026060901 + +# Request a reboot at the next maintenance window. Writes a request file into the +# spool directory that schedule-reboot.service (running at the window) acts on. The +# file name is a free-form category; the optional detail text is included in the +# notification mail. Use this after an ad-hoc change that needs a reboot. +# +# Usage: +# schedule-reboot <reason> [detail ...] request a reboot at the next window +# schedule-reboot --now <reason> [detail ...] 
request and reboot now + +export LC_ALL=C + +SPOOL_DIR='/run/schedule-reboot' + +NOW=0 +if [ "$1" = '--now' ]; then + NOW=1 + shift +fi + +REASON="$1" +if [ -z "$REASON" ]; then + echo 'usage: schedule-reboot [--now] <reason> [detail ...]' >&2 + exit 2 +fi +shift +DETAIL="$*" + +# the spool directory is owned and created by the schedule_reboot role (tmpfiles.d) +# keep the reason a single path component +REASON_FILE="${REASON//\//_}" +printf '%s\n' "${DETAIL:-reboot requested via schedule-reboot}" > "$SPOOL_DIR/$REASON_FILE" + +if [ "$NOW" -eq 1 ]; then + systemctl start schedule-reboot.service +fi diff --git a/roles/system_update/README.md b/roles/system_update/README.md index 75582d954..4bd0e0098 100644 --- a/roles/system_update/README.md +++ b/roles/system_update/README.md @@ -1,12 +1,8 @@ # Ansible Role linuxfabrik.lfops.system_update -This role configures the server to do (weekly) system updates by deploying two shell scripts: The first script `notify-and-schedule` checks for available updates (normally during the day), and notifies the system administrators either via email or [Rocket.Chat](https://rocket.chat/). On update time (usually the next morning at round about 4 AM), the second script `update-and-reboot` +This role configures the server to do system updates. A weekly regular lane applies all available updates, and on Rocky Linux a daily security lane applies only security hot-fixes from the dedicated `security` repository. Both run at one configurable maintenance window per host and reboot the host afterwards if an update requires it. A `notify-and-schedule` script informs the administrators (via email or [Rocket.Chat](https://rocket.chat/)) when and which updates are pending. -* sets a downtime for the host and all its services in Icinga -* applies all updates -* and, if necessary, automatically reboots the host after the updates. 
- -On Rocky Linux the role additionally sets up a separate security lane that installs only security hot-fixes daily, independent of the weekly update lane. +Reboots are not performed by the update scripts themselves. They are delegated to the [schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot) role: an update that needs a reboot drops a request into that role's spool, and `schedule_reboot` reboots the host once, at the window. *Available since LFOps `2.0.0`.* @@ -14,18 +10,18 @@ On Rocky Linux the role additionally sets up a separate security lane that insta ## How the Role Behaves -* **Two independent lanes.** The regular lane (`notify-and-schedule` / `update-and-reboot`) applies all available updates on its weekly schedule. On Rocky Linux a second, independent security lane (`security-update`, twice a day by default) installs only security hot-fixes from the dedicated `security` repository, isolated from the regular lane via `--disablerepo` / `--enablerepo`. Both reboot the host when an update requires it. -* **The security lane is enabled by default, but a no-op without the `security` repository.** That repository is provided by the [repo_baseos](https://github.com/Linuxfabrik/lfops/tree/main/roles/repo_baseos) role. On hosts where it is not present, the security lane installs nothing and never reboots. Turn the lane off entirely with `system_update__security_enabled: false`. -* **Reboots are steered per host group.** A security hot-fix that requires a reboot is scheduled via `at`; the time comes from `system_update__security_reboot_time__*`. This can be used so test hosts reboot immediately (`'now'`) while production hosts defer to the evening (for example `'19:00'`). 
+* **Updates run at the reboot window.** The regular lane (weekly) and the Rocky security lane (daily) both run at the maintenance window defined by `schedule_reboot__reboot_time__*` (the [schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot) role). When an update needs a reboot it drops a request into that role's spool; the update unit is ordered before the reboot actor, so the reboot waits for the update to finish before it runs. +* **The security lane is enabled by default, but a no-op without the `security` repository.** That repository is provided by the [repo_baseos](https://github.com/Linuxfabrik/lfops/tree/main/roles/repo_baseos) role. On hosts where it is not present, the security lane installs nothing and requests no reboot. Turn the lane off entirely with `system_update__security_enabled: false`. ## Dependent Roles Any [LFOps playbook](https://github.com/Linuxfabrik/lfops/blob/main/playbooks/README.md) that installs this role runs these for you. Optional ones can be disabled via the playbook's skip variables. -* at must be installed (role: [linuxfabrik.lfops.at](https://github.com/Linuxfabrik/lfops/tree/main/roles/at)). -* mailx must be installed (role: [linuxfabrik.lfops.mailx](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailx)). -* yum-utils must be installed on RHEL (role: [linuxfabrik.lfops.yum_utils](https://github.com/Linuxfabrik/lfops/tree/main/roles/yum_utils)). +* Optional: the root mail aliases are configured (role: [linuxfabrik.lfops.mailto_root](https://github.com/Linuxfabrik/lfops/tree/main/roles/mailto_root)). +* Optional: postfix provides the `sendmail` interface and mail relay used for the notifications (role: [linuxfabrik.lfops.postfix](https://github.com/Linuxfabrik/lfops/tree/main/roles/postfix)). +* The reboot mechanism must be present (role: [linuxfabrik.lfops.schedule_reboot](https://github.com/Linuxfabrik/lfops/tree/main/roles/schedule_reboot)). 
It owns the reboot window (`schedule_reboot__reboot_time__*`) and performs the reboot an update requests. +* Optional: yum-utils is installed on RHEL (role: [linuxfabrik.lfops.yum_utils](https://github.com/Linuxfabrik/lfops/tree/main/roles/yum_utils)). ## Requirements @@ -39,12 +35,12 @@ Manual steps: `system_update` -* Sets up automatic system update via systemd timer, and on Rocky Linux hosts the optional security-update timer. +* Deploys the notify-and-schedule and update-and-reboot scripts and their systemd timers/services. On Rocky Linux hosts it also deploys the security-update lane. * Triggers: none. `system_update:state` -* Determines whether notify-and-schedule.timer and security-update.timer are enabled. +* Determines whether the notify-and-schedule, update-and-reboot and (Rocky) security-update timers are enabled. * Triggers: none. @@ -56,24 +52,6 @@ Manual steps: * Type: Bool. * Default: `false` -`system_update__icinga2_api_url` - -* The URL of the Icinga2 API (usually on the Icinga2 Master). This will be used to set a downtime for the corresponding host and all its services in the `reboot` alias. -* Type: String. -* Default: `'https://{{ icinga2_agent__icinga2_master_host | d("") }}:{{ icinga2_agent__icinga2_master_port | d(5665) }}'` - -`system_update__icinga2_api_user_login` - -* The Icinga2 API User to set the downtime for the corresponding host and all its services. -* Type: Dictionary. -* Default: unset - -`system_update__icinga2_hostname` - -* The hostname of the Icinga2 host on which the downtime should be set. -* Type: String. -* Default: `'{{ ansible_facts["nodename"] }}'` - `system_update__mail_from` * The email sender account. This will be used as the "from"-address for all notifications. @@ -106,9 +84,9 @@ Manual steps: `system_update__notify_and_schedule_on_calendar` -* When the notification for the expected updates should be sent. 
Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. +* When the informational "updates pending" notification is sent. This is purely a heads-up; the update is applied later at the maintenance window. By default it is sent at 10:00 on the day before `system_update__update_day`, so it always arrives before the update runs. If you set a multi-day or date-based `system_update__update_day`, set this explicitly. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. * Type: String. -* Default: `'mon 10:00'` +* Default: 10:00 on the day before `system_update__update_day` (for example `'Mon 10:00'` when the update day is `Tue`) `system_update__post_update_code` @@ -142,15 +120,9 @@ Manual steps: `system_update__security_on_calendar` -* When the security lane checks for and installs security hot-fixes. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. +* When the security lane checks for and installs security hot-fixes. Defaults to the reboot window (`schedule_reboot__reboot_time__*`) so the reboot follows right after. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. * Type: String. -* Default: `'*-*-* 10,16:00'` - -`system_update__security_reboot_time__host_var` / `system_update__security_reboot_time__group_var` - -* When to reboot after a security hot-fix that requires it. Passed verbatim to `at`. Use this to steer test versus production hosts via inventory group membership: `'now'` reboots immediately, a time such as `'19:00'` defers the reboot. -* Type: String. -* Default: `'now'` +* Default: `'*-*-* {{ schedule_reboot__reboot_time__combined_var | d("04:00") }}'` `system_update__security_repos` @@ -158,27 +130,22 @@ Manual steps: * Type: List. 
* Default: `['security']` +`system_update__update_day` + +* The weekday on which the regular (weekly) update lane runs. Combined with the maintenance window to build the timer schedule, for example `Tue 04:00`. Defaults to `Tue` so the Monday notification (`system_update__notify_and_schedule_on_calendar`) arrives before the update. Have a look at [systemd.time(7)](https://www.freedesktop.org/software/systemd/man/systemd.time.html) for the format. +* Type: String. +* Default: `'Tue'` + `system_update__update_enabled` -* Enables or disables the system-update timer, analogous to `systemctl enable/disable --now`. +* Enables or disables the regular update lane (the notify-and-schedule and update-and-reboot timers), analogous to `systemctl enable/disable --now`. * Type: Bool. * Default: `true` -`system_update__update_time` - -* The time when to actually execute the updates (and automatically reboot if necessary), relative to `system_update__notify_and_schedule_on_calendar`. Passed verbatim to `at`. The default schedules the update for the next day between 04:00 and 04:59, with the exact minute derived deterministically from `inventory_hostname` so multiple hosts spread across the hour instead of all updating at 04:00. -* Type: String. 
-* Default: `'04:{{ 59 | random(seed=inventory_hostname) }} + 1 days'` - Example: ```yaml # optional system_update__cache_only: true -system_update__icinga2_api_url: 'https://icinga.example.com:5665' -system_update__icinga2_api_user_login: - username: 'downtime-user' - password: 'linuxfabrik' -system_update__icinga2_hostname: 'myhost.example.com' system_update__mail_from: 'noreply@example.com' system_update__mail_recipients_new_configfiles: - 'info@example.com' @@ -207,12 +174,10 @@ system_update__pre_update_code: |- system_update__rocketchat_msg_suffix: '@administrator' system_update__rocketchat_url: 'https://chat.example.com/hooks/abcd1234' system_update__security_enabled: true -system_update__security_on_calendar: '*-*-* 10,16:00' -system_update__security_reboot_time__group_var: '19:00' system_update__security_repos: - 'security' +system_update__update_day: 'Tue' system_update__update_enabled: true -system_update__update_time: '04:{{ 59 | random(seed=inventory_hostname) }} + 1 days' ``` diff --git a/roles/system_update/defaults/main.yml b/roles/system_update/defaults/main.yml index 33de96709..6b59e2da8 100644 --- a/roles/system_update/defaults/main.yml +++ b/roles/system_update/defaults/main.yml @@ -1,26 +1,22 @@ system_update__cache_only: false -system_update__icinga2_api_url: 'https://{{ icinga2_agent__icinga2_master_host | d("") }}:{{ icinga2_agent__icinga2_master_port | d(5665) }}' -system_update__icinga2_hostname: '{{ ansible_facts["nodename"] }}' system_update__mail_from: '{{ mailto_root__from }}' system_update__mail_recipients_new_configfiles: '{{ mailto_root__to }}' system_update__mail_recipients_updates: '{{ mailto_root__to }}' system_update__mail_subject_hostname: '$(hostname --short)' system_update__mail_subject_prefix: '' -system_update__notify_and_schedule_on_calendar: 'mon 10:00' +# default: the morning before the regular update day, so the heads-up arrives +# before the window. 
derived as the weekday preceding system_update__update_day; +# for a multi-day or date-based update_day, set this explicitly. +system_update__notify_and_schedule_on_calendar: '{{ + {"Mon": "Sun", "Tue": "Mon", "Wed": "Tue", "Thu": "Wed", "Fri": "Thu", "Sat": "Fri", "Sun": "Sat"}.get(system_update__update_day[:3] | capitalize, "Mon") + }} 10:00' system_update__rocketchat_msg_suffix: '' system_update__security_enabled: true -system_update__security_on_calendar: '*-*-* 10,16:00' -system_update__security_reboot_time__dependent_var: '' -system_update__security_reboot_time__group_var: '' -system_update__security_reboot_time__host_var: '' -system_update__security_reboot_time__role_var: 'now' -system_update__security_reboot_time__combined_var: '{{ - system_update__security_reboot_time__host_var if (system_update__security_reboot_time__host_var | string | length) else - system_update__security_reboot_time__group_var if (system_update__security_reboot_time__group_var | string | length) else - system_update__security_reboot_time__dependent_var if (system_update__security_reboot_time__dependent_var | string | length) else - system_update__security_reboot_time__role_var - }}' +# the reboot window is owned by the schedule_reboot role; d() fallback keeps this +# role usable if it is ever run without schedule_reboot in the play. 
+system_update__security_on_calendar: '*-*-* {{ schedule_reboot__reboot_time__combined_var | d("04:00") }}' system_update__security_repos: - 'security' +system_update__update_day: 'Tue' system_update__update_enabled: true -system_update__update_time: '04:{{ 59 | random(seed=inventory_hostname) }} + 1 days' +system_update__update_on_calendar: '{{ system_update__update_day }} {{ schedule_reboot__reboot_time__combined_var | d("04:00") }}' diff --git a/roles/system_update/tasks/main.yml b/roles/system_update/tasks/main.yml index e32612344..77d286a88 100644 --- a/roles/system_update/tasks/main.yml +++ b/roles/system_update/tasks/main.yml @@ -18,26 +18,42 @@ - 'always' +# Regular lane, deployed on all hosts: +# - notify-and-schedule: informational "updates pending" mail +# - update-and-reboot: applies all updates at the window, drops a reboot request if +# needed. The reboot itself is handled by the schedule_reboot role. - block: - - name: 'Deploy /usr/local/bin/notify-and-schedule' + - name: 'Deploy /usr/local/libexec/notify-and-schedule' ansible.builtin.template: backup: true - src: 'usr/local/bin/notify-and-schedule.j2' - dest: '/usr/local/bin/notify-and-schedule' + src: 'usr/local/libexec/notify-and-schedule.j2' + dest: '/usr/local/libexec/notify-and-schedule' owner: 'root' group: 'root' mode: 0o744 - - name: 'Deploy /usr/local/bin/update-and-reboot' + - name: 'Deploy /usr/local/sbin/update-and-reboot' ansible.builtin.template: backup: true - src: 'usr/local/bin/update-and-reboot.j2' - dest: '/usr/local/bin/update-and-reboot' + src: 'usr/local/sbin/update-and-reboot.j2' + dest: '/usr/local/sbin/update-and-reboot' owner: 'root' group: 'root' mode: 0o744 + # the scripts used to live in /usr/local/bin; remove the stale copies after relocating + # them to FHS-correct /usr/local/sbin (admin commands) and /usr/local/libexec (helpers). 
+ - name: 'rm -f the pre-FHS /usr/local/bin script locations' + ansible.builtin.file: + path: '{{ item }}' + state: 'absent' + loop: + - '/usr/local/bin/notify-and-schedule' + - '/usr/local/bin/security-reboot' + - '/usr/local/bin/security-update' + - '/usr/local/bin/update-and-reboot' + - name: 'Deploy /etc/systemd/system/notify-and-schedule.service' ansible.builtin.template: backup: true @@ -46,6 +62,7 @@ owner: 'root' group: 'root' mode: 0o644 + register: '__system_update__notify_service_result' - name: 'Deploy /etc/systemd/system/notify-and-schedule.timer' ansible.builtin.template: @@ -55,44 +72,41 @@ owner: 'root' group: 'root' mode: 0o644 + register: '__system_update__notify_timer_result' - tags: - - 'system_update' - - -- block: + - name: 'Deploy /etc/systemd/system/update-and-reboot.service' + ansible.builtin.template: + backup: true + src: 'etc/systemd/system/update-and-reboot.service.j2' + dest: '/etc/systemd/system/update-and-reboot.service' + owner: 'root' + group: 'root' + mode: 0o644 + register: '__system_update__update_service_result' - - name: 'systemctl {{ system_update__update_enabled | bool | ternary("enable", "disable") }} notify-and-schedule.timer --now' - ansible.builtin.systemd: - name: 'notify-and-schedule.timer' - state: '{{ system_update__update_enabled | bool | ternary("started", "stopped") }}' - enabled: '{{ system_update__update_enabled }}' - daemon_reload: true + - name: 'Deploy /etc/systemd/system/update-and-reboot.timer' + ansible.builtin.template: + backup: true + src: 'etc/systemd/system/update-and-reboot.timer.j2' + dest: '/etc/systemd/system/update-and-reboot.timer' + owner: 'root' + group: 'root' + mode: 0o644 + register: '__system_update__update_timer_result' tags: - 'system_update' - - 'system_update:state' -# The security lane applies Rocky Linux security hot-fixes from the dedicated -# `security` repository (provided by the repo_baseos role) twice a day, -# independent of the regular (weekly) update lane. Rocky Linux only. 
+# Security lane (Rocky Linux only): applies security hot-fixes from the dedicated +# `security` repository at the window and drops a reboot request into the shared spool. - block: - - name: 'Deploy /usr/local/bin/security-update' - ansible.builtin.template: - backup: true - src: 'usr/local/bin/security-update.j2' - dest: '/usr/local/bin/security-update' - owner: 'root' - group: 'root' - mode: 0o744 - - - name: 'Deploy /usr/local/bin/security-reboot' + - name: 'Deploy /usr/local/sbin/security-update' ansible.builtin.template: backup: true - src: 'usr/local/bin/security-reboot.j2' - dest: '/usr/local/bin/security-reboot' + src: 'usr/local/sbin/security-update.j2' + dest: '/usr/local/sbin/security-update' owner: 'root' group: 'root' mode: 0o744 @@ -105,6 +119,7 @@ owner: 'root' group: 'root' mode: 0o644 + register: '__system_update__security_service_result' - name: 'Deploy /etc/systemd/system/security-update.timer' ansible.builtin.template: @@ -114,6 +129,7 @@ owner: 'root' group: 'root' mode: 0o644 + register: '__system_update__security_timer_result' when: - 'ansible_facts["distribution"] == "Rocky"' @@ -122,6 +138,40 @@ - 'system_update' +- block: + + # run daemon-reload as a regular task (not a handler) before the state block, so the + # enable/start tasks act on the freshly deployed units. gated on any unit-file change + # (service or timer, regular or security) to stay idempotent. skipped Rocky tasks + # register a non-changed result, so referencing them on non-Rocky hosts is safe. 
+ - name: 'systemctl daemon-reload' + ansible.builtin.systemd: + daemon_reload: true + when: + - '__system_update__notify_service_result is changed or + __system_update__notify_timer_result is changed or + __system_update__update_service_result is changed or + __system_update__update_timer_result is changed or + __system_update__security_service_result is changed or + __system_update__security_timer_result is changed' + + - name: 'systemctl {{ system_update__update_enabled | bool | ternary("enable", "disable") }} notify-and-schedule.timer --now' + ansible.builtin.systemd: + name: 'notify-and-schedule.timer' + state: '{{ system_update__update_enabled | bool | ternary("started", "stopped") }}' + enabled: '{{ system_update__update_enabled }}' + + - name: 'systemctl {{ system_update__update_enabled | bool | ternary("enable", "disable") }} update-and-reboot.timer --now' + ansible.builtin.systemd: + name: 'update-and-reboot.timer' + state: '{{ system_update__update_enabled | bool | ternary("started", "stopped") }}' + enabled: '{{ system_update__update_enabled }}' + + tags: + - 'system_update' + - 'system_update:state' + + - block: - name: 'systemctl {{ system_update__security_enabled | bool | ternary("enable", "disable") }} security-update.timer --now' @@ -129,7 +179,6 @@ name: 'security-update.timer' state: '{{ system_update__security_enabled | bool | ternary("started", "stopped") }}' enabled: '{{ system_update__security_enabled }}' - daemon_reload: true when: - 'ansible_facts["distribution"] == "Rocky"' diff --git a/roles/system_update/templates/etc/systemd/system/notify-and-schedule.service.j2 b/roles/system_update/templates/etc/systemd/system/notify-and-schedule.service.j2 index 688de911d..b1fc9ed9e 100644 --- a/roles/system_update/templates/etc/systemd/system/notify-and-schedule.service.j2 +++ b/roles/system_update/templates/etc/systemd/system/notify-and-schedule.service.j2 @@ -1,11 +1,11 @@ # {{ ansible_managed }} -# 2021081601 +# 2026060901 [Unit] 
Description=notify-and-schedule Service [Service] -ExecStart=/usr/local/bin/notify-and-schedule +ExecStart=/usr/local/libexec/notify-and-schedule Type=oneshot User=root KillMode=process diff --git a/roles/system_update/templates/etc/systemd/system/security-update.service.j2 b/roles/system_update/templates/etc/systemd/system/security-update.service.j2 index 65d0199e9..f4c634b5e 100644 --- a/roles/system_update/templates/etc/systemd/system/security-update.service.j2 +++ b/roles/system_update/templates/etc/systemd/system/security-update.service.j2 @@ -1,11 +1,23 @@ # {{ ansible_managed }} -# 2026052401 +# 2026061001 [Unit] Description=security-update Service +# order-only edges (no Wants/Requires); the Rocky-only security lane is the outlier, +# so it owns the ordering relative to the regular lane. +# After=update-and-reboot.service: on a day where both fire, let the weekly lane +# apply the frozen mirror snapshot first, then top up only the delta from the live +# security repo here (if the snapshot already covers it, this lane's check-update +# finds nothing and exits). It also serializes the two dnf runs so they never hit +# the rpmdb lock at once. +# Before=schedule-reboot.service: finish before the reboot actor so a requested +# reboot waits for this transaction. +# After=/Before= an unqueued unit is a no-op, so non-shared days are unaffected. 
+After=update-and-reboot.service +Before=schedule-reboot.service [Service] -ExecStart=/usr/local/bin/security-update +ExecStart=/usr/local/sbin/security-update Type=oneshot User=root KillMode=process diff --git a/roles/system_update/templates/etc/systemd/system/update-and-reboot.service.j2 b/roles/system_update/templates/etc/systemd/system/update-and-reboot.service.j2 new file mode 100644 index 000000000..d7ae828e7 --- /dev/null +++ b/roles/system_update/templates/etc/systemd/system/update-and-reboot.service.j2 @@ -0,0 +1,19 @@ +# {{ ansible_managed }} +# 2026061001 + +[Unit] +Description=update-and-reboot Service +# order-only edge (no Wants/Requires): run before the reboot actor so a requested +# reboot waits for this update to finish. Before= an unqueued unit is a no-op. The +# security lane orders itself After= this service (see security-update.service for +# why the weekly lane runs first). +Before=schedule-reboot.service + +[Service] +ExecStart=/usr/local/sbin/update-and-reboot +Type=oneshot +User=root +KillMode=process + +[Install] +WantedBy=basic.target diff --git a/roles/system_update/templates/etc/systemd/system/update-and-reboot.timer.j2 b/roles/system_update/templates/etc/systemd/system/update-and-reboot.timer.j2 new file mode 100644 index 000000000..dec84119f --- /dev/null +++ b/roles/system_update/templates/etc/systemd/system/update-and-reboot.timer.j2 @@ -0,0 +1,12 @@ +# {{ ansible_managed }} +# 2026060401 + +[Unit] +Description=update-and-reboot Timer + +[Timer] +OnCalendar={{ system_update__update_on_calendar }} +Unit=update-and-reboot.service + +[Install] +WantedBy=timers.target diff --git a/roles/system_update/templates/usr/local/bin/notify-and-schedule.j2 b/roles/system_update/templates/usr/local/bin/notify-and-schedule.j2 deleted file mode 100644 index b1168143d..000000000 --- a/roles/system_update/templates/usr/local/bin/notify-and-schedule.j2 +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/env bash -# {{ ansible_managed }} -# 2023112801 - -export 
LC_ALL=C - -send_msg () { -{% if ansible_facts['distribution'] == 'Debian' %} -{# note: do not quote the $RECIPIENTS, else they are passed as a single argument #} - echo -n "$MSGBODY" | mail -s "$SUBJECT" -a "From: $SENDER" $RECIPIENTS -{% else %} -{# note: do not quote the $RECIPIENTS, else they are passed as a single argument #} - echo -n "$MSGBODY" | mail -s "$SUBJECT" -r "$SENDER" $RECIPIENTS -{% endif %} -{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} - /usr/bin/curl --silent --output /dev/null --data-urlencode \ - "text=${SUBJECT} - - ${MSGBODY} - {{ system_update__rocketchat_msg_suffix }}" \ - --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ - "{{ system_update__rocketchat_url }}" -{% endif %} -} - -SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" -SUBJECT="$SUBJECT_PREFIX - Update at {{ system_update__update_time }}" -SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" -RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ') }}" - -cat > /tmp/check-update << 'EOF' -Please reply to this mail if you have any concerns. -A reboot might be necessary and will be done -automatically after updating your components. ---------------------------------------------------- - -EOF - -{% if ansible_facts['os_family'] == 'RedHat' %} -{% if not system_update__cache_only | bool %} -yum clean all 1> /dev/null 2>&1 -{% endif %} -yum check-update {{ system_update__cache_only | bool | ternary("--cacheonly", "") }} >> /tmp/check-update 2> /tmp/system-update-check-stderr -retc=$? 
- -if [ "$retc" -eq 100 ]; then - # updates are available, so inform and schedule - MSGBODY=$(cat /tmp/check-update) - send_msg - echo "/usr/local/bin/update-and-reboot" | at {{ system_update__update_time }} 2> /dev/null -elif [ "$retc" -eq 1 ]; then - SUBJECT="$SUBJECT_PREFIX - System update check failed" - MSGBODY=$( /dev/null 2>&1 -apt update 1> /dev/null 2>&1 -{% endif %} -apt list --upgradable >> /tmp/check-update - -if [ $(grep upgradable /tmp/check-update | wc -l) -gt 0 ]; then - # updates are available, so inform and schedule - MSGBODY=$(cat /tmp/check-update) - send_msg - echo "/usr/local/bin/update-and-reboot" | at {{ system_update__update_time }} 2> /dev/null -fi -{% endif %} diff --git a/roles/system_update/templates/usr/local/bin/security-reboot.j2 b/roles/system_update/templates/usr/local/bin/security-reboot.j2 deleted file mode 100644 index e5124133a..000000000 --- a/roles/system_update/templates/usr/local/bin/security-reboot.j2 +++ /dev/null @@ -1,29 +0,0 @@ -#!/usr/bin/env bash -# {{ ansible_managed }} -# 2026052401 - -# Set an Icinga downtime and reboot. Invoked via `at` by /usr/local/bin/security-update -# so that the downtime is set at reboot time, not when the update was applied. - -export LC_ALL=C - -REBOOT=$(which reboot) - -{% if system_update__icinga2_api_user_login is defined and system_update__icinga2_api_user_login | length %} -# needed for Icinga to set downtime (max. 
5 minutes downtime) -START_TIME=$(date +%s) -END_TIME=$(( START_TIME + 300 )) -curl --connect-timeout 5 --insecure --silent --user '{{ system_update__icinga2_api_user_login["username"] }}:{{ system_update__icinga2_api_user_login["password"] }}' --header 'Accept: application/json' --request POST '{{ system_update__icinga2_api_url }}/v1/actions/schedule-downtime' --data-binary @- 1> /dev/null << EOF -{ - "type": "Host", - "filter": "match(\"{{ system_update__icinga2_hostname }}\", host.name)", - "start_time": "$START_TIME", - "end_time": "$END_TIME", - "author": "{{ system_update__icinga2_hostname }}", - "comment": "Automatic reboot due to security updates.", - "all_services": true -} -EOF -{% endif %} - -"$REBOOT" diff --git a/roles/system_update/templates/usr/local/bin/security-update.j2 b/roles/system_update/templates/usr/local/bin/security-update.j2 deleted file mode 100644 index 845945e3f..000000000 --- a/roles/system_update/templates/usr/local/bin/security-update.j2 +++ /dev/null @@ -1,79 +0,0 @@ -#!/usr/bin/env bash -# {{ ansible_managed }} -# 2026052401 - -# Apply Rocky Linux security hot-fixes from the dedicated `security` repository -# only, then reboot if needed. Runs on a timer (default twice daily). The reboot -# is scheduled via `at`; the time comes from the host's group/inventory -# (e.g. 'now' on test hosts, '19:00' on production hosts). 
- -export LC_ALL=C - -# dedicated at queue, so the 10:00 and 16:00 runs don't stack multiple reboots -AT_QUEUE='s' -REPOS="{{ system_update__security_repos | join(',') }}" - -send_msg () { -{# note: do not quote the $RECIPIENTS, else they are passed as a single argument #} - echo -n "$MSGBODY" | mail -s "$SUBJECT" -r "$SENDER" $RECIPIENTS -{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} - /usr/bin/curl --silent --output /dev/null --data-urlencode \ - "text=${SUBJECT} - - ${MSGBODY} - {{ system_update__rocketchat_msg_suffix }}" \ - --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ - "{{ system_update__rocketchat_url }}" -{% endif %} -} - -SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" -SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" -RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ') }}" - -# the security lane only acts when at least one of the configured security -# repositories is enabled on this host (provided by the repo_baseos role). skip -# silently otherwise, so the timer can default to enabled without acting on hosts -# where the repo is absent or was opted out via repo_baseos__security_repo_enabled. -have_repo=0 -for repo in {{ system_update__security_repos | join(' ') }}; do - if dnf repolist --enabled 2> /dev/null | awk 'NR > 1 { print $1 }' | grep -qxF "$repo"; then - have_repo=1 - fi -done -if [ "$have_repo" -eq 0 ]; then - exit 0 -fi - -# install security hot-fixes from the security repo only. --disablerepo / --enablerepo -# keeps this lane separate from the regular (weekly) update lane and from the frozen -# mirror snapshot. -yum --disablerepo="*" --enablerepo="$REPOS" -y upgrade 1> /tmp/security-update 2> /tmp/security-update-stderr -if [ $? -ne 0 ]; then - SUBJECT="$SUBJECT_PREFIX - Security update failed" - MSGBODY=$( /dev/null -reboothint_rc=$? 
-needs-restarting 2> /dev/null > /tmp/security-needs-restarting - -if [ "$reboothint_rc" -eq 1 ] || [ -s /tmp/security-needs-restarting ]; then - # drop a reboot scheduled by an earlier run today before scheduling a new one - for job in $(atq -q "$AT_QUEUE" | awk '{print $1}'); do - atrm "$job" - done - echo "/usr/local/bin/security-reboot" | at -q "$AT_QUEUE" {{ system_update__security_reboot_time__combined_var }} 2> /dev/null - SUBJECT="$SUBJECT_PREFIX - Security update installed, reboot scheduled at {{ system_update__security_reboot_time__combined_var }}" - MSGBODY=$(yum history info) - send_msg - exit 0 -fi - -SUBJECT="$SUBJECT_PREFIX - Security update installed without reboot" -MSGBODY=$(cat /tmp/security-update) -send_msg diff --git a/roles/system_update/templates/usr/local/bin/update-and-reboot.j2 b/roles/system_update/templates/usr/local/bin/update-and-reboot.j2 deleted file mode 100644 index 9d5b2da95..000000000 --- a/roles/system_update/templates/usr/local/bin/update-and-reboot.j2 +++ /dev/null @@ -1,169 +0,0 @@ -#!/usr/bin/env bash -# {{ ansible_managed }} -# 2023121201 - -export LC_ALL=C - -PIDOF=$(which pidof) -REBOOT=$(which reboot) - -schedule_downtime_and_reboot () { - {% if system_update__icinga2_api_user_login is defined and system_update__icinga2_api_user_login | length %} - # needed for Icinga to set downtime (max. 
5 minutes downtime) - START_TIME=$(date +%s) - END_TIME=$(( START_TIME + 300 )) - curl --connect-timeout 5 --insecure --silent --user '{{ system_update__icinga2_api_user_login["username"] }}:{{ system_update__icinga2_api_user_login["password"] }}' --header 'Accept: application/json' --request POST '{{ system_update__icinga2_api_url }}/v1/actions/schedule-downtime' --data-binary @- 1> /dev/null << EOF - { - "type": "Host", - "filter": "match(\"{{ system_update__icinga2_hostname }}\", host.name)", - "start_time": "$START_TIME", - "end_time": "$END_TIME", - "author": "{{ system_update__icinga2_hostname }}", - "comment": "Automatic reboot due to updates.", - "all_services": true - } -EOF - {% endif %} - - echo "$REBOOT" | at now +1 minutes 2> /dev/null - exit 0 -} - -send_msg () { -{% if ansible_facts['distribution'] == 'Debian' %} -{# note: do not quote the $RECIPIENTS, else they are passed as a single argument #} - echo -n "$MSGBODY" | mail -s "$SUBJECT" -a "From: $SENDER" $RECIPIENTS -{% else %} -{# note: do not quote the $RECIPIENTS, else they are passed as a single argument #} - echo -n "$MSGBODY" | mail -s "$SUBJECT" -r "$SENDER" $RECIPIENTS -{% endif %} -{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} - /usr/bin/curl --silent --output /dev/null --data-urlencode \ - "text=${SUBJECT} - - ${MSGBODY} - {{ system_update__rocketchat_msg_suffix }}" \ - --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ - "{{ system_update__rocketchat_url }}" -{% endif %} -} - -{% if system_update__post_update_code is defined and system_update__post_update_code | length %} -post_update_code () { -{{ system_update__post_update_code | indent(4, first=True) }} -} -{% endif %} - -SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" -SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" -RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ' ) }}" - -{% 
if system_update__pre_update_code is defined and system_update__pre_update_code | length %} -# start raw system_update__pre_update_code -{{ system_update__pre_update_code }} -# end raw system_update__pre_update_code -{% endif %} - -if systemctl is-active --quiet aide-check.timer && systemctl is-failed --quiet aide-check.service; then - cp /var/log/aide/aide.log /var/log/aide/aide.log-pre-system-update - SUBJECT="$SUBJECT_PREFIX - aide-check.service state was failed before System Update" - MSGBODY="Please check the logfile at /var/log/aide/aide.log-pre-system-update (saved before the update ran)." - send_msg -fi - -# do the update, and print only critical errors about which we must be told -{% if ansible_facts['os_family'] == 'RedHat' %} -yum -y update {{ system_update__cache_only | bool | ternary("--cacheonly --setopt=metadata_timer_sync=0", "") }} 1> /dev/null 2> /tmp/system-update-stderr -if [ $? -ne 0 ]; then - SUBJECT="$SUBJECT_PREFIX - System update failed" - MSGBODY=$( /dev/null - \mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz - systemctl restart aide-check.service -fi - -{% if ansible_facts['os_family'] == 'Debian' %} -apt-get update {{ system_update__cache_only | bool | ternary("--no-download", "") }} 1> /dev/null -export DEBIAN_FRONTEND=noninteractive -yes '' | apt-get -y -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confold" --with-new-pkgs upgrade > /tmp/update_output -{% endif %} - -{% if ansible_facts['os_family'] == 'RedHat' %} -# after any update, notify root user about new rpmsave or rpmnew files -# ignore messages like 'find: ‘/proc/26144’: No such file or directory' -find / -mount -name '*.rpmnew' -exec /root/send-mail "{{ system_update__mail_subject_prefix }}$(hostname --short) - rpmnew File found" "{}" "{{ system_update__mail_recipients_new_configfiles | join(' ' ) }}" \; 2> /dev/null -find / -mount -name '*.rpmsave' -exec /root/send-mail "{{ system_update__mail_subject_prefix }}$(hostname --short) - rpmsave File 
found" "{}" "{{ system_update__mail_recipients_new_configfiles | join(' ' ) }}" \; 2> /dev/null -{% endif %} - -{% if ansible_facts['os_family'] == 'Debian' %} -# after any update, notify root user about new rpmsave or rpmnew files -# ignore messages like 'find: ‘/proc/26144’: No such file or directory' -find / -mount -name '*.dpkg-dist' -exec /root/send-mail "{{ system_update__mail_subject_prefix }}$(hostname --short) - dpkg-dist File found" "{}" "{{ system_update__mail_recipients_new_configfiles | join(' ' ) }}" \; 2> /dev/null -find / -mount -name '*.ucf-dist' -exec /root/send-mail "{{ system_update__mail_subject_prefix }}$(hostname --short) - ucf-dist File found" "{}" "{{ system_update__mail_recipients_new_configfiles | join(' ' )}}" \; 2> /dev/null -{% endif %} - -# any restarts needed? -{% if ansible_facts['os_family'] == 'RedHat' %} -needs-restarting --reboothint &> /dev/null -if [ $? -eq 1 ]; then - # save time and don't wait for any apache graceful finishing - kill -9 "$($PIDOF httpd)" 2> /dev/null - SUBJECT="$SUBJECT_PREFIX - Reboot due to Kernel Updates" - MSGBODY=$(yum history info) -{% if system_update__post_update_code is defined and system_update__post_update_code | length %} - post_update_code -{% endif %} - send_msg - schedule_downtime_and_reboot -fi - -needs-restarting 2> /dev/null > /tmp/needs-restarting -if [ -s /tmp/needs-restarting ]; then - # save time and don't wait for any graceful finishing apache - kill -9 "$($PIDOF httpd)" 2> /dev/null - SUBJECT="$SUBJECT_PREFIX - Reboot due to Service Updates" - MSGBODY=$(cat /tmp/needs-restarting; yum history info) -{% if system_update__post_update_code is defined and system_update__post_update_code | length %} - post_update_code -{% endif %} - send_msg - schedule_downtime_and_reboot -fi -{% endif %} - -{% if ansible_facts['os_family'] == 'Debian' %} -# we call needrestart with the -p option so it behaves like a nagios plugin. 
any exit code != 0 means a reboot is required -if [ -f /var/run/reboot-required ] || ! needrestart -p 1> /dev/null; then - # save time and don't wait for any apache graceful finishing - kill -9 "$($PIDOF apache2)" 2> /dev/null - SUBJECT="$SUBJECT_PREFIX - Reboot due to Updates" - MSGBODY=$(cat /tmp/update_output) -{% if system_update__post_update_code is defined and system_update__post_update_code | length %} - post_update_code -{% endif %} - send_msg - schedule_downtime_and_reboot -fi -{% endif %} - -{% if system_update__post_update_code is defined and system_update__post_update_code | length %} -post_update_code -{% endif %} - -{% if ansible_facts['os_family'] == 'RedHat' %} -SUBJECT="$SUBJECT_PREFIX - System updated without Reboot" -MSGBODY=$(yum history info) -send_msg -{% endif %} - -{% if ansible_facts['os_family'] == 'Debian' %} -SUBJECT="$SUBJECT_PREFIX - System updated without Reboot" -MSGBODY=$(cat /tmp/update_output) -send_msg -{% endif %} diff --git a/roles/system_update/templates/usr/local/libexec/notify-and-schedule.j2 b/roles/system_update/templates/usr/local/libexec/notify-and-schedule.j2 new file mode 100644 index 000000000..6b35bd189 --- /dev/null +++ b/roles/system_update/templates/usr/local/libexec/notify-and-schedule.j2 @@ -0,0 +1,98 @@ +#!/usr/bin/env bash +# {{ ansible_managed }} +# 2026061001 + +# Check for available updates and notify the administrators. This is informational +# only: the update is applied later by update-and-reboot.service at the maintenance +# window, and a reboot, if needed, follows via schedule-reboot.service. + +export LC_ALL=C + +# log progress to stdout so systemd captures it in the journal (journalctl -u +# notify-and-schedule.service); journald already prefixes a timestamp and the unit name. +log () { + printf '%s\n' "$*" +} + +send_msg () { + local subject="$1" body="$2" + # feed a complete message to sendmail (-t reads the recipients from the headers). 
+ # sendmail has a stable CLI across MTAs, unlike mailx's -r / -a "From:" split. + { + printf 'From: %s\n' "$SENDER" + printf 'To: %s\n' "${RECIPIENTS// /, }" + printf 'Subject: %s\n' "$subject" + printf 'Content-Type: text/plain; charset="utf-8"\n' + printf 'Content-Transfer-Encoding: 8bit\n' + printf '\n%s\n' "$body" + } | sendmail -oi -t -f "$SENDER" +{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} + /usr/bin/curl --silent --output /dev/null --data-urlencode \ + "text=${subject} + + ${body} + {{ system_update__rocketchat_msg_suffix }}" \ + --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ + "{{ system_update__rocketchat_url }}" +{% endif %} +} + +SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" +SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" +RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ') }}" + +# the next scheduled run of the regular update lane, so the admin sees when the +# pending updates will actually be applied. update-and-reboot.timer is deployed and +# enabled by this role; systemctl prints the next elapse as a formatted local time +# (or 'n/a' if the timer is disabled, in which case we keep the generic wording). +NEXT_RUN=$(systemctl show update-and-reboot.timer --property=NextElapseUSecRealtime --value 2> /dev/null) +if [ -z "$NEXT_RUN" ] || [ "$NEXT_RUN" = "n/a" ]; then + NEXT_RUN="the next maintenance window" +fi + +cat > /tmp/check-update << EOF +Please reply to this mail if you have any concerns. +These updates will be applied automatically at $NEXT_RUN. +A reboot, if necessary, follows. 
+--------------------------------------------------- + +EOF + +log "Checking for available updates" + +{% if ansible_facts['os_family'] == 'RedHat' %} +{% if not system_update__cache_only | bool %} +yum clean all 1> /dev/null 2>&1 +{% endif %} +yum check-update {{ system_update__cache_only | bool | ternary("--cacheonly", "") }} >> /tmp/check-update 2> /tmp/system-update-check-stderr +retc=$? + +if [ "$retc" -eq 100 ]; then + # updates are available, so inform + log "Updates pending; notification sent (will be applied at $NEXT_RUN)" + send_msg "$SUBJECT_PREFIX - Update at $NEXT_RUN" "$(cat /tmp/check-update)" +elif [ "$retc" -eq 1 ]; then + log "Update check failed; see /tmp/system-update-check-stderr" + send_msg "$SUBJECT_PREFIX - System update check failed" "$( /dev/null 2>&1 +apt update 1> /dev/null 2>&1 +{% endif %} +apt list --upgradable >> /tmp/check-update + +if [ $(grep upgradable /tmp/check-update | wc -l) -gt 0 ]; then + # updates are available, so inform + log "Updates pending; notification sent" + send_msg "$SUBJECT_PREFIX - Update at $NEXT_RUN" "$(cat /tmp/check-update)" +else + log "No updates pending" +fi +{% endif %} diff --git a/roles/system_update/templates/usr/local/sbin/security-update.j2 b/roles/system_update/templates/usr/local/sbin/security-update.j2 new file mode 100644 index 000000000..11471d427 --- /dev/null +++ b/roles/system_update/templates/usr/local/sbin/security-update.j2 @@ -0,0 +1,104 @@ +#!/usr/bin/env bash +# {{ ansible_managed }} +# 2026061001 + +# Apply Rocky Linux security hot-fixes from the dedicated `security` repository +# only, then request a reboot if one is needed by dropping a file into the +# /run/schedule-reboot spool. This script never reboots itself; the +# schedule-reboot.service (ordered After= this one) performs the reboot at the window. 
+ +export LC_ALL=C + +REPOS="{{ system_update__security_repos | join(',') }}" + +# log progress to stdout so systemd captures it in the journal (journalctl -u +# security-update.service); journald already prefixes a timestamp and the unit name. +log () { + printf '%s\n' "$*" +} + +send_msg () { + local subject="$1" body="$2" + # feed a complete message to sendmail (-t reads the recipients from the headers). + # sendmail has a stable CLI across MTAs, unlike mailx's -r / -a "From:" split. + { + printf 'From: %s\n' "$SENDER" + printf 'To: %s\n' "${RECIPIENTS// /, }" + printf 'Subject: %s\n' "$subject" + printf 'Content-Type: text/plain; charset="utf-8"\n' + printf 'Content-Transfer-Encoding: 8bit\n' + printf '\n%s\n' "$body" + } | sendmail -oi -t -f "$SENDER" +{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} + /usr/bin/curl --silent --output /dev/null --data-urlencode \ + "text=${subject} + + ${body} + {{ system_update__rocketchat_msg_suffix }}" \ + --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ + "{{ system_update__rocketchat_url }}" +{% endif %} +} + +SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" +SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" +RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ') }}" + +# the reboot-request spool is owned and created by the schedule_reboot role (tmpfiles.d) + +log "Starting security update run" + +# the security lane only acts when at least one of the configured security +# repositories is enabled on this host (provided by the repo_baseos role). skip +# silently otherwise, so the timer can default to enabled without acting on hosts +# where the repo is absent or was opted out via repo_baseos__security_repo_enabled. 
+have_repo=0 +for repo in {{ system_update__security_repos | join(' ') }}; do + if dnf repolist --enabled 2> /dev/null | awk 'NR > 1 { print $1 }' | grep -qxF "$repo"; then + have_repo=1 + fi +done +if [ "$have_repo" -eq 0 ]; then + log "No configured security repository enabled (looked for: $REPOS); nothing to do" + exit 0 +fi + +# only act when the security repo actually has updates pending. check-update exits +# 100 if updates are available, 0 if there is nothing to do, 1 on error. exit quietly +# on 0 so the daily timer does not send a "Nothing to do" mail. +yum --disablerepo="*" --enablerepo="$REPOS" check-update 1> /dev/null 2> /tmp/security-check-stderr +retc=$? +if [ "$retc" -eq 0 ]; then + log "No security updates pending; nothing to do" + exit 0 +elif [ "$retc" -ne 100 ]; then + log "Security update check failed; see /tmp/security-check-stderr" + send_msg "$SUBJECT_PREFIX - Security update check failed" "$( /tmp/security-update 2> /tmp/security-update-stderr; then + log "Security update failed; see /tmp/security-update-stderr" + send_msg "$SUBJECT_PREFIX - Security update failed" "$( /dev/null +reboothint_rc=$? 
+needs-restarting 2> /dev/null > /tmp/security-needs-restarting + +if [ "$reboothint_rc" -eq 1 ] || [ -s /tmp/security-needs-restarting ]; then + log "Reboot required after security update; requesting reboot via the spool" + # request a reboot; do-reboot sends the single notification at reboot time + yum history info > /run/schedule-reboot/security_update + exit 0 +fi + +log "Security update installed, no reboot required" +send_msg "$SUBJECT_PREFIX - Security update installed without reboot" "$(cat /tmp/security-update)" diff --git a/roles/system_update/templates/usr/local/sbin/update-and-reboot.j2 b/roles/system_update/templates/usr/local/sbin/update-and-reboot.j2 new file mode 100644 index 000000000..065deea60 --- /dev/null +++ b/roles/system_update/templates/usr/local/sbin/update-and-reboot.j2 @@ -0,0 +1,163 @@ +#!/usr/bin/env bash +# {{ ansible_managed }} +# 2026061001 + +# Apply all available updates, then request a reboot if one is needed by dropping a +# file into the /run/schedule-reboot spool. This script never reboots itself; the +# schedule-reboot.service (ordered After= this one) performs the reboot at the window. + +export LC_ALL=C + +# log progress to stdout so systemd captures it in the journal (journalctl -u +# update-and-reboot.service); journald already prefixes a timestamp and the unit name. +log () { + printf '%s\n' "$*" +} + +send_msg () { + local subject="$1" body="$2" + # feed a complete message to sendmail (-t reads the recipients from the headers). + # sendmail has a stable CLI across MTAs, unlike mailx's -r / -a "From:" split. 
+ { + printf 'From: %s\n' "$SENDER" + printf 'To: %s\n' "${RECIPIENTS// /, }" + printf 'Subject: %s\n' "$subject" + printf 'Content-Type: text/plain; charset="utf-8"\n' + printf 'Content-Transfer-Encoding: 8bit\n' + printf '\n%s\n' "$body" + } | sendmail -oi -t -f "$SENDER" +{% if system_update__rocketchat_url is defined and system_update__rocketchat_url | length %} + /usr/bin/curl --silent --output /dev/null --data-urlencode \ + "text=${subject} + + ${body} + {{ system_update__rocketchat_msg_suffix }}" \ + --data-urlencode "parse_mode=HTML" --data-urlencode "disable_web_page_preview=true" \ + "{{ system_update__rocketchat_url }}" +{% endif %} +} + +{% if system_update__post_update_code is defined and system_update__post_update_code | length %} +post_update_code () { +{{ system_update__post_update_code | indent(4, first=True) }} +} +{% endif %} + +SUBJECT_PREFIX="{{ system_update__mail_subject_prefix }}{{ system_update__mail_subject_hostname }}" +SENDER="$SUBJECT_PREFIX <{{ system_update__mail_from }}>" +RECIPIENTS="{{ system_update__mail_recipients_updates | join(' ' ) }}" +RECIPIENTS_NEW_CONFIGFILES="{{ system_update__mail_recipients_new_configfiles | join(' ' ) }}" + +# the reboot-request spool is owned and created by the schedule_reboot role (tmpfiles.d) + +log "Starting system update" + +{% if system_update__pre_update_code is defined and system_update__pre_update_code | length %} +# start raw system_update__pre_update_code +{{ system_update__pre_update_code }} +# end raw system_update__pre_update_code +{% endif %} + +if systemctl is-active --quiet aide-check.timer && systemctl is-failed --quiet aide-check.service; then + cp /var/log/aide/aide.log /var/log/aide/aide.log-pre-system-update + send_msg "$SUBJECT_PREFIX - aide-check.service state was failed before System Update" "Please check the logfile at /var/log/aide/aide.log-pre-system-update (saved before the update ran)." 
+fi + +# do the update, and print only critical errors about which we must be told +log "Applying all available updates" +{% if ansible_facts['os_family'] == 'RedHat' %} +if ! yum -y update {{ system_update__cache_only | bool | ternary("--cacheonly --setopt=metadata_timer_sync=0", "") }} 1> /dev/null 2> /tmp/system-update-stderr; then + log "System update failed; see /tmp/system-update-stderr" + send_msg "$SUBJECT_PREFIX - System update failed" "$( /dev/null + \mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz + systemctl restart aide-check.service +fi + +{% if ansible_facts['os_family'] == 'Debian' %} +apt-get update {{ system_update__cache_only | bool | ternary("--no-download", "") }} 1> /dev/null +export DEBIAN_FRONTEND=noninteractive +yes '' | apt-get -y -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confold" --with-new-pkgs upgrade > /tmp/update_output +{% endif %} + +# after any update, notify about new config-file stubs (*.rpmnew / *.rpmsave on RHEL, +# *.dpkg-dist / *.ucf-dist on Debian). one mail per file, sent via sendmail. +# 2> /dev/null hides messages like 'find: '/proc/26144': No such file or directory'. +find / -mount \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.dpkg-dist' -o -name '*.ucf-dist' \) -print0 2> /dev/null \ + | while IFS= read -r -d '' configfile; do + # the body keeps the full path; only the subject is elided when very long (e.g. + # files deep inside container storage), so the Subject stays readable. the bash + # length operator below is wrapped in a Jinja raw block so its brace-hash is not + # taken as a Jinja comment when Ansible renders this template. 
+ subject_path="$configfile" + if [ "{% raw %}${#subject_path}{% endraw %}" -gt 100 ]; then + subject_path="${configfile:0:35}...${configfile: -60}" + fi + { + printf 'From: %s\n' "$SENDER" + printf 'To: %s\n' "${RECIPIENTS_NEW_CONFIGFILES// /, }" + printf 'Subject: %s - config-file stub found: %s\n' "$SUBJECT_PREFIX" "$subject_path" + printf 'Content-Type: text/plain; charset="utf-8"\n' + printf 'Content-Transfer-Encoding: 8bit\n' + printf '\n%s\n' "$configfile" + } | sendmail -oi -t -f "$SENDER" +done + +# any restarts needed? if so, request a reboot via the spool (do not reboot here) +{% if ansible_facts['os_family'] == 'RedHat' %} +needs-restarting --reboothint &> /dev/null +if [ $? -eq 1 ]; then + log "Reboot required (kernel/core component update); requesting reboot via the spool" + MSGBODY=$(yum history info) +{% if system_update__post_update_code is defined and system_update__post_update_code | length %} + post_update_code +{% endif %} + # request a reboot; do-reboot sends the single notification at reboot time + printf '%s\n' "$MSGBODY" > /run/schedule-reboot/system_update + exit 0 +fi + +needs-restarting 2> /dev/null > /tmp/needs-restarting +if [ -s /tmp/needs-restarting ]; then + log "Reboot required (running services still use pre-update libraries); requesting reboot via the spool" + MSGBODY=$(cat /tmp/needs-restarting; yum history info) +{% if system_update__post_update_code is defined and system_update__post_update_code | length %} + post_update_code +{% endif %} + # request a reboot; do-reboot sends the single notification at reboot time + printf '%s\n' "$MSGBODY" > /run/schedule-reboot/system_update + exit 0 +fi +{% endif %} + +{% if ansible_facts['os_family'] == 'Debian' %} +# we call needrestart with the -p option so it behaves like a nagios plugin. any exit code != 0 means a reboot is required +if [ -f /var/run/reboot-required ] || ! 
needrestart -p 1> /dev/null; then + log "Reboot required; requesting reboot via the spool" + MSGBODY=$(cat /tmp/update_output) +{% if system_update__post_update_code is defined and system_update__post_update_code | length %} + post_update_code +{% endif %} + # request a reboot; do-reboot sends the single notification at reboot time + printf '%s\n' "$MSGBODY" > /run/schedule-reboot/system_update + exit 0 +fi +{% endif %} + +{% if system_update__post_update_code is defined and system_update__post_update_code | length %} +post_update_code +{% endif %} + +{% if ansible_facts['os_family'] == 'RedHat' %} +log "System updated, no reboot required" +send_msg "$SUBJECT_PREFIX - System updated without Reboot" "$(yum history info)" +{% endif %} + +{% if ansible_facts['os_family'] == 'Debian' %} +send_msg "$SUBJECT_PREFIX - System updated without Reboot" "$(cat /tmp/update_output)" +{% endif %}
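
Reviewer sketch (not part of the commit): the scripts in this diff never reboot themselves. Each lane writes its reason into one file under the `/run/schedule-reboot` spool, and a separate actor later performs a single reboot for everything pending. The following is a minimal illustration of that request pattern only. `SPOOL_DIR`, `request_reboot` and `pending_requests` are hypothetical names, the spool is redirected to a temporary directory so the sketch can run unprivileged, and the real actor's logic (`do-reboot`) is not shown in this diff and is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: all names below are illustrative, not from the role.
set -u

# the real spool is /run/schedule-reboot (created by the schedule_reboot role via
# tmpfiles.d); a throwaway directory is used here so nothing on the host is touched
SPOOL_DIR="${SPOOL_DIR:-$(mktemp -d)}"

# producer side: one file per requesting lane, holding the reason as its content.
# a second request from the same lane overwrites the first instead of piling up.
request_reboot () {
    local lane="$1" reason="$2"
    printf '%s\n' "$reason" > "$SPOOL_DIR/$lane"
}

# consumer side (hypothetical): list the lanes that have a request pending
pending_requests () {
    local f
    for f in "$SPOOL_DIR"/*; do
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
}

request_reboot system_update "kernel updated"
request_reboot security_update "openssl updated"
request_reboot system_update "kernel updated again"

pending_requests
```

Keying the request file by lane name is what lets several update runs collapse into a single reboot: the spool holds at most one entry per lane, however often that lane asks.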