client: Explicitly `systemctl start rpm-ostreed` if root, dump status by cgwalters · Pull Request #2945 · coreos/rpm-ostree

cgwalters · 2021-06-29T20:59:10Z

This is a hackier an alternative to #2932
that we can ship immediately because we won't block on SELinux policy.

For historical reasons, the daemon ends up doing a lot of
initialization before even claiming the DBus name. For example,
it calls ostree_sysroot_load(). We also end up scanning
the RPM database, and actually parse all the GPG keys
in /etc/pki/rpm-gpg etc.

This causes two related problems:

By doing all this work before claiming the bus name, we
race against the (pretty low) DBus service timeout of 25s.
If something hard fails at initialization, the client can't
easily see the error because it just appears in the systemd
journal. The client will just see a service timeout.

By explicitly using systemctl start rpm-ostreed.service,
systemd does all of the error checking for us without involving
dbus-broker as a middleman.

Further, by using systemctl status rpm-ostreed.service on
failure, we reuse systemd's nice rendering of the status
of the unit instead of reinventing our own.

This PR effectively replicates openshift/machine-config-operator#2642
in our code.

cgwalters · 2021-06-29T20:59:22Z

Demo:

[root@cosa-devsh ~]# systemctl stop rpm-ostreed
[root@cosa-devsh ~]# umount /boot
[root@cosa-devsh ~]# rpm-ostree status
Job for rpm-ostreed.service failed because the control process exited with error code.
See "systemctl status rpm-ostreed.service" and "journalctl -xeu rpm-ostreed.service" for details.
× rpm-ostreed.service - rpm-ostree System Management Daemon
     Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
     Active: failed (Result: exit-code) since Tue 2021-06-29 20:55:44 UTC; 11ms ago
       Docs: man:rpm-ostree(1)
    Process: 1342 ExecStart=/usr/bin/rpm-ostree start-daemon (code=exited, status=1/FAILURE)
   Main PID: 1342 (code=exited, status=1/FAILURE)
     Status: "error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory"
        CPU: 32ms

Jun 29 20:55:44 cosa-devsh systemd[1]: Starting rpm-ostree System Management Daemon...
Jun 29 20:55:44 cosa-devsh rpm-ostree[1342]: Reading config file '/etc/rpm-ostreed.conf'
Jun 29 20:55:44 cosa-devsh rpm-ostree[1342]: error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory
Jun 29 20:55:44 cosa-devsh systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=1/FAILURE
Jun 29 20:55:44 cosa-devsh systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
Jun 29 20:55:44 cosa-devsh systemd[1]: Failed to start rpm-ostree System Management Daemon.
error: exit code: 1
[root@cosa-devsh ~]#

This is a hackier an alternative to coreos#2932 that we can ship immediately because we won't block on SELinux policy. For historical reasons, the daemon ends up doing a lot of initialization before even claiming the DBus name. For example, it calls `ostree_sysroot_load()`. We also end up scanning the RPM database, and actually parse all the GPG keys in `/etc/pki/rpm-gpg` etc. This causes two related problems: - By doing all this work before claiming the bus name, we race against the (pretty low) DBus service timeout of 25s. - If something hard fails at initialization, the client can't easily see the error because it just appears in the systemd journal. The client will just see a service timeout. By explicitly using `systemctl start rpm-ostreed.service`, systemd does all of the error checking for us without involving `dbus-broker` as a middleman. Further, by using `systemctl status rpm-ostreed.service` on failure, we reuse systemd's nice rendering of the status of the unit instead of reinventing our own. This PR effectively replicates openshift/machine-config-operator#2642 in our code.

cgwalters · 2021-06-30T14:09:00Z

(Added a test)

jlebon

Nice! This makes a lot of sense to me, and I think it's a good simple alternative to #2932.

This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1982389 which is already fixed in rpm-ostree in coreos/rpm-ostree#2995 because it will take a fair while until we can ship the fixed rpm-ostreed in RHEL and then OpenShift stable versions. (Yes, this is a sad recurring pattern) The updater client gains an explicit `Initialize` method, where we also explicitly `systemctl start rpm-ostreed` which then effectively rolls in the change from coreos/rpm-ostree#2945 too.

cgwalters mentioned this pull request Jun 29, 2021

daemon: Dump systemctl status rpm-ostreed on load failure openshift/machine-config-operator#2642

Merged

cgwalters force-pushed the systemd-start-explicit branch from 51cfe45 to 597bf93 Compare June 30, 2021 13:42

jlebon approved these changes Jun 30, 2021

View reviewed changes

jlebon mentioned this pull request Jun 30, 2021

daemon: Add socket activation via /run/rpm-ostreed.socket #2932

Closed

cgwalters merged commit bd2ef09 into coreos:main Jun 30, 2021

cgwalters mentioned this pull request Jul 15, 2021

daemon: Explicitly start rpm-ostreed, restart if we detect active txn openshift/machine-config-operator#2677

Merged

cgwalters mentioned this pull request Feb 25, 2022

Bug 2057544: daemon: Explicitly start rpm-ostreed, restart if we detect active txn openshift/machine-config-operator#2967

Merged

cgwalters mentioned this pull request Mar 16, 2022

client: Don't invoke systemctl start if unit is already active #3523

Merged

cgwalters mentioned this pull request Jul 7, 2022

Bug 2105082: 4.7: daemon: Explicitly start rpm-ostreed, restart if we detect active txn openshift/machine-config-operator#3240

Merged

cgwalters mentioned this pull request Jul 19, 2022

Support delegation of privilege using LoadCredential=, add socket activation #3850

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

client: Explicitly `systemctl start rpm-ostreed` if root, dump status#2945

client: Explicitly `systemctl start rpm-ostreed` if root, dump status#2945
cgwalters merged 1 commit into
coreos:mainfrom
cgwalters:systemd-start-explicit

cgwalters commented Jun 29, 2021

Uh oh!

cgwalters commented Jun 29, 2021

Uh oh!

cgwalters commented Jun 30, 2021

Uh oh!

jlebon left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cgwalters commented Jun 29, 2021

Uh oh!

cgwalters commented Jun 29, 2021

Uh oh!

cgwalters commented Jun 30, 2021

Uh oh!

jlebon left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants