Skip to content

client: Explicitly systemctl start rpm-ostreed if root, dump status#2945

Merged
cgwalters merged 1 commit into
coreos:mainfrom
cgwalters:systemd-start-explicit
Jun 30, 2021
Merged

client: Explicitly systemctl start rpm-ostreed if root, dump status#2945
cgwalters merged 1 commit into
coreos:mainfrom
cgwalters:systemd-start-explicit

Conversation

@cgwalters
Copy link
Copy Markdown
Member

This is a hackier an alternative to #2932
that we can ship immediately because we won't block on SELinux policy.

For historical reasons, the daemon ends up doing a lot of
initialization before even claiming the DBus name. For example,
it calls ostree_sysroot_load(). We also end up scanning
the RPM database, and actually parse all the GPG keys
in /etc/pki/rpm-gpg etc.

This causes two related problems:

  • By doing all this work before claiming the bus name, we
    race against the (pretty low) DBus service timeout of 25s.
  • If something hard fails at initialization, the client can't
    easily see the error because it just appears in the systemd
    journal. The client will just see a service timeout.

By explicitly using systemctl start rpm-ostreed.service,
systemd does all of the error checking for us without involving
dbus-broker as a middleman.

Further, by using systemctl status rpm-ostreed.service on
failure, we reuse systemd's nice rendering of the status
of the unit instead of reinventing our own.

This PR effectively replicates openshift/machine-config-operator#2642
in our code.

@cgwalters
Copy link
Copy Markdown
Member Author

Demo:

[root@cosa-devsh ~]# systemctl stop rpm-ostreed
[root@cosa-devsh ~]# umount /boot
[root@cosa-devsh ~]# rpm-ostree status
Job for rpm-ostreed.service failed because the control process exited with error code.
See "systemctl status rpm-ostreed.service" and "journalctl -xeu rpm-ostreed.service" for details.
× rpm-ostreed.service - rpm-ostree System Management Daemon
     Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
     Active: failed (Result: exit-code) since Tue 2021-06-29 20:55:44 UTC; 11ms ago
       Docs: man:rpm-ostree(1)
    Process: 1342 ExecStart=/usr/bin/rpm-ostree start-daemon (code=exited, status=1/FAILURE)
   Main PID: 1342 (code=exited, status=1/FAILURE)
     Status: "error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory"
        CPU: 32ms

Jun 29 20:55:44 cosa-devsh systemd[1]: Starting rpm-ostree System Management Daemon...
Jun 29 20:55:44 cosa-devsh rpm-ostree[1342]: Reading config file '/etc/rpm-ostreed.conf'
Jun 29 20:55:44 cosa-devsh rpm-ostree[1342]: error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory
Jun 29 20:55:44 cosa-devsh systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=1/FAILURE
Jun 29 20:55:44 cosa-devsh systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
Jun 29 20:55:44 cosa-devsh systemd[1]: Failed to start rpm-ostree System Management Daemon.
error: exit code: 1
[root@cosa-devsh ~]# 

This is a hackier an alternative to coreos#2932
that we can ship immediately because we won't block on SELinux policy.

For historical reasons, the daemon ends up doing a lot of
initialization before even claiming the DBus name.  For example,
it calls `ostree_sysroot_load()`.  We also end up scanning
the RPM database, and actually parse all the GPG keys
in `/etc/pki/rpm-gpg` etc.

This causes two related problems:

- By doing all this work before claiming the bus name, we
  race against the (pretty low) DBus service timeout of 25s.
- If something hard fails at initialization, the client can't
  easily see the error because it just appears in the systemd
  journal.  The client will just see a service timeout.

By explicitly using `systemctl start rpm-ostreed.service`,
systemd does all of the error checking for us without involving
`dbus-broker` as a middleman.

Further, by using `systemctl status rpm-ostreed.service` on
failure, we reuse systemd's nice rendering of the status
of the unit instead of reinventing our own.

This PR effectively replicates openshift/machine-config-operator#2642
in our code.
@cgwalters cgwalters force-pushed the systemd-start-explicit branch from 51cfe45 to 597bf93 Compare June 30, 2021 13:42
@cgwalters
Copy link
Copy Markdown
Member Author

(Added a test)

Copy link
Copy Markdown
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This makes a lot of sense to me, and I think it's a good simple alternative to #2932.

@cgwalters cgwalters merged commit bd2ef09 into coreos:main Jun 30, 2021
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jul 16, 2021
This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1982389
which is already fixed in rpm-ostree in
coreos/rpm-ostree#2995
because it will take a fair while until we can ship the fixed
rpm-ostreed in RHEL and then OpenShift stable versions.
(Yes, this is a sad recurring pattern)

The updater client gains an explicit `Initialize` method, where
we also explicitly `systemctl start rpm-ostreed` which
then effectively rolls in the change from
coreos/rpm-ostree#2945
too.
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Feb 25, 2022
This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1982389
which is already fixed in rpm-ostree in
coreos/rpm-ostree#2995
because it will take a fair while until we can ship the fixed
rpm-ostreed in RHEL and then OpenShift stable versions.
(Yes, this is a sad recurring pattern)

The updater client gains an explicit `Initialize` method, where
we also explicitly `systemctl start rpm-ostreed` which
then effectively rolls in the change from
coreos/rpm-ostree#2945
too.
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jul 8, 2022
This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1982389
which is already fixed in rpm-ostree in
coreos/rpm-ostree#2995
because it will take a fair while until we can ship the fixed
rpm-ostreed in RHEL and then OpenShift stable versions.
(Yes, this is a sad recurring pattern)

The updater client gains an explicit `Initialize` method, where
we also explicitly `systemctl start rpm-ostreed` which
then effectively rolls in the change from
coreos/rpm-ostree#2945
too.
ptalgulk01 pushed a commit to ptalgulk01/machine-config-operator that referenced this pull request May 15, 2026
This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1982389
which is already fixed in rpm-ostree in
coreos/rpm-ostree#2995
because it will take a fair while until we can ship the fixed
rpm-ostreed in RHEL and then OpenShift stable versions.
(Yes, this is a sad recurring pattern)

The updater client gains an explicit `Initialize` method, where
we also explicitly `systemctl start rpm-ostreed` which
then effectively rolls in the change from
coreos/rpm-ostree#2945
too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants