* [pve-devel] [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes
2024-02-15 9:40 [pve-devel] [PATCH ceph master+quincy-stable-8 0/3] fix #5213: avoid connection freezes when installing/upgrading ceph-osd Friedrich Weber
@ 2024-02-15 9:40 ` Friedrich Weber
2024-02-15 13:16 ` [pve-devel] applied: " Thomas Lamprecht
2024-02-15 9:40 ` [pve-devel] [PATCH ceph quincy-stable-8 2/3] " Friedrich Weber
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 3/3] buildsys: add check for changed ceph-osd sysctl settings Friedrich Weber
2 siblings, 1 reply; 8+ messages in thread
From: Friedrich Weber @ 2024-02-15 9:40 UTC (permalink / raw)
To: pve-devel
Assume there is an open TCP connection to a VM, and ceph-osd is
installed/upgraded on the host on which the PVE firewall is active.
Currently, ceph-osd postinst reloads all sysctl settings. Thus,
installing/upgrading ceph-osd will set the sysctl setting
`net.bridge.bridge-nf-call-iptables` to 0. The PVE firewall will flip
the setting back to 1 in its next iteration (in <10 seconds). But
while the setting is 0, conntrack will not see packets of the existing
TCP connection. When the setting is flipped back to 1, conntrack will
see packets again, but may consider the seq/ack numbers of new packets
out-of-window, mark them as invalid and drop them. This will freeze
the TCP connection.
To avoid this, add a patch that modifies the ceph-osd postinst to only
apply settings from the sysctl settings file shipped with ceph-osd,
and only apply them on fresh install. As the ceph-osd sysctl settings
do not set `net.bridge.bridge-nf-call-iptables`, this will avoid the
temporary flip to 0 when installing/upgrading ceph-osd.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
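A minimal sketch for watching the flip described above; it is not part of
the patch. It reads a sysctl value directly from procfs. `read_sysctl` and
the `SYSCTL_ROOT` override are hypothetical names introduced here for
illustration only. Note that the bridge-nf-call-iptables file is only
present while the br_netfilter module is loaded.

```shell
# Sketch: read a sysctl value straight from procfs.
# SYSCTL_ROOT is a hypothetical override so the helper can be pointed at a
# scratch directory; it defaults to the real /proc/sys.
read_sysctl() {
    key=$1
    root=${SYSCTL_ROOT:-/proc/sys}
    # Dotted keys map to paths below /proc/sys (dots become slashes).
    path="$root/$(printf '%s' "$key" | tr . /)"
    # Fail if the key does not exist (e.g. module not loaded).
    [ -r "$path" ] || return 1
    cat "$path"
}
```

Running `while sleep 1; do read_sysctl net.bridge.bridge-nf-call-iptables; done`
in a second terminal while installing an unpatched ceph-osd should show the
value briefly dropping to 0 on a host with the PVE firewall active.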
...t-avoid-reloading-all-sysctl-setting.patch | 47 +++++++++++++++++++
patches/series | 1 +
2 files changed, 48 insertions(+)
create mode 100644 patches/0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
diff --git a/patches/0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch b/patches/0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
new file mode 100644
index 000000000..947175605
--- /dev/null
+++ b/patches/0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
@@ -0,0 +1,47 @@
+From 232b1fa3210a56354b27f9c6154819307412b91c Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Thu, 8 Feb 2024 16:20:08 +0100
+Subject: [PATCH] ceph-osd postinst: do not always reload all sysctl settings
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+ceph-osd installs a /etc/sysctl.d/30-ceph-osd.conf with custom sysctl
+settings. Currently, in order to apply them, ceph-osd postinst always
+restarts procps. However, this triggers a reload of *all* sysctl
+settings when installing or upgrading the ceph-osd package. This may
+needlessly reset unrelated settings manually changed by the user.
+
+To avoid this, invoke /lib/systemd/systemd-sysctl manually to apply
+the custom sysctl settings only, and only do so on fresh installs of
+the package.
+
+If 30-ceph-osd.conf is changed in the future, the ceph-osd postinst
+will need to be adjusted to apply the sysctl settings on upgrade too.
+
+Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ debian/ceph-osd.postinst | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/debian/ceph-osd.postinst b/debian/ceph-osd.postinst
+index 04e33b8601f..2bcd8d4dcb4 100644
+--- a/debian/ceph-osd.postinst
++++ b/debian/ceph-osd.postinst
+@@ -24,7 +24,11 @@ set -e
+
+ case "$1" in
+     configure)
+-        [ -x /etc/init.d/procps ] && invoke-rc.d procps restart || :
++        # apply (only) new parameters, but only on fresh install
++        if [ -z "$2" ]; then
++            /lib/systemd/systemd-sysctl /etc/sysctl.d/30-ceph-osd.conf \
++                >/dev/null || :
++        fi
+         [ -x /sbin/start ] && start ceph-osd-all || :
+     ;;
+     abort-upgrade|abort-remove|abort-deconfigure)
+--
+2.39.2
+
diff --git a/patches/series b/patches/series
index 865caf23d..6ad754713 100644
--- a/patches/series
+++ b/patches/series
@@ -12,3 +12,4 @@
0012-backport-mgr-dashboard-simplify-authentication-proto.patch
0013-mgr-dashboard-remove-ability-to-create-and-check-TLS.patch
0014-rocksb-inherit-parent-cmake-cxx-flags.patch
+0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
--
2.39.2
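For illustration only, the fresh-install condition used in the postinst
hunk above can be isolated as follows. This assumes the Debian maintainer
script convention that postinst is called as
`configure <most-recently-configured-version>`, with the version argument
empty when the package was never configured before. `is_fresh_install` is
a hypothetical helper name, not something the patch adds.

```shell
# Sketch: succeed only for "configure" with an empty old-version argument,
# mirroring the `[ -z "$2" ]` test inside the `configure)` branch.
is_fresh_install() {
    action=$1
    old_version=$2
    [ "$action" = "configure" ] && [ -z "$old_version" ]
}
```

In a postinst this would be used as `if is_fresh_install "$1" "$2"; then ...; fi`,
so that the sysctl file is applied once on first install and left alone on
upgrades.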
* [pve-devel] applied: [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes Friedrich Weber
@ 2024-02-15 13:16 ` Thomas Lamprecht
2024-02-16 13:54 ` Friedrich Weber
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Lamprecht @ 2024-02-15 13:16 UTC (permalink / raw)
To: Proxmox VE development discussion, Friedrich Weber
On 15/02/2024 at 10:40, Friedrich Weber wrote:
> Assume there is an open TCP connection to a VM, and ceph-osd is
> installed/upgraded on the host on which the PVE firewall is active.
> Currently, ceph-osd postinst reloads all sysctl settings. Thus,
> installing/upgrading ceph-osd will set the sysctl setting
> `net.bridge.bridge-nf-call-iptables` to 0. The PVE firewall will flip
> the setting back to 1 in its next iteration (in <10 seconds). But
> while the setting is 0, conntrack will not see packets of the existing
> TCP connection. When the setting is flipped back to 1, conntrack will
> see packets again, but may consider the seq/ack numbers of new packets
> out-of-window, mark them as invalid and drop them. This will freeze
> the TCP connection.
>
> To avoid this, add a patch that modifies the ceph-osd postinst to only
> apply settings from the sysctl settings file shipped with ceph-osd,
> and only apply them on fresh install. As the ceph-osd sysctl settings
> do not set `net.bridge.bridge-nf-call-iptables`, this will avoid the
> temporary flip to 0 when installing/upgrading ceph-osd.
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> ...t-avoid-reloading-all-sysctl-setting.patch | 47 +++++++++++++++++++
> patches/series | 1 +
> 2 files changed, 48 insertions(+)
> create mode 100644 patches/0015-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
>
>
applied, thanks!
As discussed off-list, ceph is really not trying to reduce the potential for
confusion by doing things like:
install -D -m 644 etc/sysctl/90-ceph-osd.conf $(DESTDIR)/etc/sysctl.d/30-ceph-osd.conf
I.e., having it checked in as 90-... but installing it as 30-...
And while I think the argument that the "admin could have overrides that this
affects", which you mentioned Fabian brought up off-list, is fine, it is just
as true on initial installation.
What might be better is one (or some) of:
- do nothing, just install the file and be done, a reboot sorts this out
sooner or later anyway.
- a script that checks whether there are any overrides and only applies the
settings if there are none, and warns otherwise.
- just warn visibly in general if lower values are detected.
- drop our odd disabling of the `net.bridge.bridge-nf-call-iptables`
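A rough sketch of the second option (checking for overrides), under
simplifying assumptions: it only looks at *.conf files in a single
directory, whereas real sysctl settings can also come from /run/sysctl.d,
/usr/lib/sysctl.d and /etc/sysctl.conf, and it does not compare values or
handle glob keys. `list_keys` and `find_overrides` are hypothetical helper
names.

```shell
# list_keys FILE: print the sysctl keys assigned in FILE, one per line.
list_keys() {
    awk -F= '
        /^[ \t]*([#;]|$)/ { next }       # skip comments and blank lines
        NF >= 2 {
            key = $1
            gsub(/[ \t]/, "", key)       # trim whitespace around the key
            sub(/^-/, "", key)           # a leading "-" only means: ignore errors
            print key
        }' "$1"
}

# find_overrides FILE DIR: print "key file" for every key of FILE that is
# also assigned in a differently named *.conf file in DIR.
find_overrides() {
    conf=$1
    dir=$2
    list_keys "$conf" | while read -r key; do
        for other in "$dir"/*.conf; do
            [ -e "$other" ] || continue
            # Skip the file we are checking (matched by basename).
            [ "$(basename "$other")" = "$(basename "$conf")" ] && continue
            if list_keys "$other" | grep -qxF -- "$key"; then
                printf '%s %s\n' "$key" "$other"
            fi
        done
    done
}
```

A postinst could call `find_overrides /etc/sysctl.d/30-ceph-osd.conf /etc/sysctl.d`
and apply the file only if the output is empty, printing a warning
otherwise.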
* Re: [pve-devel] applied: [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes
2024-02-15 13:16 ` [pve-devel] applied: " Thomas Lamprecht
@ 2024-02-16 13:54 ` Friedrich Weber
0 siblings, 0 replies; 8+ messages in thread
From: Friedrich Weber @ 2024-02-16 13:54 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 15/02/2024 14:16, Thomas Lamprecht wrote:
[...]
>
> applied, thanks!
>
> As discussed off-list, ceph is really not trying to reduce the potential for
> confusion by doing things like:
>
> install -D -m 644 etc/sysctl/90-ceph-osd.conf $(DESTDIR)/etc/sysctl.d/30-ceph-osd.conf
>
> I.e., having it checked in as 90-... but installing it as 30-..
Seems like the rpm installs it as 90-ceph-osd.conf though :)
https://github.com/ceph/ceph/blob/fda8b5acbd7381dc4d86d7df5389e22aacffec22/ceph.spec.in#L1526
> And while I think the argument that the "admin could have overrides that this
> affects", which you mentioned Fabian brought up off-list, is fine, it is just
> as true on initial installation.
>
> What might be better is one (or some) of:
[...]
> - drop our odd disabling of the `net.bridge.bridge-nf-call-iptables`
For PVE this might make sense independently of what ceph-osd postinst
does. We've been explicitly disabling the setting since 2012 though [1]
-- will try to find out if this is still needed. If we drop it, would
this be a breaking change and need to wait for PVE 9?
[1]
https://git.proxmox.com/?p=pve-cluster.git;a=commit;h=501839cac97f68d4dcba21df6fb3797b976e9e56
* [pve-devel] [PATCH ceph quincy-stable-8 2/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes
2024-02-15 9:40 [pve-devel] [PATCH ceph master+quincy-stable-8 0/3] fix #5213: avoid connection freezes when installing/upgrading ceph-osd Friedrich Weber
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes Friedrich Weber
@ 2024-02-15 9:40 ` Friedrich Weber
2024-02-15 13:17 ` [pve-devel] applied: " Thomas Lamprecht
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 3/3] buildsys: add check for changed ceph-osd sysctl settings Friedrich Weber
2 siblings, 1 reply; 8+ messages in thread
From: Friedrich Weber @ 2024-02-15 9:40 UTC (permalink / raw)
To: pve-devel
Assume there is an open TCP connection to a VM, and ceph-osd is
installed/upgraded on the host on which the PVE firewall is active.
Currently, ceph-osd postinst reloads all sysctl settings. Thus,
installing/upgrading ceph-osd will set the sysctl setting
`net.bridge.bridge-nf-call-iptables` to 0. The PVE firewall will flip
the setting back to 1 in its next iteration (in <10 seconds). But
while the setting is 0, conntrack will not see packets of the existing
TCP connection. When the setting is flipped back to 1, conntrack will
see packets again, but may consider the seq/ack numbers of new packets
out-of-window, mark them as invalid and drop them. This will freeze
the TCP connection.
To avoid this, add a patch that modifies the ceph-osd postinst to only
apply settings from the sysctl settings file shipped with ceph-osd,
and only apply them on fresh install. As the ceph-osd sysctl settings
do not set `net.bridge.bridge-nf-call-iptables`, this will avoid the
temporary flip to 0 when installing/upgrading ceph-osd.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
...t-avoid-reloading-all-sysctl-setting.patch | 47 +++++++++++++++++++
patches/series | 1 +
2 files changed, 48 insertions(+)
create mode 100644 patches/0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
diff --git a/patches/0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch b/patches/0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
new file mode 100644
index 000000000..947175605
--- /dev/null
+++ b/patches/0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
@@ -0,0 +1,47 @@
+From 232b1fa3210a56354b27f9c6154819307412b91c Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Thu, 8 Feb 2024 16:20:08 +0100
+Subject: [PATCH] ceph-osd postinst: do not always reload all sysctl settings
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+ceph-osd installs a /etc/sysctl.d/30-ceph-osd.conf with custom sysctl
+settings. Currently, in order to apply them, ceph-osd postinst always
+restarts procps. However, this triggers a reload of *all* sysctl
+settings when installing or upgrading the ceph-osd package. This may
+needlessly reset unrelated settings manually changed by the user.
+
+To avoid this, invoke /lib/systemd/systemd-sysctl manually to apply
+the custom sysctl settings only, and only do so on fresh installs of
+the package.
+
+If 30-ceph-osd.conf is changed in the future, the ceph-osd postinst
+will need to be adjusted to apply the sysctl settings on upgrade too.
+
+Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ debian/ceph-osd.postinst | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/debian/ceph-osd.postinst b/debian/ceph-osd.postinst
+index 04e33b8601f..2bcd8d4dcb4 100644
+--- a/debian/ceph-osd.postinst
++++ b/debian/ceph-osd.postinst
+@@ -24,7 +24,11 @@ set -e
+
+ case "$1" in
+     configure)
+-        [ -x /etc/init.d/procps ] && invoke-rc.d procps restart || :
++        # apply (only) new parameters, but only on fresh install
++        if [ -z "$2" ]; then
++            /lib/systemd/systemd-sysctl /etc/sysctl.d/30-ceph-osd.conf \
++                >/dev/null || :
++        fi
+         [ -x /sbin/start ] && start ceph-osd-all || :
+     ;;
+     abort-upgrade|abort-remove|abort-deconfigure)
+--
+2.39.2
+
diff --git a/patches/series b/patches/series
index ee897a78a..30fc83ec0 100644
--- a/patches/series
+++ b/patches/series
@@ -16,3 +16,4 @@
0021-backport-mgr-dashboard-simplify-authentication-proto.patch
0022-mgr-dashboard-remove-ability-to-create-and-check-TLS.patch
0023-rocksb-inherit-parent-cmake-cxx-flags.patch
+0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
--
2.39.2
* [pve-devel] applied: [PATCH ceph quincy-stable-8 2/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes
2024-02-15 9:40 ` [pve-devel] [PATCH ceph quincy-stable-8 2/3] " Friedrich Weber
@ 2024-02-15 13:17 ` Thomas Lamprecht
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2024-02-15 13:17 UTC (permalink / raw)
To: Proxmox VE development discussion, Friedrich Weber
On 15/02/2024 at 10:40, Friedrich Weber wrote:
> Assume there is an open TCP connection to a VM, and ceph-osd is
> installed/upgraded on the host on which the PVE firewall is active.
> Currently, ceph-osd postinst reloads all sysctl settings. Thus,
> installing/upgrading ceph-osd will set the sysctl setting
> `net.bridge.bridge-nf-call-iptables` to 0. The PVE firewall will flip
> the setting back to 1 in its next iteration (in <10 seconds). But
> while the setting is 0, conntrack will not see packets of the existing
> TCP connection. When the setting is flipped back to 1, conntrack will
> see packets again, but may consider the seq/ack numbers of new packets
> out-of-window, mark them as invalid and drop them. This will freeze
> the TCP connection.
>
> To avoid this, add a patch that modifies the ceph-osd postinst to only
> apply settings from the sysctl settings file shipped with ceph-osd,
> and only apply them on fresh install. As the ceph-osd sysctl settings
> do not set `net.bridge.bridge-nf-call-iptables`, this will avoid the
> temporary flip to 0 when installing/upgrading ceph-osd.
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> ...t-avoid-reloading-all-sysctl-setting.patch | 47 +++++++++++++++++++
> patches/series | 1 +
> 2 files changed, 48 insertions(+)
> create mode 100644 patches/0024-ceph-osd-postinst-avoid-reloading-all-sysctl-setting.patch
>
>
applied, thanks! The same holds as in my reply to patch 1/3, but for quincy
I'd not bother changing such things much at this stage of its lifecycle.
* [pve-devel] [PATCH ceph master 3/3] buildsys: add check for changed ceph-osd sysctl settings
2024-02-15 9:40 [pve-devel] [PATCH ceph master+quincy-stable-8 0/3] fix #5213: avoid connection freezes when installing/upgrading ceph-osd Friedrich Weber
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 1/3] fix #5213: ceph-osd postinst: add patch to avoid connection freezes Friedrich Weber
2024-02-15 9:40 ` [pve-devel] [PATCH ceph quincy-stable-8 2/3] " Friedrich Weber
@ 2024-02-15 9:40 ` Friedrich Weber
2024-02-15 13:20 ` Thomas Lamprecht
2 siblings, 1 reply; 8+ messages in thread
From: Friedrich Weber @ 2024-02-15 9:40 UTC (permalink / raw)
To: pve-devel
If the ceph-osd sysctl settings template (90-ceph-osd.conf.in) shipped
by upstream changes, our ceph-osd postinst patch will need to be
adapted to apply the new settings on package upgrade. To make sure we
do not forget, store the current checksum of that file in our Makefile
and fail the build early if the checksums do not match.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
Notes:
- failing the build might be a bit drastic, but a simple warning seems
too easy to overlook
- strictly speaking we'll miss updates of `sysctl_pid_max` [1], but to
catch this, we'd need to check the generated 30-ceph-osd.conf
(without .in) after it is generated during the build process, and
failing at that point sounds like it may be too annoying :)
[1] https://github.com/ceph/ceph/blob/main/etc/sysctl/90-ceph-osd.conf.in
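As a standalone sketch of the check the Makefile hunk below performs,
assuming a `sha256sum` that accepts `-c` with checksum lines on stdin (as
GNU coreutils does). `verify_checksum` is a hypothetical helper name used
only for this illustration.

```shell
# verify_checksum EXPECTED FILE: succeed only if the SHA-256 of FILE
# equals EXPECTED.
verify_checksum() {
    expected=$1
    file=$2
    # sha256sum -c reads "<hash>  <path>" lines (hash, two spaces, path).
    printf '%s  %s\n' "$expected" "$file" | sha256sum -c >/dev/null 2>&1
}
```

A build script could then run
`verify_checksum "$SYSCTL_CONF_CHECKSUM" "$BUILDSRC/$SYSCTL_CONF" || exit 1`
before starting the package build, which fails early when the upstream
template has changed.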
Makefile | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/Makefile b/Makefile
index 9c23cadc5..e99b3b356 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,11 @@ DEBS=$(MAIN_DEB) $(DEBS_REST)
DSC=ceph_${PKGVER}.dsc
+# ceph-osd postinst will need to be adjusted if the upstream 90-ceph-osd.conf.in
+# changes to make sure new settings are applied on package upgrade
+SYSCTL_CONF=etc/sysctl/90-ceph-osd.conf.in
+SYSCTL_CONF_CHECKSUM=0e3b515c4a81a5b118dbc9e08baec0dbd2a460b781b655e1e84807ccfb5827b4
+
all: ${DEBS} ${DBG_DEBS}
@echo ${DEBS}
@echo ${DBG_DEBS}
@@ -90,6 +95,8 @@ ${BUILDSRC}: ${SRCDIR} patches
deb: ${DEBS} ${DBG_DEBS}
${DEBS_REST} ${DBG_DEBS}: $(MAIN_DEB)
$(MAIN_DEB): ${BUILDSRC}
+ @echo "${SYSCTL_CONF_CHECKSUM} ${BUILDSRC}/${SYSCTL_CONF}" | sha256sum -c || \
+ (echo "sysctl settings file changed, adjust ceph-osd postinst and Makefile!"; exit 1)
cd ${BUILDSRC}; dpkg-buildpackage -b -uc -us
lintian ${DEBS}
@echo ${DEBS}
--
2.39.2
* Re: [pve-devel] [PATCH ceph master 3/3] buildsys: add check for changed ceph-osd sysctl settings
2024-02-15 9:40 ` [pve-devel] [PATCH ceph master 3/3] buildsys: add check for changed ceph-osd sysctl settings Friedrich Weber
@ 2024-02-15 13:20 ` Thomas Lamprecht
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2024-02-15 13:20 UTC (permalink / raw)
To: Proxmox VE development discussion, Friedrich Weber
On 15/02/2024 at 10:40, Friedrich Weber wrote:
> If the ceph-osd sysctl settings template (90-ceph-osd.conf.in) shipped
> by upstream changes, our ceph-osd postinst patch will need to be
> adapted to apply the new settings on package upgrade. To make sure we
> do not forget, store the current checksum of that file in our Makefile
> and fail the build early if the checksums do not match.
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>
> Notes:
> - failing the build might be a bit drastic, but a simple warning seems
> too easy to overlook
> - strictly speaking we'll miss updates of `sysctl_pid_max` [1], but to
> catch this, we'd need to check the generated 30-ceph-osd.conf
> (without .in) after it is generated during the build process, and
> failing at that point sounds like it may be too annoying :)
>
> [1] https://github.com/ceph/ceph/blob/main/etc/sysctl/90-ceph-osd.conf.in
>
Noticing is one thing, but having an actual plan for what to do then is
something else, as most options would again risk overriding existing
sysctls. IMO just installing the new file is fine too, and judging from the
past this file won't see much churn, especially during stable point
releases. So I'm omitting this one for now; maybe looking at the bigger
picture (see reply to patch 1/3) leads to some improvements that make it
obsolete anyway.