public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions
@ 2023-09-18 15:46 Stefan Hanreich
  2023-09-18 16:02 ` Dietmar Maurer
  2023-10-31  9:52 ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 2 replies; 4+ messages in thread
From: Stefan Hanreich @ 2023-09-18 15:46 UTC (permalink / raw)
  To: pve-devel

One of our users ran into issues with running Ceph on older CPU
architectures [1]. This is apparently due to a bug in gcc-12 that
leads to SSE 4.1 instructions always being executed rather than
dynamically dispatching functions using those instructions. Those
binaries then break on older CPUs that do not support this instruction
set.

I ran some benchmarks with `rados bench` against our last release
(18.2.0-pve2) and this new version. The commands were taken from our
latest Ceph benchmarking paper [2]. The results showed that this patch
does not lead to performance regressions on newer hardware.

                  18.2.0-pve2    this patch
Read EC           4574.28        4651.95
Write EC          3739.59        3773.87
Read Replicated   5345.34        5568.41
Write Replicated  4123.28        4066.19
(numbers correspond to bandwidth in MB/s)
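
To put the table above in relative terms, the following is a minimal
sketch in C that computes the percentage change per row. The names
`bench_row`, `bench_rows` and `rel_change_pct` are made up for this
illustration and are not part of Ceph or of the benchmark tooling; the
numbers are copied from the table.

```c
#include <stddef.h>

/* One row of the table above; values are bandwidth in MB/s. */
struct bench_row {
    const char *name;
    double before; /* 18.2.0-pve2 */
    double after;  /* this patch */
};

/* Values copied verbatim from the benchmark table. */
static const struct bench_row bench_rows[] = {
    {"Read EC",          4574.28, 4651.95},
    {"Write EC",         3739.59, 3773.87},
    {"Read Replicated",  5345.34, 5568.41},
    {"Write Replicated", 4123.28, 4066.19},
};

static const size_t bench_row_count =
    sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Relative change in percent; positive means the patched build was faster. */
static double rel_change_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling `rel_change_pct(bench_rows[i].before, bench_rows[i].after)` for
each row gives roughly +1.7 %, +0.9 %, +4.2 % and -1.4 %, so all four
results stay within a few percent of the previous release, with
Write Replicated being the only slightly lower value.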

[1] https://forum.proxmox.com/threads/proxmox-8-ceph-quincy-monitor-no-longer-working-on-amd-opteron-2427.129613
[2] https://www.proxmox.com/en/downloads/proxmox-virtual-environment/documentation/proxmox-ve-ceph-benchmark-2020-09

Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
---
 ...y-with-CPUs-not-supporting-SSE-4.1-i.patch | 32 +++++++++++++++++++
 patches/series                                |  1 +
 2 files changed, 33 insertions(+)
 create mode 100644 patches/0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch

diff --git a/patches/0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch b/patches/0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch
new file mode 100644
index 000000000..a44aefafb
--- /dev/null
+++ b/patches/0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch
@@ -0,0 +1,32 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Hanreich <s.hanreich@proxmox.com>
+Date: Fri, 15 Sep 2023 16:55:02 +0200
+Subject: [PATCH] fix compatibility with CPUs not supporting SSE 4.1
+ instructions
+
+Building without -O1 causes gcc-12 to emit SSE 4.1 instructions which
+are not supported on older CPU architectures. This leads to Ceph
+crashing on older CPU architectures. -O1 causes those optimizations to
+be implemented manually via runtime dispatch.
+
+Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
+---
+ src/erasure-code/jerasure/CMakeLists.txt | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/src/erasure-code/jerasure/CMakeLists.txt b/src/erasure-code/jerasure/CMakeLists.txt
+index f9cd22e11..b0a355235 100644
+--- a/src/erasure-code/jerasure/CMakeLists.txt
++++ b/src/erasure-code/jerasure/CMakeLists.txt
+@@ -67,7 +67,7 @@ endif()
+ 
+ add_library(gf-complete_objs OBJECT ${gf-complete_srcs})
+ set_target_properties(gf-complete_objs PROPERTIES 
+-  COMPILE_FLAGS "${SIMD_COMPILE_FLAGS}")
++  COMPILE_FLAGS "${SIMD_COMPILE_FLAGS} -O1")
+ set_target_properties(gf-complete_objs PROPERTIES 
+   COMPILE_DEFINITIONS "${GF_COMPILE_FLAGS}")
+ 
+-- 
+2.39.2
+
diff --git a/patches/series b/patches/series
index c78de0235..df9d7baf6 100644
--- a/patches/series
+++ b/patches/series
@@ -12,3 +12,4 @@
 0012-fix-4759-run-ceph-crash-daemon-with-www-data-group-f.patch
 0013-d-rules-compile-with-gcc-12.patch
 0014-debian-add-missing-bcrypt-to-manager-.requires.patch
+0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch
-- 
2.39.2





* Re: [pve-devel] [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions
  2023-09-18 15:46 [pve-devel] [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions Stefan Hanreich
@ 2023-09-18 16:02 ` Dietmar Maurer
  2023-09-19 10:06   ` Thomas Lamprecht
  2023-10-31  9:52 ` [pve-devel] applied: " Thomas Lamprecht
  1 sibling, 1 reply; 4+ messages in thread
From: Dietmar Maurer @ 2023-09-18 16:02 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Hanreich

> One of our users ran into issues with running Ceph on older CPU
> architectures [1]. This is apparently due to a bug in gcc-12 that
> leads to SSE 4.1 instructions always being executed rather than
> dynamically dispatching functions using those instructions.

Can't we fix the GCC bug instead?





* Re: [pve-devel] [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions
  2023-09-18 16:02 ` Dietmar Maurer
@ 2023-09-19 10:06   ` Thomas Lamprecht
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2023-09-19 10:06 UTC (permalink / raw)
  To: Proxmox VE development discussion, Dietmar Maurer, Stefan Hanreich

On 18/09/2023 at 18:02, Dietmar Maurer wrote:
>> One of our users ran into issues with running Ceph on older CPU
>> architectures [1]. This is apparently due to a bug in gcc-12 that
>> leads to SSE 4.1 instructions always being executed rather than
>> dynamically dispatching functions using those instructions.
> 
> Can't we fix the GCC bug instead?
> 

And recompile *all* packages to ensure binary compat?

Note that this would need much more investigation to actually see if
it is indeed a bug, or if gf-complete does something weird with its
lots of C and inline assembly for which the compiler's behavior is
actually fine (i.e., one of the millions of cases of undefined behavior
in C where the compiler can do whatever). We really only got reports
for this specific library; if it were a more general bug, it would
probably affect many more programs.

So, compiling with -O1 is by far the quickest and most targeted solution
for us, and as gf-complete dynamically dispatches vectorized instructions
depending on available support anyway, it has no real performance impact,
at least none we could measure.
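
For readers unfamiliar with the dynamic dispatch mentioned above, here
is a minimal sketch of the general technique in C. It is not
gf-complete's actual code: the functions `max_u32_scalar`,
`max_u32_sse41` and `select_max_u32` are hypothetical, and the sketch
assumes GCC or Clang on x86 for the `target` attribute and the
`__builtin_cpu_supports` builtin. The point is only that SSE 4.1
instructions are confined to one function, which is called solely when
the running CPU reports support.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h> /* SSE 4.1 intrinsics */
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback: runs on any CPU. Returns 0 for an empty array. */
static uint32_t max_u32_scalar(const uint32_t *v, size_t n)
{
    uint32_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

#ifdef HAVE_X86_DISPATCH
/* Only this function is compiled with SSE 4.1 enabled. */
__attribute__((target("sse4.1")))
static uint32_t max_u32_sse41(const uint32_t *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;

    /* Process four 32-bit lanes at a time; _mm_max_epu32 is SSE 4.1. */
    for (; i + 4 <= n; i += 4)
        acc = _mm_max_epu32(acc, _mm_loadu_si128((const __m128i *)(v + i)));

    /* Reduce the four lanes, then handle the remaining tail elements. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t m = max_u32_scalar(lanes, 4);
    for (; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}
#endif

typedef uint32_t (*max_u32_fn)(const uint32_t *, size_t);

/* Pick an implementation based on what the running CPU reports. */
static max_u32_fn select_max_u32(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return max_u32_sse41;
#endif
    return max_u32_scalar;
}
```

A caller would fetch the function pointer once, e.g.
`max_u32_fn f = select_max_u32();`, and use `f(data, len)` afterwards.
The failure discussed in this thread corresponds to SSE 4.1
instructions ending up outside such a guarded function, so that they
execute regardless of what the CPU supports.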

FWIW, the Debian OpenStack team chose the same solution [0] after
coming to no real conclusion on their research into this matter [1].

[0]: https://salsa.debian.org/openstack-team/third-party/gf-complete/-/commit/03e0314af5e814a7ef74dcf4f9416d60c6322e51
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012935


So, while yes, we could spend (potentially a lot of) time on
investigating this, with the current information we have, I'd rather
avoid that for now, especially as I have a gut feeling that this
bug is just a result of gf-complete oddness and C being C, and
I don't want to mess with that more than really needed.





* [pve-devel] applied: [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions
  2023-09-18 15:46 [pve-devel] [PATCH pve-ceph] fix compatibility with CPUs not supporting SSE 4.1 instructions Stefan Hanreich
  2023-09-18 16:02 ` Dietmar Maurer
@ 2023-10-31  9:52 ` Thomas Lamprecht
  1 sibling, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2023-10-31  9:52 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Hanreich

On 18/09/2023 17:46, Stefan Hanreich wrote:
> One of our users ran into issues with running Ceph on older CPU
> architectures [1]. This is apparently due to a bug in gcc-12 that
> leads to SSE 4.1 instructions always being executed rather than
> dynamically dispatching functions using those instructions. Those
> binaries then break on older CPUs that do not support this instruction
> set.
> 
> I ran some benchmarks with `rados bench` against our last release
> (18.2.0-pve2) and this new version. The commands were taken from our
> latest Ceph benchmarking paper [2]. The results showed that this patch
> does not lead to performance regressions on newer hardware.
> 
>                   18.2.0-pve2    this patch
> Read EC           4574.28        4651.95
> Write EC          3739.59        3773.87
> Read Replicated   5345.34        5568.41
> Write Replicated  4123.28        4066.19
> (numbers correspond to bandwidth in MB/s)
> 
> [1] https://forum.proxmox.com/threads/proxmox-8-ceph-quincy-monitor-no-longer-working-on-amd-opteron-2427.129613
> [2] https://www.proxmox.com/en/downloads/proxmox-virtual-environment/documentation/proxmox-ve-ceph-benchmark-2020-09
> 
> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
> ---
>  ...y-with-CPUs-not-supporting-SSE-4.1-i.patch | 32 +++++++++++++++++++
>  patches/series                                |  1 +
>  2 files changed, 33 insertions(+)
>  create mode 100644 patches/0015-fix-compatibility-with-CPUs-not-supporting-SSE-4.1-i.patch
> 
>

applied, with a reworded commit message, shifting the blame to the
combination of gf-complete and gcc-12, as the former does some rather
funky stuff too, thanks!

It would be good to have this reported to the GCC and/or gf-complete
people, ideally with a reduced example (compiling Ceph is a bit overkill ;-)

Using elfx86exts [0] as mentioned in the Debian bug [1] should be
enough to ensure your reduced example is still affected and contains
SSE 4.1 instructions.

[0]: https://github.com/pkgw/elfx86exts
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012935#10



