public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing
@ 2024-10-30  9:52 Fiona Ebner
  2024-11-08 15:18 ` Fiona Ebner
  2024-11-10 10:24 ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 2 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-10-30  9:52 UTC (permalink / raw)
  To: pve-devel

In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently,
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.

From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.

In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.

It's not clear why the issue began to surface more prominently only
now, since the vCPU throttling was there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2 and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()")

[0]: https://forum.proxmox.com/threads/153483

Reported-by: Mario Loderer <m.loderer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Mario Loderer <m.loderer@proxmox.com>
---
 ...-async-for-background-state-snapshots.patch | 18 +++++++++++++-----
 ...-add-optional-buffer-size-to-QEMUFile.patch |  6 +++---
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
index 964caf3..b0e75e9 100644
--- a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
+++ b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
@@ -28,7 +28,8 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
      adapt to removal of QEMUFileOps
      improve condition for entering final stage
      adapt to QAPI and other changes for 8.2
-     make sure to not call vm_start() from coroutine]
+     make sure to not call vm_start() from coroutine
+     stop CPU throttling after finishing]
 Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 ---
  hmp-commands-info.hx         |  13 +
@@ -36,13 +37,13 @@ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
  include/migration/snapshot.h |   2 +
  include/monitor/hmp.h        |   3 +
  migration/meson.build        |   1 +
- migration/savevm-async.c     | 538 +++++++++++++++++++++++++++++++++++
+ migration/savevm-async.c     | 545 +++++++++++++++++++++++++++++++++++
  monitor/hmp-cmds.c           |  38 +++
  qapi/migration.json          |  34 +++
  qapi/misc.json               |  18 ++
  qemu-options.hx              |  12 +
  system/vl.c                  |  10 +
- 11 files changed, 686 insertions(+)
+ 11 files changed, 693 insertions(+)
  create mode 100644 migration/savevm-async.c
 
 diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
@@ -140,10 +141,10 @@ index 95d1cf2250..800f12a60d 100644
    'threadinfo.c',
 diff --git a/migration/savevm-async.c b/migration/savevm-async.c
 new file mode 100644
-index 0000000000..72cf6588c2
+index 0000000000..1af32604c7
 --- /dev/null
 +++ b/migration/savevm-async.c
-@@ -0,0 +1,538 @@
+@@ -0,0 +1,545 @@
 +#include "qemu/osdep.h"
 +#include "migration/channel-savevm-async.h"
 +#include "migration/migration.h"
@@ -154,6 +155,7 @@ index 0000000000..72cf6588c2
 +#include "migration/global_state.h"
 +#include "migration/ram.h"
 +#include "migration/qemu-file.h"
++#include "sysemu/cpu-throttle.h"
 +#include "sysemu/sysemu.h"
 +#include "sysemu/runstate.h"
 +#include "block/block.h"
@@ -342,6 +344,12 @@ index 0000000000..72cf6588c2
 +        ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
 +    ms->to_dst_file = NULL;
 +
++    /*
++     * Same as in migration_iteration_finish(): saving RAM might've turned on CPU throttling for
++     * auto-converge, make sure to disable it.
++     */
++    cpu_throttle_stop();
++
 +    qemu_savevm_state_cleanup();
 +
 +    ret = save_snapshot_cleanup();
diff --git a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
index 043b249..92bc9f2 100644
--- a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
+++ b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
@@ -193,10 +193,10 @@ index 32fd4a34fd..36a0cd8cc8 100644
  
  /*
 diff --git a/migration/savevm-async.c b/migration/savevm-async.c
-index 72cf6588c2..fb4e8ea689 100644
+index 1af32604c7..be2035cd2e 100644
 --- a/migration/savevm-async.c
 +++ b/migration/savevm-async.c
-@@ -379,7 +379,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
+@@ -386,7 +386,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
  
      QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
                                                                 &snap_state.bs_pos));
@@ -205,7 +205,7 @@ index 72cf6588c2..fb4e8ea689 100644
  
      if (!snap_state.file) {
          error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
-@@ -503,7 +503,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
+@@ -510,7 +510,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
      blk_op_block_all(be, blocker);
  
      /* restore the VM state */
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing
  2024-10-30  9:52 [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing Fiona Ebner
@ 2024-11-08 15:18 ` Fiona Ebner
  2024-11-10 10:24 ` [pve-devel] applied: " Thomas Lamprecht
  1 sibling, 0 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-11-08 15:18 UTC (permalink / raw)
  To: pve-devel

Ping, as there are still new affected users showing up in the forum thread.

Am 30.10.24 um 10:52 schrieb Fiona Ebner:
> In the community forum, users reported issues about RCU stalls and
> sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
> was also experiencing similar issues from time to time and recently,
> obtained a GDB stacktrace. The stacktrace showed that, in his case,
> the vCPU threads were waiting in cpu_throttle_thread(). It is a good
> guess that the issues in the forum could also be because of that.
> 
> From searching in the source code, it seems that migration is the only
> user of the vCPU throttling functions in QEMU relevant for Proxmox VE
> (the only other place where it is used is the Cocoa UI). In
> particular, RAM migration will begin throttling vCPUs for
> auto-converge.
> 
> In migration_iteration_finish() there is an unconditional call to
> cpu_throttle_stop(), so do the same in the async snapshot code
> specific to Proxmox VE.
> 
> It's not clear why the issue began to surface more prominently only
> now, since the vCPU throttling was there since commit 070afca258
> ("migration: Dynamic cpu throttling for auto-converge") in QEMU
> v2.10.0. However, there were a lot of changes in the migration code
> between v8.1.5 and v9.0.2 and a few of them might have affected the
> likelihood of cpu_throttle_set() being called, for example, 4e1871c450
> ("migration: Don't serialize devices in qemu_savevm_state_iterate()")
> 
> [0]: https://forum.proxmox.com/threads/153483
> 
> Reported-by: Mario Loderer <m.loderer@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> Tested-by: Mario Loderer <m.loderer@proxmox.com>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [pve-devel] applied: [PATCH qemu] async snapshot: stop vCPU throttling after finishing
  2024-10-30  9:52 [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing Fiona Ebner
  2024-11-08 15:18 ` Fiona Ebner
@ 2024-11-10 10:24 ` Thomas Lamprecht
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Lamprecht @ 2024-11-10 10:24 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 30.10.24 um 10:52 schrieb Fiona Ebner:
> In the community forum, users reported issues about RCU stalls and
> sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
> was also experiencing similar issues from time to time and recently,
> obtained a GDB stacktrace. The stacktrace showed that, in his case,
> the vCPU threads were waiting in cpu_throttle_thread(). It is a good
> guess that the issues in the forum could also be because of that.
> 
> From searching in the source code, it seems that migration is the only
> user of the vCPU throttling functions in QEMU relevant for Proxmox VE
> (the only other place where it is used is the Cocoa UI). In
> particular, RAM migration will begin throttling vCPUs for
> auto-converge.
> 
> In migration_iteration_finish() there is an unconditional call to
> cpu_throttle_stop(), so do the same in the async snapshot code
> specific to Proxmox VE.
> 
> It's not clear why the issue began to surface more prominently only
> now, since the vCPU throttling was there since commit 070afca258
> ("migration: Dynamic cpu throttling for auto-converge") in QEMU
> v2.10.0. However, there were a lot of changes in the migration code
> between v8.1.5 and v9.0.2 and a few of them might have affected the
> likelihood of cpu_throttle_set() being called, for example, 4e1871c450
> ("migration: Don't serialize devices in qemu_savevm_state_iterate()")
> 
> [0]: https://forum.proxmox.com/threads/153483
> 
> Reported-by: Mario Loderer <m.loderer@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> Tested-by: Mario Loderer <m.loderer@proxmox.com>
> ---
>  ...-async-for-background-state-snapshots.patch | 18 +++++++++++++-----
>  ...-add-optional-buffer-size-to-QEMUFile.patch |  6 +++---
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
>

applied, thanks!


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-11-10 10:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-30  9:52 [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing Fiona Ebner
2024-11-08 15:18 ` Fiona Ebner
2024-11-10 10:24 ` [pve-devel] applied: " Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal