public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks
@ 2023-01-26  8:32 Friedrich Weber
  2023-01-26  8:32 ` [pve-devel] [RFC manager 1/4] fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted Friedrich Weber
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Friedrich Weber @ 2023-01-26  8:32 UTC (permalink / raw)
  To: pve-devel

As reported in #4474 [0], a user may attempt to shutdown a VM/CT,
realize that it is unresponsive, and decide to stop it instead. If the
shutdown task has not timed out yet, the stop task will fail. The user
needs to manually abort the shutdown task before stopping the VM,
which is inconvenient.

With this patch series, a stop task can optionally overrule active
shutdown tasks.

Summary of changes:

* Backend: We add a new optional `overrule-shutdown` parameter to
  VM/CT `/stop` endpoints. If it is 1, we query for active
  `qmshutdown`/`vzshutdown` tasks by the current user for the given
  VMID, and abort them before attempting to stop the VM/CT.
* Frontend: Before sending a stop command, we check whether there are
  active shutdown tasks. If yes, we ask the user whether they should
  be aborted, and send `overrule-shutdown=1` accordingly.

Tested with a hung CentOS 7 container and a Debian VM where I edited
/etc/systemd/logind.conf to set `HandlePowerKey=ignore`.

Sending this as RFC, to make a first proposal and iterate from there.
My most important questions:

* Does it make sense to have overruling optional? Or should "stop"
  generally overrule shutdown? This might lead to confusing
  interactions, as Thomas noted [0].
* Backend: Is there a more elegant way to overrule shutdown tasks,
  and a better place than pve-guest-common?
* Frontend: When stopping a VM/CT, we already ask for confirmation.
  Is an (occasional) second modal dialog with a lot of text a good user
  experience? Alternatively, I could imagine a checkbox in the first
  dialog saying "Overrule any active shutdown tasks".
* This patch series forbids `overrule-shutdown=1` for HA-managed VMs/CTs
  because I didn't know how overruling should work in a HA setting. Do
  you have any suggestions?

Since this is my first patch with more than a few lines, I'm especially
happy about feedback regarding coding style, naming, anything. :)

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=4474

pve-manager:

Friedrich Weber (1):
  fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted

 www/manager6/Utils.js        | 27 +++++++++++++++++++++++++++
 www/manager6/dc/Tasks.js     |  2 +-
 www/manager6/lxc/CmdMenu.js  | 14 +++++++++++++-
 www/manager6/lxc/Config.js   |  6 +++++-
 www/manager6/qemu/CmdMenu.js | 14 +++++++++++++-
 www/manager6/qemu/Config.js  |  9 ++++++++-
 6 files changed, 67 insertions(+), 5 deletions(-)

pve-container:

Friedrich Weber (1):
  fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint

 src/PVE/API2/LXC/Status.pm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

qemu-server:

Friedrich Weber (1):
  fix #4474: qemu api: add overrule-shutdown parameter to stop endpoint

 PVE/API2/Qemu.pm | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

pve-guest-common:

Friedrich Weber (1):
  guest helpers: add helper to overrule active tasks of a specific type

 src/PVE/GuestHelpers.pm | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)


-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [RFC manager 1/4] fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted
  2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
@ 2023-01-26  8:32 ` Friedrich Weber
  2023-01-26  8:32 ` [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint Friedrich Weber
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Friedrich Weber @ 2023-01-26  8:32 UTC (permalink / raw)
  To: pve-devel

Before stopping a VM (KVM or LXC), the UI now checks if there is an
active shutdown task for the same VM by the logged-in user. If there
is at least one, the UI asks whether these tasks should be aborted
before trying to stop the VM. If the user answers "Yes", the UI sends
a VM stop request with the `overrule-shutdown` parameter set to 1. If
there are no active shutdown tasks, or the user answers "No", the UI
sends a VM stop request without `overrule-shutdown`.

To avoid an additional API request for querying active shutdown tasks,
we check the UI's current view of cluster tasks instead, which we
fetch from the `pve-cluster-tasks` store.

As the UI might hold an outdated task list, there are some
opportunities for races, e.g., the UI may miss a new shutdown task or
consider a shutdown task active even though it has already terminated.
These races either result in a surviving shutdown task that the user
still needs to abort manually, or a superfluous `override-shutdown=1`
parameter that does not actually abort any tasks. Since "stop
overrules shutdown" is merely a convenience feature, both outcomes
seem bearable.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 www/manager6/Utils.js        | 27 +++++++++++++++++++++++++++
 www/manager6/dc/Tasks.js     |  2 +-
 www/manager6/lxc/CmdMenu.js  | 14 +++++++++++++-
 www/manager6/lxc/Config.js   |  6 +++++-
 www/manager6/qemu/CmdMenu.js | 14 +++++++++++++-
 www/manager6/qemu/Config.js  |  9 ++++++++-
 6 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js
index 3dd287e3..9e25e647 100644
--- a/www/manager6/Utils.js
+++ b/www/manager6/Utils.js
@@ -1426,6 +1426,33 @@ Ext.define('PVE.Utils', {
 	}
     },
 
+    // If there is an active task of type `taskType` by the current user for the
+    // given VMID, ask whether it should be aborted. Call `callback` accordingly.
+    askForOverruleShutdown: function(guestType, vm, taskType, callback) {
+	const vmidString = vm.vmid.toString();
+	const haEnabled = vm.hastate && vm.hastate !== 'unmanaged';
+	let shutdownTask = Ext.getStore('pve-cluster-tasks')?.findBy(function(task) {
+	    return task.data.user === Proxmox.UserName &&
+		   task.data.id === vmidString &&
+		   task.data.status === undefined &&
+		   task.data.type === taskType;
+	});
+	if (!haEnabled && shutdownTask >= 0) {
+	    const msg = gettext('There are active {0} tasks for {1} {2}. Would you like to abort them before proceeding?');
+	    Ext.Msg.show({
+			 title: gettext('Confirm'),
+			 icon: Ext.Msg.QUESTION,
+			 buttons: Ext.Msg.YESNO,
+			 msg: Ext.String.format(msg, taskType, guestType, vmidString),
+			 callback: function(btn) {
+			     callback(btn === 'yes');
+			 },
+	    });
+	} else {
+	    callback(false);
+	}
+    },
+
     // test automation helper
     call_menu_handler: function(menu, text) {
 	let item = menu.query('menuitem').find(el => el.text === text);
diff --git a/www/manager6/dc/Tasks.js b/www/manager6/dc/Tasks.js
index 5344ede4..2001bf76 100644
--- a/www/manager6/dc/Tasks.js
+++ b/www/manager6/dc/Tasks.js
@@ -11,7 +11,7 @@ Ext.define('PVE.dc.Tasks', {
 	let me = this;
 
 	let taskstore = Ext.create('Proxmox.data.UpdateStore', {
-	    storeid: 'pve-cluster-tasks',
+	    storeId: 'pve-cluster-tasks',
 	    model: 'proxmox-tasks',
 	    proxy: {
 		type: 'proxmox',
diff --git a/www/manager6/lxc/CmdMenu.js b/www/manager6/lxc/CmdMenu.js
index 56f36b5e..01631b1d 100644
--- a/www/manager6/lxc/CmdMenu.js
+++ b/www/manager6/lxc/CmdMenu.js
@@ -66,7 +66,19 @@ Ext.define('PVE.lxc.CmdMenu', {
 		iconCls: 'fa fa-fw fa-stop',
 		disabled: stopped,
 		tooltip: Ext.String.format(gettext('Stop {0} immediately'), 'CT'),
-		handler: () => confirmedVMCommand('stop'),
+		handler: () => {
+		    let msg = Proxmox.Utils.format_task_description('vzstop', info.vmid);
+		    Ext.Msg.confirm(gettext('Confirm'), msg, btn => {
+			if (btn === 'yes') {
+			    PVE.Utils.askForOverruleShutdown('CT', info,
+							     'vzshutdown', overruleShutdown => {
+				const param = overruleShutdown ? { 'overrule-shutdown': 1 }
+								     : undefined;
+				vm_command('stop', param);
+			    });
+			}
+		    });
+		},
 	    },
 	    {
 		text: gettext('Reboot'),
diff --git a/www/manager6/lxc/Config.js b/www/manager6/lxc/Config.js
index 23c17d2e..21d93058 100644
--- a/www/manager6/lxc/Config.js
+++ b/www/manager6/lxc/Config.js
@@ -81,7 +81,11 @@ Ext.define('PVE.lxc.Config', {
 		    tooltip: Ext.String.format(gettext('Stop {0} immediately'), 'CT'),
 		    dangerous: true,
 		    handler: function() {
-			vm_command("stop");
+			PVE.Utils.askForOverruleShutdown('CT', vm,
+							 'vzshutdown', overruleShutdown => {
+			    const param = overruleShutdown ? { 'overrule-shutdown': 1 } : undefined;
+			    vm_command('stop', param);
+			});
 		    },
 		    iconCls: 'fa fa-stop',
 		}],
diff --git a/www/manager6/qemu/CmdMenu.js b/www/manager6/qemu/CmdMenu.js
index ccc5f74d..3edd2550 100644
--- a/www/manager6/qemu/CmdMenu.js
+++ b/www/manager6/qemu/CmdMenu.js
@@ -93,7 +93,19 @@ Ext.define('PVE.qemu.CmdMenu', {
 		iconCls: 'fa fa-fw fa-stop',
 		disabled: stopped,
 		tooltip: Ext.String.format(gettext('Stop {0} immediately'), 'VM'),
-		handler: () => confirmedVMCommand('stop'),
+		handler: () => {
+		    let msg = Proxmox.Utils.format_task_description('qmstop', info.vmid);
+		    Ext.Msg.confirm(gettext('Confirm'), msg, btn => {
+			if (btn === 'yes') {
+			    PVE.Utils.askForOverruleShutdown('VM', info,
+							     'qmshutdown', overruleShutdown => {
+				const param = overruleShutdown ? { 'overrule-shutdown': 1 }
+								     : undefined;
+				vm_command('stop', param);
+			    });
+			}
+		    });
+		},
 	    },
 	    {
 		text: gettext('Reboot'),
diff --git a/www/manager6/qemu/Config.js b/www/manager6/qemu/Config.js
index 94c540c5..68c41454 100644
--- a/www/manager6/qemu/Config.js
+++ b/www/manager6/qemu/Config.js
@@ -180,7 +180,14 @@ Ext.define('PVE.qemu.Config', {
 		    tooltip: Ext.String.format(gettext('Stop {0} immediately'), 'VM'),
 		    confirmMsg: Proxmox.Utils.format_task_description('qmstop', vmid),
 		    handler: function() {
-			vm_command("stop", { timeout: 30 });
+			PVE.Utils.askForOverruleShutdown('VM', vm,
+							 'qmshutdown', overruleShutdown => {
+			    const param = { timeout: 30 };
+			    if (overruleShutdown) {
+				param['overrule-shutdown'] = 1;
+			    }
+			    vm_command('stop', param);
+			});
 		    },
 		    iconCls: 'fa fa-stop',
 		}, {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
  2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
  2023-01-26  8:32 ` [pve-devel] [RFC manager 1/4] fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted Friedrich Weber
@ 2023-01-26  8:32 ` Friedrich Weber
  2023-11-17 13:09   ` Wolfgang Bumiller
  2023-01-26  8:32 ` [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu " Friedrich Weber
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Friedrich Weber @ 2023-01-26  8:32 UTC (permalink / raw)
  To: pve-devel

The new `overrule-shutdown` parameter is boolean and defaults to 0. If
it is 1, all active `vzshutdown` tasks by the current user for the same
CT are aborted before attempting to stop the CT.

Passing `overrule-shutdown=1` is forbidden for HA resources.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 src/PVE/API2/LXC/Status.pm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/PVE/API2/LXC/Status.pm b/src/PVE/API2/LXC/Status.pm
index f7e3128..d1d67f4 100644
--- a/src/PVE/API2/LXC/Status.pm
+++ b/src/PVE/API2/LXC/Status.pm
@@ -221,6 +221,12 @@ __PACKAGE__->register_method({
 	    node => get_standard_option('pve-node'),
 	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::LXC::complete_ctid_running }),
 	    skiplock => get_standard_option('skiplock'),
+	    'overrule-shutdown' => {
+		description => "Abort any active 'vzshutdown' task by the current user for this CT before stopping",
+		optional => 1,
+		type => 'boolean',
+		default => 0,
+	    }
 	},
     },
     returns => {
@@ -238,10 +244,15 @@ __PACKAGE__->register_method({
 	raise_param_exc({ skiplock => "Only root may use this option." })
 	    if $skiplock && $authuser ne 'root@pam';
 
+	my $overrule_shutdown = extract_param($param, 'overrule-shutdown');
+
 	die "CT $vmid not running\n" if !PVE::LXC::check_running($vmid);
 
 	if (PVE::HA::Config::vm_is_ha_managed($vmid) && $rpcenv->{type} ne 'ha') {
 
+	    raise_param_exc({ 'overrule-shutdown' => "Not applicable for HA resources." })
+		if $overrule_shutdown;
+
 	    my $hacmd = sub {
 		my $upid = shift;
 
@@ -272,6 +283,11 @@ __PACKAGE__->register_method({
 		return $rpcenv->fork_worker('vzstop', $vmid, $authuser, $realcmd);
 	    };
 
+	    if ($overrule_shutdown) {
+		my $overruled_tasks = PVE::GuestHelpers::overrule_tasks('vzshutdown', $authuser, $vmid);
+		syslog('info', "overruled vzshutdown tasks: " . join(", ", $overruled_tasks->@*) . "\n");
+	    };
+
 	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);
 	}
     }});
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu api: add overrule-shutdown parameter to stop endpoint
  2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
  2023-01-26  8:32 ` [pve-devel] [RFC manager 1/4] fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted Friedrich Weber
  2023-01-26  8:32 ` [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint Friedrich Weber
@ 2023-01-26  8:32 ` Friedrich Weber
  2023-11-17 13:12   ` Wolfgang Bumiller
  2023-01-26  8:32 ` [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type Friedrich Weber
  2023-09-27  9:04 ` [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
  4 siblings, 1 reply; 14+ messages in thread
From: Friedrich Weber @ 2023-01-26  8:32 UTC (permalink / raw)
  To: pve-devel

The new `overrule-shutdown` parameter is boolean and defaults to 0. If
it is 1, all active `qmshutdown` tasks by the current user for the same
VM are aborted before attempting to stop the VM.

Passing `overrule-shutdown=1` is forbidden for HA resources.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 PVE/API2/Qemu.pm | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index c87602d..b253e1f 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -2799,7 +2799,13 @@ __PACKAGE__->register_method({
 		type => 'boolean',
 		optional => 1,
 		default => 0,
-	    }
+	    },
+	    'overrule-shutdown' => {
+		description => "Abort any active 'qmshutdown' task by the current user for this VM before stopping",
+		optional => 1,
+		type => 'boolean',
+		default => 0,
+	    },
 	},
     },
     returns => {
@@ -2826,10 +2832,13 @@ __PACKAGE__->register_method({
 	raise_param_exc({ migratedfrom => "Only root may use this option." })
 	    if $migratedfrom && $authuser ne 'root@pam';
 
+	my $overrule_shutdown = extract_param($param, 'overrule-shutdown');
 
 	my $storecfg = PVE::Storage::config();
 
 	if (PVE::HA::Config::vm_is_ha_managed($vmid) && ($rpcenv->{type} ne 'ha') && !defined($migratedfrom)) {
+	    raise_param_exc({ 'overrule-shutdown' => "Not applicable for HA resources." })
+		if $overrule_shutdown;
 
 	    my $hacmd = sub {
 		my $upid = shift;
@@ -2849,6 +2858,11 @@ __PACKAGE__->register_method({
 
 		syslog('info', "stop VM $vmid: $upid\n");
 
+		if ($overrule_shutdown) {
+		    my $overruled_tasks = PVE::GuestHelpers::overrule_tasks('qmshutdown', $authuser, $vmid);
+		    print "overruled qmshutdown tasks: " . join(", ", $overruled_tasks->@*) . "\n";
+		};
+
 		PVE::QemuServer::vm_stop($storecfg, $vmid, $skiplock, 0,
 					 $param->{timeout}, 0, 1, $keepActive, $migratedfrom);
 		return;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type
  2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
                   ` (2 preceding siblings ...)
  2023-01-26  8:32 ` [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu " Friedrich Weber
@ 2023-01-26  8:32 ` Friedrich Weber
  2023-11-17 12:53   ` Wolfgang Bumiller
  2023-09-27  9:04 ` [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
  4 siblings, 1 reply; 14+ messages in thread
From: Friedrich Weber @ 2023-01-26  8:32 UTC (permalink / raw)
  To: pve-devel

This helper is used to abort any active qmshutdown/vzshutdown tasks
before attempting to stop a VM/CT (if requested).

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
 src/PVE/GuestHelpers.pm | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/PVE/GuestHelpers.pm b/src/PVE/GuestHelpers.pm
index b4ccbaa..3cdf5d7 100644
--- a/src/PVE/GuestHelpers.pm
+++ b/src/PVE/GuestHelpers.pm
@@ -366,4 +366,22 @@ sub get_unique_tags {
     return !$no_join_result ? join(';', $res->@*) : $res;
 }
 
+sub overrule_tasks {
+    my ($type, $user, $vmid) = @_;
+
+    my $active_tasks = PVE::INotify::read_file('active');
+    my $res = [];
+    for my $task (@$active_tasks) {
+	if (!$task->{saved}
+	    && $task->{type} eq $type
+	    && $task->{user} eq $user
+	    && $task->{id} eq $vmid
+	) {
+	    PVE::RPCEnvironment->check_worker($task->{upid}, 1);
+	    push @$res, $task->{upid};
+	}
+    }
+    return $res;
+}
+
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks
  2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
                   ` (3 preceding siblings ...)
  2023-01-26  8:32 ` [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type Friedrich Weber
@ 2023-09-27  9:04 ` Friedrich Weber
  2023-11-17 12:31   ` Wolfgang Bumiller
  4 siblings, 1 reply; 14+ messages in thread
From: Friedrich Weber @ 2023-09-27  9:04 UTC (permalink / raw)
  To: pve-devel

Lost track of this a bit, reviving due to user interest [1].

As the series does not apply anymore, I'll send a new version in any
case, but wanted to ask for feedback before I do.

My questions from the cover letter still apply:

On 26/01/2023 09:32, Friedrich Weber wrote:
> * Does it make sense to have overruling optional? Or should "stop"
>   generally overrule shutdown? This might lead to confusing
>   interactions, as Thomas noted [0].
> * Backend: Is there a more elegant way to overrule shutdown tasks,
>   and a better place than pve-guest-common?
> * Frontend: When stopping a VM/CT, we already ask for confirmation.
>   Is an (occasional) second modal dialog with a lot of text a good user
>   experience? Alternatively, I could imagine a checkbox in the first
>   dialog saying "Overrule any active shutdown tasks".

Actually I don't really like the second modal dialog. What about the
following: When the user clicks "Stop" and the frontend detects an
active shutdown task, the already-existing "Confirm" dialog has an
additional default-off checkbox "Kill active shutdown tasks" (or
similar). This way the default behavior does not change, but users do
not have to kill active shutdown tasks manually anymore.

> * This patch series forbids `overrule-shutdown=1` for HA-managed VMs/CTs
>   because I didn't know how overruling should work in a HA setting. Do
>   you have any suggestions?
> 
> Since this is my first patch with more than a few lines, I'm especially
> happy about feedback regarding coding style, naming, anything. :)

[1] https://forum.proxmox.com/threads/16235/post-590240




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks
  2023-09-27  9:04 ` [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
@ 2023-11-17 12:31   ` Wolfgang Bumiller
  2023-12-01  9:57     ` Friedrich Weber
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Bumiller @ 2023-11-17 12:31 UTC (permalink / raw)
  To: Friedrich Weber; +Cc: pve-devel

On Wed, Sep 27, 2023 at 11:04:26AM +0200, Friedrich Weber wrote:
> Lost track of this a bit, reviving due to user interest [1].
> 
> As the series does not apply anymore, I'll send a new version in any
> case, but wanted to ask for feedback before I do.
> 
> My questions from the cover letter still apply:
> 
> On 26/01/2023 09:32, Friedrich Weber wrote:
> > * Does it make sense to have overruling optional? Or should "stop"
> >   generally overrule shutdown? This might lead to confusing
> >   interactions, as Thomas noted [0].

Although whenever I ran into that I had simply misclicked shutdown or
became impatient. I never had any automated shutdown tasks happen.
Yet I still feel like this should be optional ;-)
(I usually just ended up using `qm stop` on the cli :P)

> > * Backend: Is there a more elegant way to overrule shutdown tasks,
> >   and a better place than pve-guest-common?
> > * Frontend: When stopping a VM/CT, we already ask for confirmation.
> >   Is an (occasional) second modal dialog with a lot of text a good user
> >   experience? Alternatively, I could imagine a checkbox in the first
> >   dialog saying "Overrule any active shutdown tasks".
> 
> Actually I don't really like the second modal dialog. What about the
> following: When the user clicks "Stop" and the frontend detects an
> active shutdown task, the already-existing "Confirm" dialog has an
> additional default-off checkbox "Kill active shutdown tasks" (or
> similar). This way the default behavior does not change, but users do
> not have to kill active shutdown tasks manually anymore.

Sounds good to me.
But maybe don't use the word "kill" 😄 "Replace/Override" should work.

> 
> > * This patch series forbids `overrule-shutdown=1` for HA-managed VMs/CTs
> >   because I didn't know how overruling should work in a HA setting. Do
> >   you have any suggestions?

I think it's okay to disable this for now.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type
  2023-01-26  8:32 ` [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type Friedrich Weber
@ 2023-11-17 12:53   ` Wolfgang Bumiller
  2023-12-01  9:57     ` Friedrich Weber
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Bumiller @ 2023-11-17 12:53 UTC (permalink / raw)
  To: Friedrich Weber; +Cc: pve-devel

Patch itself LGTM, just a note on sending patch series in general:

If you number patches throughout a whole series rather than the
individual repositories (as in, this one is labeled 4/4 instead of 1/1),
it would be nice if the order also helps determine dependencies.

Since the sub introduced here is used in 2/4 and 3/4, it doesn't make
sense for this to come last.
`qemu-server` and `container` will both want their guest-common
dependency bumped to the version which includes this patch.

On Thu, Jan 26, 2023 at 09:32:14AM +0100, Friedrich Weber wrote:
> This helper is used to abort any active qmshutdown/vzshutdown tasks
> before attempting to stop a VM/CT (if requested).
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>  src/PVE/GuestHelpers.pm | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/src/PVE/GuestHelpers.pm b/src/PVE/GuestHelpers.pm
> index b4ccbaa..3cdf5d7 100644
> --- a/src/PVE/GuestHelpers.pm
> +++ b/src/PVE/GuestHelpers.pm
> @@ -366,4 +366,22 @@ sub get_unique_tags {
>      return !$no_join_result ? join(';', $res->@*) : $res;
>  }
>  
> +sub overrule_tasks {
> +    my ($type, $user, $vmid) = @_;
> +
> +    my $active_tasks = PVE::INotify::read_file('active');
> +    my $res = [];
> +    for my $task (@$active_tasks) {
> +	if (!$task->{saved}
> +	    && $task->{type} eq $type
> +	    && $task->{user} eq $user
> +	    && $task->{id} eq $vmid
> +	) {
> +	    PVE::RPCEnvironment->check_worker($task->{upid}, 1);
> +	    push @$res, $task->{upid};
> +	}
> +    }
> +    return $res;
> +}
> +
>  1;
> -- 
> 2.30.2




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
  2023-01-26  8:32 ` [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint Friedrich Weber
@ 2023-11-17 13:09   ` Wolfgang Bumiller
  2023-12-01  9:57     ` Friedrich Weber
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Bumiller @ 2023-11-17 13:09 UTC (permalink / raw)
  To: Friedrich Weber; +Cc: pve-devel

On Thu, Jan 26, 2023 at 09:32:12AM +0100, Friedrich Weber wrote:
> The new `overrule-shutdown` parameter is boolean and defaults to 0. If
> it is 1, all active `vzshutdown` tasks by the current user for the same
> CT are aborted before attempting to stop the CT.
> 
> Passing `overrule-shutdown=1` is forbidden for HA resources.
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>  src/PVE/API2/LXC/Status.pm | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/PVE/API2/LXC/Status.pm b/src/PVE/API2/LXC/Status.pm
> index f7e3128..d1d67f4 100644
> --- a/src/PVE/API2/LXC/Status.pm
> +++ b/src/PVE/API2/LXC/Status.pm
> @@ -221,6 +221,12 @@ __PACKAGE__->register_method({
>  	    node => get_standard_option('pve-node'),
>  	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::LXC::complete_ctid_running }),
>  	    skiplock => get_standard_option('skiplock'),
> +	    'overrule-shutdown' => {
> +		description => "Abort any active 'vzshutdown' task by the current user for this CT before stopping",
> +		optional => 1,
> +		type => 'boolean',
> +		default => 0,
> +	    }
>  	},
>      },
>      returns => {
> @@ -238,10 +244,15 @@ __PACKAGE__->register_method({
>  	raise_param_exc({ skiplock => "Only root may use this option." })
>  	    if $skiplock && $authuser ne 'root@pam';
>  
> +	my $overrule_shutdown = extract_param($param, 'overrule-shutdown');
> +
>  	die "CT $vmid not running\n" if !PVE::LXC::check_running($vmid);
>  
>  	if (PVE::HA::Config::vm_is_ha_managed($vmid) && $rpcenv->{type} ne 'ha') {
>  
> +	    raise_param_exc({ 'overrule-shutdown' => "Not applicable for HA resources." })
> +		if $overrule_shutdown;
> +
>  	    my $hacmd = sub {
>  		my $upid = shift;
>  
> @@ -272,6 +283,11 @@ __PACKAGE__->register_method({
>  		return $rpcenv->fork_worker('vzstop', $vmid, $authuser, $realcmd);
>  	    };
>  
> +	    if ($overrule_shutdown) {
> +		my $overruled_tasks = PVE::GuestHelpers::overrule_tasks('vzshutdown', $authuser, $vmid);
> +		syslog('info', "overruled vzshutdown tasks: " . join(", ", $overruled_tasks->@*) . "\n");
> +	    };
> +

^ So this part is fine (mostly¹)

>  	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);

^ Here we lock first, then fork the worker, then do `vm_stop` with the
config lock inherited.

This means that creating multiple shutdown tasks before using one with
override=true could cause the override task to cancel the *first* ongoing
shutdown task, then move on to the `lock_config` call - in the meantime
a second shutdown task acquires this very lock and performs another
long-running shutdown, causing the `override` parameter to be
ineffective.

We should switch the ordering here: first fork the worker, then lock.
(¹ And your new chunk would go into the worker as well)

Unless I'm missing something, but AFAICT the current ordering there is
rather ... bad :-)




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu api: add overrule-shutdown parameter to stop endpoint
  2023-01-26  8:32 ` [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu " Friedrich Weber
@ 2023-11-17 13:12   ` Wolfgang Bumiller
  0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Bumiller @ 2023-11-17 13:12 UTC (permalink / raw)
  To: Friedrich Weber; +Cc: pve-devel

This one LGTM.

On Thu, Jan 26, 2023 at 09:32:13AM +0100, Friedrich Weber wrote:
> The new `overrule-shutdown` parameter is boolean and defaults to 0. If
> it is 1, all active `qmshutdown` tasks by the current user for the same
> VM are aborted before attempting to stop the VM.
> 
> Passing `overrule-shutdown=1` is forbidden for HA resources.
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>

Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>

> ---
>  PVE/API2/Qemu.pm | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index c87602d..b253e1f 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -2799,7 +2799,13 @@ __PACKAGE__->register_method({
>  		type => 'boolean',
>  		optional => 1,
>  		default => 0,
> -	    }
> +	    },
> +	    'overrule-shutdown' => {
> +		description => "Abort any active 'qmshutdown' task by the current user for this VM before stopping",
> +		optional => 1,
> +		type => 'boolean',
> +		default => 0,
> +	    },
>  	},
>      },
>      returns => {
> @@ -2826,10 +2832,13 @@ __PACKAGE__->register_method({
>  	raise_param_exc({ migratedfrom => "Only root may use this option." })
>  	    if $migratedfrom && $authuser ne 'root@pam';
>  
> +	my $overrule_shutdown = extract_param($param, 'overrule-shutdown');
>  
>  	my $storecfg = PVE::Storage::config();
>  
>  	if (PVE::HA::Config::vm_is_ha_managed($vmid) && ($rpcenv->{type} ne 'ha') && !defined($migratedfrom)) {
> +	    raise_param_exc({ 'overrule-shutdown' => "Not applicable for HA resources." })
> +		if $overrule_shutdown;
>  
>  	    my $hacmd = sub {
>  		my $upid = shift;
> @@ -2849,6 +2858,11 @@ __PACKAGE__->register_method({
>  
>  		syslog('info', "stop VM $vmid: $upid\n");
>  
> +		if ($overrule_shutdown) {
> +		    my $overruled_tasks = PVE::GuestHelpers::overrule_tasks('qmshutdown', $authuser, $vmid);
> +		    print "overruled qmshutdown tasks: " . join(", ", $overruled_tasks->@*) . "\n";
> +		};
> +
>  		PVE::QemuServer::vm_stop($storecfg, $vmid, $skiplock, 0,
>  					 $param->{timeout}, 0, 1, $keepActive, $migratedfrom);
>  		return;
> -- 
> 2.30.2




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type
  2023-11-17 12:53   ` Wolfgang Bumiller
@ 2023-12-01  9:57     ` Friedrich Weber
  0 siblings, 0 replies; 14+ messages in thread
From: Friedrich Weber @ 2023-12-01  9:57 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

On 17/11/2023 13:53, Wolfgang Bumiller wrote:
> Patch itself LGTM, just a note on sending patch series in general:
> 
> If you number patches throughout a whole series rather than the
> individual repositories (as in, this one is labeled 4/4 instead of 1/1),
> it would be nice if the order also helps determine dependencies.

Makes total sense, I'll keep this in mind for the future.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks
  2023-11-17 12:31   ` Wolfgang Bumiller
@ 2023-12-01  9:57     ` Friedrich Weber
  0 siblings, 0 replies; 14+ messages in thread
From: Friedrich Weber @ 2023-12-01  9:57 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

Thanks for the review! I'll prepare a v2 that incorporates the UI
changes I suggested earlier. I do have some questions regarding the
concurrent tasks scenario in patch #2, see my separate mail.

On 17/11/2023 13:31, Wolfgang Bumiller wrote:
[...]
>> On 26/01/2023 09:32, Friedrich Weber wrote:
>>> * Does it make sense to have overruling optional? Or should "stop"
>>>   generally overrule shutdown? This might lead to confusing
>>>   interactions, as Thomas noted [0].
> 
> Although whenever I ran into that I had simply misclicked shutdown or
> became impatient. I never had any automated shutdown tasks happen.
> Yet I still feel like this should be optional ;-)
> (I usually just ended up using `qm stop` on the cli :P)

Makes sense! (And using `qm stop` sounds like it might save some clicks ...)

>>> * Backend: Is there a more elegant way to overrule shutdown tasks,
>>>   and a better place than pve-guest-common?
>>> * Frontend: When stopping a VM/CT, we already ask for confirmation.
>>>   Is an (occasional) second modal dialog with a lot of text a good user
>>>   experience? Alternatively, I could imagine a checkbox in the first
>>>   dialog saying "Overrule any active shutdown tasks".
>>
>> Actually I don't really like the second modal dialog. What about the
>> following: When the user clicks "Stop" and the frontend detects an
>> active shutdown task, the already-existing "Confirm" dialog has an
>> additional default-off checkbox "Kill active shutdown tasks" (or
>> similar). This way the default behavior does not change, but users do
>> not have to kill active shutdown tasks manually anymore.
> 
> Sounds good to me.
> But maybe don't use the word "kill" 😄 "Replace/Override" should work.

Fair point. :)

>>> * This patch series forbids `overrule-shutdown=1` for HA-managed VMs/CTs
>>>   because I didn't know how overruling should work in a HA setting. Do
>>>   you have any suggestions?
> 
> I think it's okay to disable this for now.

Alright!




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
  2023-11-17 13:09   ` Wolfgang Bumiller
@ 2023-12-01  9:57     ` Friedrich Weber
  2024-01-02 13:34       ` Friedrich Weber
  0 siblings, 1 reply; 14+ messages in thread
From: Friedrich Weber @ 2023-12-01  9:57 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

Thanks for looking into this!

On 17/11/2023 14:09, Wolfgang Bumiller wrote:
[...]
>>  	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);
> 
> ^ Here we lock first, then fork the worker, then do `vm_stop` with the
> config lock inherited.
> 
> This means that creating multiple shutdown tasks before using one with
> override=true could cause the override task to cancel the *first* ongoing
> shutdown task, then move on to the `lock_config` call - in the meantime
> a second shutdown task acquires this very lock and performs another
> long-running shutdown, causing the `override` parameter to be
> ineffective.

Just to make sure I understand correctly, the scenario is (please
correct me if I'm wrong):

* shutdown task #1 has the lock and starts long-running shutdown
* stop API handler with override kills shutdown task #1, but does not
acquire the lock yet
* shutdown task #2 starts, acquires the lock and starts long-running
shutdown
* stop task waits for the lock => override flag was ineffective

> We should switch the ordering here: first fork the worker, then lock.
> (¹ And your new chunk would go into the worker as well)
> 
> Unless I'm missing something, but AFAICT the current ordering there is
> rather ... bad :-)

Would this actually prevent the scenario above? We cannot put my new
chunk into the locked section (because then it couldn't kill an active
shutdown task that has the lock), but if we put it into the worker
before the locked section, couldn't the same thing as above happen?
Meaning the stop task with override kills shutdown tasks but doesn't
have the lock yet, a new shutdown task acquires the lock, makes the stop
task wait for it, and renders the override flag ineffective just the same?




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
  2023-12-01  9:57     ` Friedrich Weber
@ 2024-01-02 13:34       ` Friedrich Weber
  0 siblings, 0 replies; 14+ messages in thread
From: Friedrich Weber @ 2024-01-02 13:34 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

On 01/12/2023 10:57, Friedrich Weber wrote:
> On 17/11/2023 14:09, Wolfgang Bumiller wrote:
> [...]
>>>  	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);
>>
>> ^ Here we lock first, then fork the worker, then do `vm_stop` with the
>> config lock inherited.
>>
>> This means that creating multiple shutdown tasks before using one with
>> override=true could cause the override task to cancel the *first* ongoing
>> shutdown task, then move on to the `lock_config` call - in the meantime
>> a second shutdown task acquires this very lock and performs another
>> long-running shutdown, causing the `override` parameter to be
>> ineffective.
> 
> Just to make sure I understand correctly, the scenario is (please
> correct me if I'm wrong):
> 
> * shutdown task #1 has the lock and starts long-running shutdown
> * stop API handler with override kills shutdown task #1, but does not
> acquire the lock yet
> * shutdown task #2 starts, acquires the lock and starts long-running
> shutdown
> * stop task waits for the lock => override flag was ineffective

Discussed this with Wolfgang off-list, posting here for completeness. I
suppose the scenario I sketched is technically possible, but unlikely to
occur in practice (the stop API handler will usually acquire the lock
before shutdown task #2 can).

Wolfgang actually sketched a slightly different scenario, which is
reproducible with containers pretty easily:

* shutdown task #1 has the lock and starts long-running shutdown
* API handler for shutdown task #2 waits for the lock (there is no task yet)
* API handler for stop task #3 (with overrule-shutdown) kills shutdown
task #1, but does not acquire the lock yet
* API handler for shutdown task #2 acquires the lock and runs another
long-running shutdown
* API handler for stop task #3 waits for the lock => overrule-shutdown
flag was ineffective

As pointed out by Wolfgang this happens because container shutdown
currently uses lock-then-fork. VM shutdown, on the other hand, uses
fork-then-lock, so the above can't happen (the stop task with
overrule-shutdown kills both shutdown tasks).

In the next version I'll send a separate patch that switches the
ordering as suggested by Wolfgang.




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-01-02 13:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-26  8:32 [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
2023-01-26  8:32 ` [pve-devel] [RFC manager 1/4] fix #4474: ui: vm stop: ask if active shutdown tasks should be aborted Friedrich Weber
2023-01-26  8:32 ` [pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint Friedrich Weber
2023-11-17 13:09   ` Wolfgang Bumiller
2023-12-01  9:57     ` Friedrich Weber
2024-01-02 13:34       ` Friedrich Weber
2023-01-26  8:32 ` [pve-devel] [RFC qemu-server 3/4] fix #4474: qemu " Friedrich Weber
2023-11-17 13:12   ` Wolfgang Bumiller
2023-01-26  8:32 ` [pve-devel] [RFC guest-common 4/4] guest helpers: add helper to overrule active tasks of a specific type Friedrich Weber
2023-11-17 12:53   ` Wolfgang Bumiller
2023-12-01  9:57     ` Friedrich Weber
2023-09-27  9:04 ` [pve-devel] [RFC manager/container/qemu-server/guest-common 0/4] fix #4474: stop tasks may overrule shutdown tasks Friedrich Weber
2023-11-17 12:31   ` Wolfgang Bumiller
2023-12-01  9:57     ` Friedrich Weber

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal