* [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: pve-devel@lists.proxmox.com
  Date: Wed, 21 May 2025 16:13:01 +0300

Hello,

We had an issue with a customer migrating a VM between nodes using our
shared storage solution.

On the target host the OOM killer killed the main migration process, but
the child process (which actually performs the migration) kept on
working, which we did not expect, and that caused some issues.

This leads us to the broader question - after a request is submitted,
the parent can be terminated, and not return a response to the client,
while the work is being done, and the request can be wrongly retried or
considered unfinished.

Should the child processes terminate together with the parent to guard
against this, or is this expected behavior?

Here is an example patch to do this:

diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index bfde7e6..744fffc 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -13,8 +13,9 @@ use Fcntl qw(:flock);
 use IO::File;
 use IO::Handle;
 use IO::Select;
-use POSIX qw(:sys_wait_h EINTR);
+use POSIX qw(:sys_wait_h EINTR SIGKILL);
 use AnyEvent;
+use Linux::Prctl qw(set_pdeathsig);

 use PVE::Exception qw(raise raise_perm_exc);
 use PVE::INotify;
@@ -549,6 +550,9 @@ sub fork_worker {
         POSIX::setsid();
     }

+    # The signal that the calling process will get when its parent dies
+    set_pdeathsig(SIGKILL);
+
     POSIX::close ($psync[0]);
     POSIX::close ($ctrlfd[0]) if $sync;
     POSIX::close ($csync[1]);
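
A minimal standalone sketch of the parent-death-signal behaviour the patch above
relies on - assuming the Linux::Prctl CPAN module is available and a Linux kernel;
this is illustrative code, not part of PVE:

use strict;
use warnings;
use POSIX qw(SIGKILL);
use Linux::Prctl qw(set_pdeathsig);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # child: ask the kernel to deliver SIGKILL as soon as the parent exits
    set_pdeathsig(SIGKILL);
    sleep 1 while 1;          # stand-in for a long-running worker
}
# parent: exit after a moment - the kernel then kills the child
sleep 2;
exit 0;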
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-05-22 6:30 UTC
  To: Proxmox VE development discussion

> Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com> wrote on 21.05.2025 15:13 CEST:
>
> Hello,
>
> We had an issue with a customer migrating a VM between nodes using our
> shared storage solution.
>
> On the target host the OOM killer killed the main migration process, but
> the child process (which actually performs the migration) kept on
> working, which we did not expect, and that caused some issues.

could you be more specific which process got killed?

when you do a migration, a task worker is forked and its UPID is returned
to the caller for further querying.

as part of the migration, other processes get spawned:
- ssh tunnel to the target node
- storage migration processes (on both nodes)
- VM state management CLI calls (on the target node)

which of those is the "main migration process"? which is the child process?

> This leads us to the broader question - after a request is submitted,
> the parent can be terminated, and not return a response to the client,
> while the work is being done, and the request can be wrongly retried or
> considered unfinished.

the parent should return almost immediately, as all it is doing at that
point is returning the UPID to the client (the process then continues to
do other work though, but that is no longer related to this task).

the only exception is for "sync" task workers, like in a CLI context,
where the "parent" has no other work to do, so it waits for the child/task
to finish and prints its output while doing so, and some "bulk action"
style API calls that fork multiple task workers and poll them themselves.

> Should the child processes terminate together with the parent to guard
> against this, or is this expected behavior?

the parent (API worker process) and child (task worker process) have no
direct relation after the task worker has been spawned.

> Here is an example patch to do this:
>
> +    # The signal that the calling process will get when its parent dies
> +    set_pdeathsig(SIGKILL);

that has weird implications with regards to threads, so I don't think that
is a good idea..
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>, Proxmox VE development discussion <pve-devel@lists.proxmox.com>
  Date: Thu, 22 May 2025 09:55:49 +0300

The parent of the storage migration process gets killed.

It seems that this is the desired behavior and, as far as I understand it,
the child worker is detached from the parent and has nothing to do with it
after spawning.

Thanks for the information, it was very helpful.
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-05-22 8:22 UTC
  To: Denis Kanchev, Proxmox VE development discussion; +Cc: Wolfgang Bumiller

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 22.05.2025 08:55 CEST:
>
> The parent of the storage migration process gets killed.
>
> It seems that this is the desired behavior and, as far as I understand it,
> the child worker is detached from the parent and has nothing to do with it
> after spawning.

was this a remote migration or a regular migration? could you maybe post the
full task log?

for a regular migration, the storage migration just uses our "run_command"
helper. run_command uses open3 to spawn the command, and select for command
output handling. basically the process tree would look like this:

API worker (one of X in pvedaemon)
-> task worker (executing the migration code)
--> storage migration command (xxx | ssh target_node xxx)

and it does seem like run_command doesn't properly forward the parent being
killed/terminated:

$ perl -e 'use strict; use warnings; use PVE::Tools; warn "parent pid: $$\n"; PVE::Tools::run_command([["bash", "-c", "sleep 10; sleep 20; echo after > /tmp/file"]]);'
parent pid: 204620
[1]  204618 terminated  sudo perl -e

(sending SIGTERM from another shell to 204620). the bash command continues
executing, and also writes to /tmp/file after the sleeps are finished.. the
same is also true for SIGKILL. SIGINT properly cleans up the child though.

@Wolfgang: is this desired behaviour?
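
One commonly used alternative for letting a child notice a disappearing parent,
without prctl(), is to have the child watch a pipe whose write end only the
parent holds - the read end becomes readable (EOF) the moment the parent exits
or is killed. A rough sketch of the idea only (an illustration, not how
run_command currently works):

use strict;
use warnings;
use IO::Select;

pipe(my $rd, my $wr) or die "pipe failed: $!";

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    close($wr);                       # child keeps only the read end
    my $sel = IO::Select->new($rd);
    while (1) {
        # ... do a chunk of the actual work here ...
        if ($sel->can_read(1)) {      # readable => EOF => parent is gone
            warn "parent went away, cleaning up\n";
            exit(1);
        }
    }
}
close($rd);                           # parent keeps only the write end
sleep 30;                             # stand-in for the parent's own work;
                                      # SIGKILL this process and the child exits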
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Wolfgang Bumiller <w.bumiller@proxmox.com>
  Date: Wed, 28 May 2025 09:13:44 +0300

Here is the task log:

2025-04-11 03:45:42 starting migration of VM 2282 to node 'telpr01pve05' (10.10.17.5)
2025-04-11 03:45:42 starting VM 2282 on remote node 'telpr01pve05'
2025-04-11 03:45:45 [telpr01pve05] Warning: sch_htb: quantum of class 10001 is big. Consider r2q change.
2025-04-11 03:45:46 [telpr01pve05] Dump was interrupted and may be inconsistent.
2025-04-11 03:45:46 [telpr01pve05] kvm: failed to find file '/usr/share/qemu-server/bootsplash.jpg'
2025-04-11 03:45:46 start remote tunnel
2025-04-11 03:45:46 ssh tunnel ver 1
2025-04-11 03:45:46 starting online/live migration on unix:/run/qemu-server/2282.migrate
2025-04-11 03:45:46 set migration capabilities
2025-04-11 03:45:46 migration downtime limit: 100 ms
2025-04-11 03:45:46 migration cachesize: 4.0 GiB
2025-04-11 03:45:46 set migration parameters
2025-04-11 03:45:46 start migrate command to unix:/run/qemu-server/2282.migrate
2025-04-11 03:45:47 migration active, transferred 152.2 MiB of 24.0 GiB VM-state, 162.1 MiB/s
...
2025-04-11 03:46:49 migration active, transferred 15.2 GiB of 24.0 GiB VM-state, 2.0 GiB/s
2025-04-11 03:46:50 migration status error: failed
2025-04-11 03:46:50 ERROR: online migrate failure - aborting
2025-04-11 03:46:50 aborting phase 2 - cleanup resources
2025-04-11 03:46:50 migrate_cancel
2025-04-11 03:46:52 ERROR: migration finished with problems (duration 00:01:11)
TASK ERROR: migration problems

> that has weird implications with regards to threads, so I don't think that
> is a good idea..

What do you mean by that? Are any threads involved?
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-05-28 6:33 UTC
  To: Denis Kanchev; +Cc: Wolfgang Bumiller, Proxmox VE development discussion

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 28.05.2025 08:13 CEST:
>
> Here is the task log:
>
> 2025-04-11 03:46:50 migration status error: failed
> 2025-04-11 03:46:50 ERROR: online migrate failure - aborting
> 2025-04-11 03:46:50 aborting phase 2 - cleanup resources
> 2025-04-11 03:46:50 migrate_cancel
> 2025-04-11 03:46:52 ERROR: migration finished with problems (duration 00:01:11)
> TASK ERROR: migration problems

okay, so no local disks involved.. not sure which process got killed then? ;)
the state transfer happens entirely within the Qemu process, perl is just
polling it to print the status, and that perl task worker is not OOM killed
since it continues to print all the error handling messages..

> > that has weird implications with regards to threads, so I don't think that
> > is a good idea..
>
> What do you mean by that? Are any threads involved?

not intentionally, no. the issue is that the whole "pr_set_deathsig" machinery
works on the thread level, not the process level for historical reasons. so it
actually would kill the child if the thread that called pr_set_deathsig exits..

I think we do want to improve how run_command handles the parent disappearing.
but it's not that straight-forward to implement in a race-free fashion (in Perl).
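
For reference, the usual recipe for narrowing the fork()/prctl() race is to
request the parent-death signal in the child and then re-check the parent,
since the parent may already have exited before prctl() took effect. A sketch
under the same Linux::Prctl assumption as the patch above (illustrative only,
and it does not address the thread-level caveat just mentioned):

use strict;
use warnings;
use POSIX qw(SIGTERM);
use Linux::Prctl qw(set_pdeathsig);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    set_pdeathsig(SIGTERM);
    # if the parent died between fork() and set_pdeathsig(), the signal will
    # never arrive - detect the re-parenting explicitly (caveat: with PID
    # namespaces or subreapers the new parent is not necessarily pid 1)
    exit(1) if getppid() == 1;
    # ... worker code or exec() goes here ...
    exit(0);
}
waitpid($pid, 0);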
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  Cc: Proxmox VE development discussion, Wolfgang Bumiller
  Date: Thu, 29 May 2025 10:33:14 +0300

The issue here is that the storage plugin's activate_volume() is called after
the migration is cancelled, which in the case of network shared storage can
make things bad. This is a sort of race condition, because migrate_cancel
won't stop the storage migration on the remote server. As you can see below,
a call to activate_volume() is performed after migrate_cancel.

In this case we issue a volume detach from the old node (to keep the data
consistent) and we end up with a VM (not migrated) without this volume
attached. We track whether activate_volume() is used for migration via the
'lock' => 'migrate' flag, which is cleared on migration cancel - in case of
migration we won't detach the volume from the old VM.

In short: when the parent of this storage migration task gets killed, the
source node stops the migration, but the storage migration on the destination
node continues.

Source node:

2025-04-11 03:26:50 starting migration of VM 2421 to node 'telpr01pve03' (10.10.17.3)
2025-04-11 03:26:50 starting VM 2421 on remote node 'telpr01pve03'
2025-04-11 03:26:52 ERROR: online migrate failure - remote command failed with exit code 255
2025-04-11 03:26:52 aborting phase 2 - cleanup resources
2025-04-11 03:26:52 migrate_cancel            # <<< NOTE the time
2025-04-11 03:26:53 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

Destination node:

2025-04-11T03:26:51.559671+07:00 telpr01pve03 qm[3670216]: <root@pam> starting task UPID:telpr01pve03:003800D4:00928867:67F8298B:qmstart:2421:root@pam:
2025-04-11T03:26:51.559897+07:00 telpr01pve03 qm[3670228]: start VM 2421: UPID:telpr01pve03:003800D4:00928867:67F8298B:qmstart:2421:root@pam:
2025-04-11T03:26:51.837905+07:00 telpr01pve03 qm[3670228]: StorPool plugin: Volume ~bj7n.b.abe is related to VM 2421, checking status
    ### Call to PVE::Storage::Plugin::activate_volume()
2025-04-11T03:26:53.072206+07:00 telpr01pve03 qm[3670228]: StorPool plugin: NOT a live migration of VM 2421, will force detach volume ~bj7n.b.abe
    ### 'lock' flag missing
2025-04-11T03:26:53.108206+07:00 telpr01pve03 qm[3670228]: StorPool plugin: Volume ~bj7n.b.sdj is related to VM 2421, checking status
    ### Second call to activate_volume() after migrate_cancel
2025-04-11T03:26:53.903357+07:00 telpr01pve03 qm[3670228]: StorPool plugin: NOT a live migration of VM 2421, will force detach volume ~bj7n.b.sdj
    ### 'lock' flag missing
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-06-02 7:37 UTC
  To: Denis Kanchev; +Cc: Wolfgang Bumiller, Proxmox VE development discussion

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 29.05.2025 09:33 CEST:
>
> The issue here is that the storage plugin's activate_volume() is called after
> the migration is cancelled, which in the case of network shared storage can
> make things bad. This is a sort of race condition, because migrate_cancel
> won't stop the storage migration on the remote server.
>
> In short: when the parent of this storage migration task gets killed, the
> source node stops the migration, but the storage migration on the destination
> node continues.

could you provide the full migration task log and the VM config?

I thought your storage plugin is a shared storage, so there is no storage
migration at all, yet you keep talking about storage migration?

> Destination node:
>
> 2025-04-11T03:26:51.559671+07:00 telpr01pve03 qm[3670216]: <root@pam> starting task UPID:telpr01pve03:003800D4:00928867:67F8298B:qmstart:2421:root@pam:
> 2025-04-11T03:26:51.559897+07:00 telpr01pve03 qm[3670228]: start VM 2421: UPID:telpr01pve03:003800D4:00928867:67F8298B:qmstart:2421:root@pam:

so starting the VM on the target node failed? why?
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  Cc: Proxmox VE development discussion, Wolfgang Bumiller
  Date: Mon, 2 Jun 2025 11:35:22 +0300

> I thought your storage plugin is a shared storage, so there is no storage
> migration at all, yet you keep talking about storage migration?

It is a shared storage indeed. The issue was that the migration process on the
destination host got OOM-killed and the migration failed - most probably that's
why there is no log about the storage migration - but that didn't stop the
storage migration on the destination host.

2025-04-11T03:26:52.283913+07:00 telpr01pve03 kernel: [96031.290519] pvesh invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0

Here is one more migration task attempt where it lived long enough to show a
more detailed log:

2025-04-11 03:29:11 starting migration of VM 2421 to node 'telpr01pve06' (10.10.17.6)
2025-04-11 03:29:11 starting VM 2421 on remote node 'telpr01pve06'
2025-04-11 03:29:15 [telpr01pve06] Warning: sch_htb: quantum of class 10001 is big. Consider r2q change.
2025-04-11 03:29:15 [telpr01pve06] kvm: failed to find file '/usr/share/qemu-server/bootsplash.jpg'
2025-04-11 03:29:15 start remote tunnel
2025-04-11 03:29:16 ssh tunnel ver 1
2025-04-11 03:29:16 starting online/live migration on unix:/run/qemu-server/2421.migrate
2025-04-11 03:29:16 set migration capabilities
2025-04-11 03:29:16 migration downtime limit: 100 ms
2025-04-11 03:29:16 migration cachesize: 256.0 MiB
2025-04-11 03:29:16 set migration parameters
2025-04-11 03:29:16 start migrate command to unix:/run/qemu-server/2421.migrate
2025-04-11 03:29:17 migration active, transferred 281.0 MiB of 2.0 GiB VM-state, 340.5 MiB/s
2025-04-11 03:29:18 migration active, transferred 561.5 MiB of 2.0 GiB VM-state, 307.2 MiB/s
2025-04-11 03:29:19 migration active, transferred 849.2 MiB of 2.0 GiB VM-state, 288.5 MiB/s
2025-04-11 03:29:20 migration active, transferred 1.1 GiB of 2.0 GiB VM-state, 283.7 MiB/s
2025-04-11 03:29:21 migration active, transferred 1.4 GiB of 2.0 GiB VM-state, 302.5 MiB/s
2025-04-11 03:29:23 migration active, transferred 1.8 GiB of 2.0 GiB VM-state, 278.6 MiB/s
2025-04-11 03:29:23 migration status error: failed
2025-04-11 03:29:23 ERROR: online migrate failure - aborting
2025-04-11 03:29:23 aborting phase 2 - cleanup resources
2025-04-11 03:29:23 migrate_cancel
2025-04-11 03:29:25 ERROR: migration finished with problems (duration 00:00:14)
TASK ERROR: migration problems

> could you provide the full migration task log and the VM config?

2025-04-11 03:26:50 starting migration of VM 2421 to node 'telpr01pve03' (10.10.17.3)    ### QemuMigrate::phase1() +749
2025-04-11 03:26:50 starting VM 2421 on remote node 'telpr01pve03'    # QemuMigrate::phase2_start_local_cluster() +888
2025-04-11 03:26:52 ERROR: online migrate failure - remote command failed with exit code 255
2025-04-11 03:26:52 aborting phase 2 - cleanup resources
2025-04-11 03:26:52 migrate_cancel
2025-04-11 03:26:53 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

VM config:

#Ubuntu-24.04-14082024
#StorPool adjustment
agent: 1,fstrim_cloned_disks=1
autostart: 1
boot: c
bootdisk: scsi0
cipassword: XXX
citype: nocloud
ciupgrade: 0
ciuser: test
cores: 2
cpu: EPYC-Genoa
cpulimit: 2
ide0: VMDataSp:vm-2421-cloudinit.raw,media=cdrom
ipconfig0: ipxxx
memory: 2048
meta: creation-qemu=8.1.5,ctime=1722917972
name: kredibel-service
nameserver: xxx
net0: virtio=xxx,bridge=vmbr2,firewall=1,rate=250,tag=220
numa: 0
onboot: 1
ostype: l26
scsi0: VMDataSp:vm-2421-disk-0-sp-bj7n.b.sdj.raw,aio=native,discard=on,iops_rd=20000,iops_rd_max=40000,iops_rd_max_length=60,iops_wr=20000,iops_wr_max=40000,iops_wr_max_length=60,iothread=1,size=40G
scsihw: virtio-scsi-single
searchdomain: neo.internal
serial0: socket
smbios1: uuid=dfxxx
sockets: 1
sshkeys: ssh-rsa%
vmgenid: 17b154a0-

In this case the call to PVE::Storage::Plugin::activate_volume() was performed
after the migration cancellation:

2025-04-11T03:26:53.072206+07:00 telpr01pve03 qm[3670228]: StorPool plugin: NOT a live migration of VM 2421, will force detach volume ~bj7n.b.abe
    <<< This log line is from sub activate_volume() in our custom storage plugin
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-06-02 8:49 UTC
  To: Denis Kanchev; +Cc: Wolfgang Bumiller, Proxmox VE development discussion

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 10:35 CEST:
>
> It is a shared storage indeed. The issue was that the migration process on
> the destination host got OOM-killed and the migration failed - most probably
> that's why there is no log about the storage migration - but that didn't stop
> the storage migration on the destination host.

could you please explain what you mean by storage migration? :)

when I say "storage migration" I mean either
- the target VM exporting newly allocated volumes via NBD, and the source VM
  mirroring its disks via blockjob onto those exported volumes
- PVE::Storage::storage_migrate, which exports a volume, pipes it over SSH or
  a websocket tunnel and imports it on the other side

the first is what happens in a live migration for volumes currently used by
the VM. the second is what happens for other volumes, or in case of an offline
migration. both will only happen for local volumes, as with a shared storage,
*there is nothing to migrate*.

are you talking about something your storage does (hand-over of control?)?

there also is no "migration process on the destination host", there just is
the target VM running there - did that VM get OOM-killed? or the `qm start`
invocation itself? or ... ? the migration task is only running on the source
node.. please really try to be specific here, it's easy to misunderstand
things or guess wrongly otherwise..

AFAIU, the sequence was:

migration started
target VM started
live-migration started
something happens on the destination node (??) that aborts the migration
source node does migrate_cancel (which is somehow hooked to your storage and
  removes a flag/lock/.. on the volume?)
something on the destination node calls activate_volume (which checks this
  flag/lock and is confused because it is missing?)
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  Cc: Proxmox VE development discussion, Wolfgang Bumiller
  Date: Mon, 2 Jun 2025 12:18:01 +0300

My bad :) in Proxmox terms it must be a hand-over of storage control - in our
case the storage plugin function activate_volume() is called, which moves the
storage to the new VM. So no data is moved across the nodes and only the
volumes get re-attached.

Thanks for the plentiful information.
* Re: [pve-devel] PVE child process behavior question
  From: Fabian Grünbichler @ 2025-06-02 11:42 UTC
  To: Denis Kanchev; +Cc: Wolfgang Bumiller, Proxmox VE development discussion

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 11:18 CEST:
>
> My bad :) in Proxmox terms it must be a hand-over of storage control - in our
> case the storage plugin function activate_volume() is called, which moves the
> storage to the new VM. So no data is moved across the nodes and only the
> volumes get re-attached.
> Thanks for the plentiful information.

okay!

so you basically special case this "volume is active on two nodes" case which
should only happen during a live migration, and that somehow runs into an
issue if the migration is aborted because there is some suspected race
somewhere?

as part of a live migration, the sequence should be:

node A: migration starts
node A: start request for target VM on node B (over SSH)
node B: `qm start ..` is called
node B: qm start will activate volumes
node B: qm start returns
node A: migration starts
node A/B: some fatal error
node A: cancel migration (via QMP/the source VM running on node A)
node A: request to stop target VM on node B (over SSH)
node B: `qm stop ..` called
node B: qm stop will deactivate volumes

I am not sure where another activate_volume call after node A has started the
migration could happen? at that point, node A still has control over the VM
(ID), so nothing in PVE should operate on it other than the selective calls
made as part of the migration, which are basically only querying migration
status and error handling at that point..

it would still be good to know what actually got OOM-killed in your case..
was it the `qm start`? was it the `kvm` process itself? something entirely
else?

if you can reproduce the issue, you could also add logging in activate_volume
to find out the exact call path (e.g., log the call stack somewhere), maybe
that helps find the exact scenario that you are seeing..
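
A minimal way to capture that call path from inside the plugin - a sketch only;
the argument list follows the activate_volume() call shape visible in the trace
in the next message, and the output simply goes to STDERR via warn rather than
any particular PVE logging helper:

use Carp ();

sub activate_volume {
    my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

    # log the full caller chain every time the volume is activated
    warn "activate_volume($storeid, $volname) called:\n" . Carp::longmess('trace');

    # ... existing activation logic of the plugin continues here ...
}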
* Re: [pve-devel] PVE child process behavior question
  From: Denis Kanchev <denis.kanchev@storpool.com>
  To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  Cc: Proxmox VE development discussion, Wolfgang Bumiller
  Date: Mon, 2 Jun 2025 16:23:27 +0300

We tend to prevent having a volume active on two nodes, as that may lead to
data corruption, so we detach the volume from all nodes (except the target
one) via our shared storage system. In sub activate_volume() our logic is to
not detach the volume from other hosts in case of migration - because
activate_volume() can also be called in other cases, where detaching is
necessary.

But in this case, where the `qm start` process is killed, the migration is
marked as failed and activate_volume() is still called on the destination host
after migrate_cancel (we check that the "lock" flag is set to 'migrate').
That's why I proposed that the child processes be killed when the parent one
dies - it would prevent such cases. Not sure whether passing an extra argument
(marking it as a migration) to activate_volume() would solve such an issue too.

Here is a trace log of activate_volume() in case of migration:

2025-05-02 13:03:28.2222 [2712103] took 0.0006: activate_volume: storeid 'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' => 'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' => 1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
    PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309
    PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823
    PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592
    PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
    PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259
    eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259
    PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
    PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
    PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
    PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593
    PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259
    PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
    eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611
    PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263
    PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499
    PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985
    eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968
    PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594
    PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
    PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8
* Re: [pve-devel] PVE child process behavior question
From: Fabian Grünbichler @ 2025-06-02 14:31 UTC
To: Denis Kanchev
Cc: Wolfgang Bumiller, Proxmox VE development discussion

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 15:23 CEST:
>
> We tend to prevent having a volume active on two nodes, as it may lead to data
> corruption, so we detach the volume from all nodes (except the target one) via
> our shared storage system.
> In the sub activate_volume() our logic is to not detach the volume from other
> hosts in case of migration - because activate_volume() can be called in other
> cases, where detaching is necessary.
> But in this case, where the `qm start` process is killed, the migration is marked
> as failed and activate_volume() is still called on the destination host after
> migration_cancel (we track the "lock" flag being set to "migrate").
> That's why I proposed that the child processes be killed when the parent one
> dies - it would prevent such cases.
> Not sure if passing an extra argument (marking it as a migration) to
> activate_volume() would solve such an issue too.
> Here is a trace log of activate_volume() in case of migration.

but that activation happens as part of starting the target VM, which happens
before the (actual) migration is started in QEMU. so in this case, we have

qm start (over SSH, is this being killed?)
-> start_vm task worker (or this?)
--> activate_volume
--> fork, enter systemd scope, run_command to execute the kvm process
---> kvm (or this?)

how are you hooking the migration state to know whether deactivation should be
done or not?

> 2025-05-02 13:03:28.2222 [2712103] took 0.0006:activate_volume:storeid 'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' => 'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' => 1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
> PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309
> PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823
> PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592
> PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
> PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259
> eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259
> PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
> PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
> PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
> PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593
> PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259
> PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
> eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611
> PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263
> PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499
> PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985
> eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968
> PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594
> PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
> PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8
>
> On Mon, Jun 2, 2025 at 2:42 PM Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:
> >
> > > Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 11:18 CEST:
> > >
> > > My bad :) in terms of Proxmox it must be handing over the storage control - the
> > > storage plugin function activate_volume() is called in our case, which moves the
> > > storage to the new VM.
> > > So no data is moved across the nodes and only the volumes get re-attached.
> > > Thanks for the plentiful information
> >
> > okay!
> >
> > so you basically special-case this "volume is active on two nodes" case, which
> > should only happen during a live migration, and that somehow runs into an issue
> > if the migration is aborted because there is some suspected race somewhere?
> >
> > as part of a live migration, the sequence should be:
> >
> > node A: migration starts
> > node A: start request for target VM on node B (over SSH)
> > node B: `qm start ..` is called
> > node B: qm start will activate volumes
> > node B: qm start returns
> > node A: migration starts
> > node A/B: some fatal error
> > node A: cancel migration (via QMP/the source VM running on node A)
> > node A: request to stop target VM on node B (over SSH)
> > node B: `qm stop ..` called
> > node B: qm stop will deactivate volumes
> >
> > I am not sure where another activate_volume call after node A has started the
> > migration could happen? at that point, node A still has control over the VM (ID),
> > so nothing in PVE should operate on it other than the selective calls made as
> > part of the migration, which are basically only querying migration status and
> > error handling at that point..
> >
> > it would still be good to know what actually got OOM-killed in your case.. was
> > it the `qm start`? was it the `kvm` process itself? something entirely else?
> >
> > if you can reproduce the issue, you could also add logging in activate_volume
> > to find out the exact call path (e.g., log the call stack somewhere), maybe
> > that helps find the exact scenario that you are seeing..
* Re: [pve-devel] PVE child process behavior question
From: Denis Kanchev <denis.kanchev@storpool.com> @ 2025-06-04 12:52 UTC
To: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Wolfgang Bumiller <w.bumiller@proxmox.com>

> how are you hooking the migration state to know whether deactivation should be
> done or not?

By using the VM property "lock", which must be "migrate":
PVE::Cluster::get_guest_config_properties(['lock']);

> qm start (over SSH, is this being killed?)
> -> start_vm task worker (or this?)
> --> activate_volume
> --> fork, enter systemd scope, run_command to execute the kvm process
> ---> kvm (or this?)

The parent of the process that is executing activate_volume() is killed; in this
case it should be `qm start`.
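For context, a rough sketch of how such a lock check could look inside a storage plugin, built around the call quoted above; the helper name, the vmid parsing, and the exact return shape of get_guest_config_properties() are illustrative assumptions rather than established API:

use PVE::Cluster;

# Hypothetical helper: return false while the owning guest is locked for migration,
# since during a live migration both the source and the target node legitimately
# have the volume active and the plugin must not detach it from the other node.
sub may_detach_from_other_nodes {
    my ($volname) = @_;

    # Illustrative: derive the VM ID from a volume name like 'vm-101-disk-0-...'
    my ($vmid) = $volname =~ /^vm-(\d+)-/;
    return 1 if !defined $vmid;

    # Assumed result shape: { $vmid => { lock => '...' } } for guests with the property set
    my $props = PVE::Cluster::get_guest_config_properties(['lock']);
    my $lock = $props->{$vmid}->{lock} // '';

    return $lock ne 'migrate';
}

Keying the decision off the cluster-wide lock property avoids passing an extra migration flag through the storage API, which is the alternative mentioned earlier in the thread.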
Thread overview: 15+ messages (newest: ~2025-06-04 12:53 UTC)

2025-05-21 13:13 [pve-devel] PVE child process behavior question  Denis Kanchev via pve-devel
2025-05-22  6:30 ` Fabian Grünbichler
2025-05-22  6:55 ` Denis Kanchev via pve-devel
  [not found]    ` <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>
2025-05-22  8:22 ` Fabian Grünbichler
2025-05-28  6:13 ` Denis Kanchev via pve-devel
  [not found]    ` <CAHXTzuk7tYRJV_j=88RWc3R3C7AkiEdFUXi88m5qwnDeYDEC+A@mail.gmail.com>
2025-05-28  6:33 ` Fabian Grünbichler
2025-05-29  7:33 ` Denis Kanchev via pve-devel
  [not found]    ` <CAHXTzumXeyJQQCj+45Hmy5qdU+BTFBYbHVgPy0u3VS-qS=_bDQ@mail.gmail.com>
2025-06-02  7:37 ` Fabian Grünbichler
2025-06-02  8:35 ` Denis Kanchev via pve-devel
  [not found]    ` <CAHXTzukAMG9050Ynn-KRSqhCz2Y0m6vnAQ7FEkCmEdQT3HapfQ@mail.gmail.com>
2025-06-02  8:49 ` Fabian Grünbichler
2025-06-02  9:18 ` Denis Kanchev via pve-devel
  [not found]    ` <CAHXTzu=AiNx0iTWFEUU2kdzx9-RopwLc7rqGui6f0Q=+Hy52=w@mail.gmail.com>
2025-06-02 11:42 ` Fabian Grünbichler
2025-06-02 13:23 ` Denis Kanchev via pve-devel
  [not found]    ` <CAHXTzu=qrZe2eEZro7qteR=fDjJQX13syfB9fs5VfFbG7Vy6vQ@mail.gmail.com>
2025-06-02 14:31 ` Fabian Grünbichler
2025-06-04 12:52 ` Denis Kanchev via pve-devel