From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Denis Kanchev <denis.kanchev@storpool.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question
Date: Thu, 22 May 2025 10:22:17 +0200 (CEST)
Message-ID: <1349127939.17705.1747902137180@webmail.proxmox.com>
In-Reply-To: <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>
> Denis Kanchev <denis.kanchev@storpool.com> wrote on 22.05.2025 08:55 CEST:
>
>
> The parent of the storage migration process gets killed.
>
> It seems that this is the desired behavior and, if I understand it
> correctly, the child worker is detached from the parent and has
> nothing to do with it after spawning.
was this a remote migration or a regular migration? could you maybe
post the full task log?
for a regular migration, the storage migration just uses our
"run_command" helper. run_command uses open3 to spawn the command, and
select for command output handling.
basically, the process tree would look like this:
API worker (one of X in pvedaemon)
-> task worker (executing the migration code)
--> storage migration command (xxx | ssh target_node xxx)
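The open3-plus-select pattern mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual code of PVE::Tools::run_command: the helper name run_and_collect and the 4096-byte read size are hypothetical, and the real helper does more (timeouts, line-wise callbacks, logging).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# Hypothetical helper, NOT PVE::Tools::run_command: spawn @cmd via open3 and
# collect stdout and stderr separately with select() until both reach EOF.
sub run_and_collect {
    my (@cmd) = @_;

    my $err = gensym;                              # open3 needs a real glob for stderr
    my $pid = open3(my $in, my $out, $err, @cmd);  # $in/$out are autovivified
    close($in);                                    # the command gets no input

    my %buf = (out => '', err => '');
    my $sel = IO::Select->new($out, $err);

    while ($sel->count) {                          # loop while a handle is still open
        for my $fh ($sel->can_read) {              # blocks until one is readable
            my $n = sysread($fh, my $chunk, 4096);
            if (!$n) {                             # 0 = EOF, undef = read error
                $sel->remove($fh);
                close($fh);
                next;
            }
            $buf{ $fh == $out ? 'out' : 'err' } .= $chunk;
        }
    }

    waitpid($pid, 0);                              # reap the child, sets $?
    return ($? >> 8, $buf{out}, $buf{err});
}
```

For example, run_and_collect('sh', '-c', 'echo out; echo err >&2; exit 3') returns (3, "out\n", "err\n"). Nothing in this loop signals the spawned command if the Perl process itself is killed; the command only notices if it later writes to a pipe whose reading end is gone (SIGPIPE).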
and it does seem like run_command doesn't properly forward the parent being
killed/terminated:
$ perl -e 'use strict; use warnings; use PVE::Tools; warn "parent pid: $$\n"; PVE::Tools::run_command([["bash", "-c", "sleep 10; sleep 20; echo after > /tmp/file"]]);'
parent pid: 204620
[1] 204618 terminated sudo perl -e
(sending SIGTERM to 204620 from another shell). the bash command continues
executing and also writes to /tmp/file after the sleeps have finished.
the same is also true for SIGKILL. SIGINT properly cleans up the child
though.
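One way a run_command-style helper could pass a catchable termination signal on to the whole command, rather than only to the first process it spawned, is to start the command in its own process group and signal that group. This is a hedged sketch, not PVE code: the helper name run_forwarding_signals and the choice of TERM/HUP/INT are assumptions. SIGKILL (as used by the OOM killer) cannot be caught, so no handler can forward it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, NOT part of PVE::Tools: run @cmd in its own process
# group and forward catchable termination signals to the whole group.
sub run_forwarding_signals {
    my (@cmd) = @_;

    my $pid = fork() // die "fork failed: $!\n";
    if ($pid == 0) {
        POSIX::setpgid(0, 0);          # become leader of a new process group
        exec(@cmd) or POSIX::_exit(127);
    }
    POSIX::setpgid($pid, $pid);        # also set from the parent to avoid a race;
                                       # errors (e.g. child already exec'd) are ignored

    my $forward = sub {
        my ($sig) = @_;                # signal name, e.g. "TERM"
        kill("-$sig", $pid);           # negative name: signal the whole process group
        waitpid($pid, 0);              # reap the direct child before leaving
        exit(1);
    };
    local $SIG{TERM} = $forward;
    local $SIG{HUP}  = $forward;
    local $SIG{INT}  = $forward;
    # SIGKILL cannot be caught, so it is never forwarded by this code.

    waitpid($pid, 0);
    return $? >> 8;
}
```

With a command like bash -c "sleep 10; sleep 20; echo after > /tmp/file", both bash and the currently running sleep are members of the new group, so a forwarded SIGTERM reaches both and the final echo never runs.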
@Wolfgang: is this desired behaviour?
>
> Thanks for the information, it was very helpful.
>
> On 22.05.25 9:30, Fabian Grünbichler wrote:
> >> Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com> wrote on 21.05.2025 15:13 CEST:
> >> Hello,
> >>
> >> We had an issue with a customer migrating a VM between nodes using our
> >> shared storage solution.
> >>
> >> On the target host the OOM killer killed the main migration process, but
> >> the child process (which actually performs the migration) kept on
> >> working, which we did not expect, and that caused some issues.
> > could you be more specific about which process got killed?
> >
> > when you do a migration, a task worker is forked and its UPID is returned
> > to the caller for further querying.
> >
> > as part of the migration, other processes get spawned:
> > - ssh tunnel to the target node
> > - storage migration processes (on both nodes)
> > - VM state management CLI calls (on the target node)
> >
> > which of those is the "main migration process"? which is the child process?
> >
> >> This leads us to a broader question: after a request is submitted,
> >> the parent can be terminated without returning a response to the
> >> client while the work is still being done, so the request may be
> >> wrongly retried or considered unfinished.
> > the parent should return almost immediately, as all it is doing at that
> > point is returning the UPID to the client (the process then continues to
> > do other work though, but that is no longer related to this task).
> >
> > the only exception is for "sync" task workers, like in a CLI context,
> > where the "parent" has no other work to do, so it waits for the child/task
> > to finish and prints its output while doing so, and some "bulk action"
> > style API calls that fork multiple task workers and poll them themselves.
> >
> >> Should the child processes terminate together with the parent to guard
> >> against this, or is this expected behavior?
> > the parent (API worker process) and child (task worker process) have no
> > direct relation after the task worker has been spawned.
> >
> >> Here is an example patch to do this:
> >>
> >> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> >> index bfde7e6..744fffc 100644
> >> --- a/src/PVE/RESTEnvironment.pm
> >> +++ b/src/PVE/RESTEnvironment.pm
> >> @@ -13,8 +13,9 @@ use Fcntl qw(:flock);
> >> use IO::File;
> >> use IO::Handle;
> >> use IO::Select;
> >> -use POSIX qw(:sys_wait_h EINTR);
> >> +use POSIX qw(:sys_wait_h EINTR SIGKILL);
> >> use AnyEvent;
> >> +use Linux::Prctl qw(set_pdeathsig);
> >>
> >> use PVE::Exception qw(raise raise_perm_exc);
> >> use PVE::INotify;
> >> @@ -549,6 +550,9 @@ sub fork_worker {
> >> POSIX::setsid();
> >> }
> >>
> >> + # The signal that the calling process will get when its parent dies
> >> + set_pdeathsig(SIGKILL);
> > that has weird implications with regard to threads, so I don't think that
> > is a good idea.
> >
> >> +
> >> POSIX::close ($psync[0]);
> >> POSIX::close ($ctrlfd[0]) if $sync;
> >> POSIX::close ($csync[1]);
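For context on the thread remark above: prctl(2) documents that for PR_SET_PDEATHSIG the "parent" is the thread that created the child, so the signal fires when that thread exits even if the rest of the parent process keeps running. A commonly used alternative that is tied to the process rather than a thread is a pipe whose write end only the parent holds: when the parent exits for any reason, including SIGKILL from the OOM killer, the kernel closes that end and the child sees EOF. The following is a hedged sketch of that idea only; fork_watched_worker and the $step callback are hypothetical names, not PVE::RESTEnvironment::fork_worker.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical sketch, NOT PVE::RESTEnvironment::fork_worker: fork a worker
# that calls $step->() in a loop and stops as soon as its parent is gone.
sub fork_watched_worker {
    my ($step) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!\n";
    my $pid = fork() // die "fork failed: $!\n";

    if ($pid == 0) {
        close($writer);                 # only the parent may hold the write end
        my $sel = IO::Select->new($reader);
        while (1) {
            # Once every write end is closed (the parent exited or was killed,
            # even with SIGKILL), the read end becomes readable with EOF.
            my @gone = $sel->can_read(0);
            POSIX::_exit(1) if @gone;
            $step->() or POSIX::_exit(0);   # $step returns false when finished
        }
    }

    close($reader);
    return ($pid, $writer);             # parent keeps $writer open while it lives
}
```

The caveats are that the worker only notices at its next check, and that any other process which inherits the write end (for example a child forked later without exec; Perl marks the descriptors close-on-exec, so exec'd commands do not inherit them) keeps the pipe open and delays the EOF.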
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Thread overview: 15+ messages
2025-05-21 13:13 Denis Kanchev via pve-devel
2025-05-22 6:30 ` Fabian Grünbichler
2025-05-22 6:55 ` Denis Kanchev via pve-devel
2025-05-22 8:22 ` Fabian Grünbichler [this message]
2025-05-28 6:13 ` Denis Kanchev via pve-devel
2025-05-28 6:33 ` Fabian Grünbichler
2025-05-29 7:33 ` Denis Kanchev via pve-devel
2025-06-02 7:37 ` Fabian Grünbichler
2025-06-02 8:35 ` Denis Kanchev via pve-devel
2025-06-02 8:49 ` Fabian Grünbichler
2025-06-02 9:18 ` Denis Kanchev via pve-devel
2025-06-02 11:42 ` Fabian Grünbichler
2025-06-02 13:23 ` Denis Kanchev via pve-devel
2025-06-02 14:31 ` Fabian Grünbichler
2025-06-04 12:52 ` Denis Kanchev via pve-devel