From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Denis Kanchev <denis.kanchev@storpool.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question
Date: Thu, 22 May 2025 10:22:17 +0200 (CEST)
Message-ID: <1349127939.17705.1747902137180@webmail.proxmox.com>
In-Reply-To: <857cbd6c-6866-417d-a71f-f5b5297bf09c@storpool.com>

> Denis Kanchev <denis.kanchev@storpool.com> wrote on 22.05.2025 08:55 CEST:
>
>
> The parent of the storage migration process gets killed.
>
> It seems that this is the desired behavior and, as far as I understand
> it, the child worker is detached from the parent and has nothing to do
> with it after spawning.

was this a remote migration or a regular migration? could you maybe
post the full task log?

for a regular migration, the storage migration just uses our
"run_command" helper. run_command uses open3 to spawn the command, and
select for command output handling.

basically the process tree would look like this:

API worker (one of X in pvedaemon)
-> task worker (executing the migration code)
--> storage migration command (xxx | ssh target_node xxx)

and it does seem like run_command doesn't properly forward the parent being
killed/terminated:

$ perl -e 'use strict; use warnings; use PVE::Tools; warn "parent pid: $$\n"; PVE::Tools::run_command([["bash", "-c", "sleep 10; sleep 20; echo after > /tmp/file"]]);'
parent pid: 204620
[1] 204618 terminated sudo perl -e

(sending SIGTERM from another shell to 204620). the bash command continues
executing, and also writes to /tmp/file after the sleeps are finished.

the same is also true for SIGKILL. SIGINT properly cleans up the child
though.

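To illustrate the observation above, here is a minimal sketch of an open3 + select loop that passes SIGTERM and SIGINT on to the spawned command. run_forwarding is a hypothetical helper written for this example; it is not the actual PVE::Tools::run_command code, and it only signals the direct child. SIGKILL cannot be caught, so no handler in the parent can cover that case.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Select;
use Symbol qw(gensym);

# Hypothetical helper (NOT PVE::Tools::run_command): spawn a command with
# open3, collect its output via select, and forward SIGTERM/SIGINT to it.
sub run_forwarding {
    my (@cmd) = @_;

    my $err = gensym;    # open3 needs a real glob for a separate stderr handle
    my $pid = open3(my $in, my $out, $err, @cmd);
    close($in);

    # Forward catchable signals to the child. SIGKILL cannot be handled.
    local $SIG{TERM} = sub { kill('TERM', $pid) };
    local $SIG{INT}  = sub { kill('INT', $pid) };

    my $output = '';
    my $sel = IO::Select->new($out, $err);
    while ($sel->count) {
        # can_read returns an empty list on timeout or when interrupted
        for my $fh ($sel->can_read(1)) {
            my $n = sysread($fh, my $buf, 4096);
            next if !defined($n) && $!{EINTR};
            if (!$n) {    # EOF or read error: stop watching this handle
                $sel->remove($fh);
                close($fh);
                next;
            }
            $output .= $buf;    # stdout and stderr are merged here
        }
    }

    # Reap the child, retrying if a signal interrupts the wait.
    1 while waitpid($pid, 0) == -1 && $!{EINTR};
    return ($?, $output);
}

my ($status, $output) = run_forwarding('echo', 'hello');
printf "exit code %d, output: %s", $status >> 8, $output;
```

With a command like the bash example above, signalling only bash would likely leave the currently running sleep alive and holding the output pipe open, so a real fix would probably have to signal the whole process group instead of a single PID.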
@Wolfgang: is this desired behaviour?
>
> Thanks for the information, it was very helpful.
>
> On 22.05.25 at 9:30, Fabian Grünbichler wrote:
> >> Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com> wrote on 21.05.2025 15:13 CEST:
> >> Hello,
> >>
> >> We had an issue with a customer migrating a VM between nodes using our
> >> shared storage solution.
> >>
> >> On the target host the OOM killer killed the main migration process, but
> >> the child process (which actually performs the migration) kept on
> >> working, which we did not expect, and that caused some issues.
> > could you be more specific which process got killed?
> >
> > when you do a migration, a task worker is forked and its UPID is returned
> > to the caller for further querying.
> >
> > as part of the migration, other processes get spawned:
> > - ssh tunnel to the target node
> > - storage migration processes (on both nodes)
> > - VM state management CLI calls (on the target node)
> >
> > which of those is the "main migration process"? which is the child process?
> >
> >> This leads us to the broader question - after a request is submitted,
> >> the parent can be terminated, and not return a response to the client,
> >> while the work is being done, and the request can be wrongly retried or
> >> considered unfinished.
> > the parent should return almost immediately, as all it is doing at that
> > point is returning the UPID to the client (the process then continues to
> > do other work though, but that is no longer related to this task).
> >
> > the only exception is for "sync" task workers, like in a CLI context,
> > where the "parent" has no other work to do, so it waits for the child/task
> > to finish and prints its output while doing so, and some "bulk action"
> > style API calls that fork multiple task workers and poll them themselves.
> >
> >> Should the child processes terminate together with the parent to guard
> >> against this, or is this expected behavior?
> > the parent (API worker process) and child (task worker process) have no
> > direct relation after the task worker has been spawned.
> >
> >> Here is an example patch to do this:
> >>
> >> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> >> index bfde7e6..744fffc 100644
> >> --- a/src/PVE/RESTEnvironment.pm
> >> +++ b/src/PVE/RESTEnvironment.pm
> >> @@ -13,8 +13,9 @@ use Fcntl qw(:flock);
> >> use IO::File;
> >> use IO::Handle;
> >> use IO::Select;
> >> -use POSIX qw(:sys_wait_h EINTR);
> >> +use POSIX qw(:sys_wait_h EINTR SIGKILL);
> >> use AnyEvent;
> >> +use Linux::Prctl qw(set_pdeathsig);
> >>
> >> use PVE::Exception qw(raise raise_perm_exc);
> >> use PVE::INotify;
> >> @@ -549,6 +550,9 @@ sub fork_worker {
> >> POSIX::setsid();
> >> }
> >>
> >> + # The signal that the calling process will get when its parent dies
> >> + set_pdeathsig(SIGKILL);
> > that has weird implications with regards to threads, so I don't think that
> > is a good idea..
> >
> >> +
> >> POSIX::close ($psync[0]);
> >> POSIX::close ($ctrlfd[0]) if $sync;
> >> POSIX::close ($csync[1]);
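
On the thread concern with set_pdeathsig: according to prctl(2), the parent-death signal is tied to the thread that created the child, not to the parent process as a whole. A different technique that avoids this is a "lifeline" pipe: only the parent holds the write end, so the child sees EOF on the read end once the parent is gone, however it died. The sketch below is an illustration under that assumption, not something PVE does; fork_with_lifeline and the shape of its $work callback are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper: fork a worker that checks a "lifeline" pipe between
# units of work and stops as soon as the parent has disappeared.
sub fork_with_lifeline {
    my ($work) = @_;

    pipe(my $life_r, my $life_w) or die "pipe failed: $!\n";
    my $pid = fork() // die "fork failed: $!\n";

    if ($pid == 0) {
        close($life_w);    # the child must not keep the write end open

        my $rin = '';
        vec($rin, fileno($life_r), 1) = 1;
        while (1) {
            # Non-blocking poll. Nothing is ever written to the pipe, so
            # "readable" can only mean EOF, i.e. the parent is gone.
            my $rout = $rin;
            my $n = select($rout, undef, undef, 0);
            _exit(1) if $n > 0;

            last if !$work->();    # one unit of work; false means "done"
        }
        _exit(0);
    }

    close($life_r);
    # The caller has to keep $life_w open for as long as it is alive.
    return ($pid, $life_w);
}

my $count = 0;
my ($pid, $lifeline) = fork_with_lifeline(sub { return ++$count < 3 });
waitpid($pid, 0);
```

One caveat of this approach: any other process that inherits the write end (for example a later fork of the parent) keeps the lifeline open, so it would need to be closed there explicitly.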
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel