From: Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Cc: Denis Kanchev <denis.kanchev@storpool.com>,
Wolfgang Bumiller <w.bumiller@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question
Date: Mon, 2 Jun 2025 16:23:27 +0300 [thread overview]
Message-ID: <mailman.167.1748870653.395.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <2141074266.768.1748864577569@webmail.proxmox.com>
[-- Attachment #1: Type: message/rfc822, Size: 11746 bytes --]
From: Denis Kanchev <denis.kanchev@storpool.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question
Date: Mon, 2 Jun 2025 16:23:27 +0300
Message-ID: <CAHXTzu=qrZe2eEZro7qteR=fDjJQX13syfB9fs5VfFbG7Vy6vQ@mail.gmail.com>
We try to prevent a volume from being active on two nodes at once, as that may
lead to data corruption, so we detach the volume from all nodes (except the
target one) via our shared storage system.
In the activate_volume() sub our logic is to not detach the volume from other
hosts in the case of a migration, because activate_volume() can also be called
in other situations where detaching is necessary.
But in this case, where the `qm start` process is killed, the migration is
marked as failed and activate_volume() is still called on the destination host
after migration_cancel (we check whether the "lock" flag is set to "migrate").
That's why I proposed killing the child processes when the parent dies - it
would prevent such cases.
I am not sure whether passing an extra argument (marking the call as part of a
migration) to activate_volume() would solve this issue as well.
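A minimal sketch of that lock check (hypothetical code, not the actual
StorPool plugin; it assumes the standard parse_volname() return values and
PVE::QemuConfig->load_config() behaviour):

    use strict;
    use warnings;
    use PVE::QemuConfig;

    # Hypothetical helper: return true if the volume belongs to a VM that is
    # currently locked for migration, in which case activate_volume() should
    # not force-detach it from the other (source) node.
    sub volume_locked_for_migration {
        my ($class, $volname) = @_;

        my (undef, undef, $vmid) = $class->parse_volname($volname);
        return 0 if !defined($vmid);

        # load_config() dies if the VM config does not exist on this node
        my $conf = eval { PVE::QemuConfig->load_config($vmid) };
        return 0 if !$conf;

        return (defined($conf->{lock}) && $conf->{lock} eq 'migrate') ? 1 : 0;
    }

activate_volume() would then only run the detach-from-other-nodes path when
this returns false.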
Here is a trace log of activate_volume() in the case of a migration:
2025-05-02 13:03:28.2222 [2712103] took 0.0006: activate_volume: storeid 'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' => 'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' => 1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
    PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309
    PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823
    PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592
    PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
    PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259
    eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259
    PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
    PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
    PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
    PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593
    PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259
    PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
    eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611
    PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263
    PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499
    PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985
    eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968
    PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594
    PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
    PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8
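For reference, the trace above was produced by dumping the Perl call stack
from inside activate_volume(); a minimal sketch of such logging (using Carp,
the exact helper in our plugin differs):

    use Carp qw(cluck);

    # inside activate_volume(): warn with the full caller chain appended, so
    # the task log / journal shows the exact call path ($storeid and $volname
    # are the usual activate_volume() parameters)
    cluck("activate_volume: storeid '$storeid', volname '$volname'");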
On Mon, Jun 2, 2025 at 2:42 PM Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:
>
> > Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 at 11:18 CEST:
> >
> >
> > My bad :) in terms of Proxmox it must be handing over storage control -
> > the storage plugin function activate_volume() is called in our case,
> > which moves the storage to the new VM.
> > So no data is moved across the nodes and only the volumes get
> > re-attached.
> > Thanks for the plentiful information
>
> okay!
>
> so you basically special case this "volume is active on two nodes" case
> which should only happen during a live migration, and that somehow runs
> into an issue if the migration is aborted because there is some suspected
> race somewhere?
>
> as part of a live migration, the sequence should be:
>
> node A: migration starts
> node A: start request for target VM on node B (over SSH)
> node B: `qm start ..` is called
> node B: qm start will activate volumes
> node B: qm start returns
> node A: actual migration (VM state transfer) starts
> node A/B: some fatal error
> node A: cancel migration (via QMP/the source VM running on node A)
> node A: request to stop target VM on node B (over SSH)
> node B: `qm stop ..` called
> node B: qm stop will deactivate volumes
>
> I am not sure where another activate_volume call after node A has started
> the migration could happen? at that point, node A still has control over
> the VM (ID), so nothing in PVE should operate on it other than the
> selective calls made as part of the migration, which are basically only
> querying migration status and error handling at that point..
>
> it would still be good to know what actually got OOM-killed in your case:
> was it the `qm start`? was it the `kvm` process itself? something else
> entirely?
>
> if you can reproduce the issue, you could also add logging in
> activate_volume to find out the exact call path (e.g., log the call stack
> somewhere), maybe that helps find the exact scenario that you are seeing..
>
>
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel