From: Denis Kanchev via pve-devel <pve-devel@lists.proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Cc: Denis Kanchev <denis.kanchev@storpool.com>,
Wolfgang Bumiller <w.bumiller@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] PVE child process behavior question
Date: Mon, 2 Jun 2025 16:23:27 +0300 [thread overview]
Message-ID: <mailman.167.1748870653.395.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <2141074266.768.1748864577569@webmail.proxmox.com>
We try to prevent having a volume active on two nodes at once, as that may
lead to data corruption, so we detach the volume from all nodes (except the
target one) via our shared storage system.
In the sub activate_volume() our logic is to not detach the volume from
other hosts in the migration case, because activate_volume() can also be
called in other cases, where detaching is necessary.
But in this case, where the `qm start` process is killed, the migration is
marked as failed and activate_volume() is still called on the destination
host after migration_cancel (we check the config "lock" flag, which is set
to migrate).
That's why I proposed that the child processes be killed when the parent one
dies - it would prevent such cases.
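Something like this is what I have in mind - just a rough sketch, assuming
Linux x86_64 (syscall number 157 for prctl, PR_SET_PDEATHSIG = 1), not the
actual worker code:

    use strict;
    use warnings;
    use POSIX ();

    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        # child: ask the kernel to send us SIGKILL as soon as the parent
        # exits, so a killed parent cannot leave the worker running
        # (157 = SYS_prctl on x86_64, 1 = PR_SET_PDEATHSIG)
        syscall(157, 1, POSIX::SIGKILL) == 0
            or die "prctl(PR_SET_PDEATHSIG) failed: $!";
        # ... long-running work, e.g. exec() of the actual task ...
        exit(0);
    }
    waitpid($pid, 0);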
I am not sure whether passing an extra argument (marking the call as part of
a migration) to activate_volume() would solve this issue as well.
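For that second option, a purely hypothetical sketch of what I mean (the
'migrate' key does not exist today and would have to be set somewhere in the
migration code path; detach_volume_everywhere() and attach_volume_here() are
made-up helper names, not our real plugin internals):

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # hypothetical hint passed in by the caller for the live-migration case
        my $is_migration = $cache && $cache->{migrate};

        if (!$is_migration) {
            # normal activation: make sure no other node still has the
            # volume attached before attaching it on this node
            $class->detach_volume_everywhere($scfg, $volname);
        }
        $class->attach_volume_here($scfg, $volname);
        return 1;
    }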
Here is a trace log of activate_volume() during a migration.
2025-05-02 13:03:28.2222 [2712103] took 0.0006: activate_volume: storeid 'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' => 'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' => 1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin", "autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw", undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line 1309
PVE::Storage::activate_volumes(HASH(0x559cc99d04e0), ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line 5823
PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101, HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/QemuServer.pm line 5592
PVE::QemuServer::__ANON__() called at /usr/share/perl5/PVE/AbstractConfig.pm line 299
PVE::AbstractConfig::__ANON__() called at /usr/share/perl5/PVE/Tools.pm line 259
eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259
PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf", 10, 0, CODE(0x559ccf14b968)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 302
PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 322
PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line 330
PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101, CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line 5593
PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101, HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3259
PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:") called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line 611
PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558), "qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at /usr/share/perl5/PVE/API2/Qemu.pm line 3263
PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at /usr/share/perl5/PVE/RESTHandler.pm line 499
PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98), HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line 985
eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968
PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start", "vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98), HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at /usr/share/perl5/PVE/CLIHandler.pm line 594
PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef, CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at /usr/sbin/qm line 8
On Mon, Jun 2, 2025 at 2:42 PM Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:
>
> > Denis Kanchev <denis.kanchev@storpool.com> wrote on 02.06.2025 11:18 CEST:
> >
> >
> > My bad :) in terms of Proxmox it must be hand-overing the storage
> control - the storage plugin function activate_volume() is called in our
> case, which moves the storage to the new VM.
> > So no data is moved across the nodes and only the volumes get
> re-attached.
> > Thanks for the plentiful information
>
> okay!
>
> so you basically special case this "volume is active on two nodes" case
> which should only happen during a live migration, and that somehow runs
> into an issue if the migration is aborted because there is some suspected
> race somewhere?
>
> as part of a live migration, the sequence should be:
>
> node A: migration starts
> node A: start request for target VM on node B (over SSH)
> node B: `qm start ..` is called
> node B: qm start will activate volumes
> node B: qm start returns
> node A: migration starts
> node A/B: some fatal error
> node A: cancel migration (via QMP/the source VM running on node A)
> node A: request to stop target VM on node B (over SSH)
> node B: `qm stop ..` called
> node B: qm stop will deactivate volumes
>
> I am not sure where another activate_volume call after node A has started
> the migration could happen? at that point, node A still has control over
> the VM (ID), so nothing in PVE should operate on it other than the
> selective calls made as part of the migration, which are basically only
> querying migration status and error handling at that point..
>
> it would still be good to know what actually got OOM-killed in your case..
> was it the `qm start`? was it the `kvm` process itself? something entirely
> else?
>
> if you can reproduce the issue, you could also add logging in
> activate_volume to find out the exact call path (e.g., log the call stack
> somewhere), maybe that helps find the exact scenario that you are seeing..
>
>