public inbox for pve-devel@lists.proxmox.com
* [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
@ 2025-10-21 15:33 Andrei Perepiolkin via pve-devel
  2025-10-22  9:49 ` Fabian Grünbichler
  0 siblings, 1 reply; 4+ messages in thread
From: Andrei Perepiolkin via pve-devel @ 2025-10-21 15:33 UTC (permalink / raw)
  To: Proxmox VE development discussion; +Cc: Andrei Perepiolkin


From: Andrei Perepiolkin <andrei.perepiolkin@open-e.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
Date: Tue, 21 Oct 2025 11:33:27 -0400
Message-ID: <7cf85c82-28d9-4883-9826-39e60bfa3450@open-e.com>

Hi Proxmox Community,


There may be a consistency problem with Proxmox VM deletion.

When Proxmox receives multiple concurrent VM deletion requests, where each 
VM has multiple disks located on shared storage, the deletion process may 
fail or hang while attempting to acquire the storage lock 
(https://github.com/proxmox/pve-storage/blob/master/src/PVE/Storage.pm#L1196C1-L1209C7).

...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
trying to acquire cfs lock 'storage-jdss-Pool-2' ...
cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
...

Eventually, the VM configuration files in /etc/pve are removed, but some 
VM disks may remain.

Additionally, the Web UI shows all deletions as successful, even though 
some disks were not deleted.
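
One way to spot such leftovers after the fact is to compare the volume names on the storage with the VMIDs that still have a configuration. The following is only a minimal sketch: it assumes the usual `vm-<vmid>-disk-<n>` volume naming, and `find_leftover_volumes` and its inputs are hypothetical names, not part of any PVE API. On shared storage the set of existing VMIDs would have to cover the whole cluster.

```python
import re

# Assumed naming convention for guest volumes: vm-<vmid>-disk-<n>, with an
# optional format suffix. Any other volume name is ignored by this sketch.
VOLUME_RE = re.compile(r"^vm-(\d+)-disk-\d+(\.\w+)?$")


def find_leftover_volumes(volume_names, existing_vmids):
    """Return the volume names whose owning VMID no longer has a config.

    volume_names:   iterable of volume names found on the storage
    existing_vmids: set of VMIDs that still have a configuration file
    """
    leftovers = []
    for name in volume_names:
        match = VOLUME_RE.match(name)
        if match and int(match.group(1)) not in existing_vmids:
            leftovers.append(name)
    return leftovers
```

With volumes `vm-401-disk-0` and `vm-401-disk-1` still present and VM 401 no longer configured, both names would be reported.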

In my opinion, a VM should either be deleted completely, including all 
dependent resources, or the deletion should fail and leave the VM 
configuration file in place with an updated state.
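
A minimal sketch of that all-or-nothing behaviour, to make the idea concrete. It is not how `qm destroy` is implemented: `VMConfig`, `destroy_vm`, `pending_removal`, and the two callbacks are hypothetical names chosen for illustration, and `remove_disk` is assumed to raise on failure (for example on a lock timeout).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VMConfig:
    """Hypothetical stand-in for a VM config file; not the PVE data model."""
    vmid: int
    disks: List[str]
    pending_removal: List[str] = field(default_factory=list)


def destroy_vm(config, remove_disk, remove_config):
    """Remove all disks first; drop the config only if every removal worked.

    remove_disk(volid) is expected to raise on failure.
    Returns True when the VM is fully gone, False when the config was kept.
    """
    failed = []
    for volid in list(config.disks):
        try:
            remove_disk(volid)
            config.disks.remove(volid)  # forget only disks that are really gone
        except Exception:
            failed.append(volid)

    if failed:
        # Keep the config and record what is still left to clean up.
        config.pending_removal = failed
        return False

    remove_config(config.vmid)
    return True
```

With this ordering a failed disk removal leaves a config that still references the remaining disks, so a later retry can finish the job and the task can be reported as failed.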



I'm reproducing this with:

     for i in $(seq 401 420); do qm clone 104 "$i" --name "win-$i" --full --storage jdss-Pool-2; done

     for i in $(seq 401 410); do qm destroy "$i" --destroy-unreferenced-disks 1 --purge 1 & done


I should note that the SSH session I use to run the 'qm destroy' 
commands gets terminated by Proxmox.

I've also filed this as a bug at: 
https://bugzilla.proxmox.com/show_bug.cgi?id=6957


Is this a bug, and will it be addressed in the near future?


Best regards,

Andrei Perepiolkin




_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


* Re: [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
  2025-10-21 15:33 [pve-devel] Consistency in volume deletion in process of concurrent VM deletion Andrei Perepiolkin via pve-devel
@ 2025-10-22  9:49 ` Fabian Grünbichler
  2025-10-22 14:38   ` Andrei Perepiolkin via pve-devel
       [not found]   ` <e14b6374-9460-4655-8bd5-55bd90245919@open-e.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Fabian Grünbichler @ 2025-10-22  9:49 UTC (permalink / raw)
  To: Proxmox VE development discussion

On October 21, 2025 5:33 pm, Andrei Perepiolkin via pve-devel wrote:
> Hi Proxmox Community,
> 
> There may be a consistency problem with Proxmox VM deletion.
> 
> When Proxmox receives multiple concurrent VM deletion requests, where each 
> VM has multiple disks located on shared storage, the deletion process may 
> fail or hang while attempting to acquire the storage lock 
> (https://github.com/proxmox/pve-storage/blob/master/src/PVE/Storage.pm#L1196C1-L1209C7).
> 
> ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> trying to acquire cfs lock 'storage-jdss-Pool-2' ...
> cfs-lock 'storage-jdss-Pool-2' error: got lock request timeout
> ...
> 
> Eventually, the VM configuration files in /etc/pve are removed, but some 
> VM disks may remain.
> 
> Additionally, the Web UI shows all deletions as successful, even though 
> some disks were not deleted.
> 
> In my opinion, a VM should either be deleted completely, including all 
> dependent resources, or the deletion should fail and leave the VM 
> configuration file in place with an updated state.

the underlying issue is that the scope of the lock taken for certain
storage operations is very big for shared storages. we could probably
reduce it to a more meaningful level for most such storages:

https://bugzilla.proxmox.com/show_bug.cgi?id=1962

but the error handling might also be lacking in this case, I would
have to double-check.
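
To illustrate the difference in lock scope, here is a minimal sketch. It uses an in-process lock table instead of the real cluster-wide cfs lock, and `LockTable`, `storage_key`, and `volume_key` are hypothetical names, not PVE functions. Whether a per-volume scope is actually safe depends on the storage plugin.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class LockTable:
    """In-process stand-in for a cluster-wide lock service (not the cfs lock API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def hold(self, key, timeout):
        # Look up (or create) the lock for this key under a guard lock.
        with self._guard:
            lock = self._locks[key]
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"lock '{key}' error: got lock request timeout")
        try:
            yield
        finally:
            lock.release()


def storage_key(storeid, volname):
    # Coarse scope: every volume on the storage shares a single lock.
    return f"storage-{storeid}"


def volume_key(storeid, volname):
    # Finer scope (hypothetical): one lock per volume on the storage.
    return f"storage-{storeid}-{volname}"
```

With `storage_key`, a second operation on a different volume of the same storage has to wait and may time out. With `volume_key`, operations on different volumes do not contend at all.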

> 
> I'm reproducing this with:
> 
>      for i in $(seq 401 420); do qm clone 104 "$i" --name "win-$i" --full --storage jdss-Pool-2; done
> 
>      for i in $(seq 401 410); do qm destroy "$i" --destroy-unreferenced-disks 1 --purge 1 & done
> 
> 
> I should note that the SSH session I use to run the 'qm destroy' 
> commands gets terminated by Proxmox.

that seems unexpected, are you sure this is caused by PVE?

> I've also filed this as a bug at: 
> https://bugzilla.proxmox.com/show_bug.cgi?id=6957

it would be better to either send a mail or file a bug, to not risk
splitting the discussion.

> Is this a bug, and will it be addressed in the near future?

nobody picked up the work regarding the lock granularity, but it would
be a nice improvement IMHO!

Fabian



* Re: [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
  2025-10-22  9:49 ` Fabian Grünbichler
@ 2025-10-22 14:38   ` Andrei Perepiolkin via pve-devel
       [not found]   ` <e14b6374-9460-4655-8bd5-55bd90245919@open-e.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Andrei Perepiolkin via pve-devel @ 2025-10-22 14:38 UTC (permalink / raw)
  To: Fabian Grünbichler, Proxmox VE development discussion
  Cc: Andrei Perepiolkin


From: Andrei Perepiolkin <andrei.perepiolkin@open-e.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
Date: Wed, 22 Oct 2025 10:38:45 -0400
Message-ID: <e14b6374-9460-4655-8bd5-55bd90245919@open-e.com>

Hi Fabian,


I can try to prototype a proof-of-concept solution for 'lock 
granularity'.
Once it is done, the issue of SSH session termination should become clear.

I'm new to mail-based contribution and to the Proxmox code itself, 
so I will probably have questions on various topics.

Should I send these questions via email, as comments in Bugzilla, or via 
some other tool?


Best regards,

Andrei Perepiolkin



* Re: [pve-devel] Consistency in volume deletion in process of concurrent VM deletion
       [not found]   ` <e14b6374-9460-4655-8bd5-55bd90245919@open-e.com>
@ 2025-10-23  7:00     ` Fabian Grünbichler
  0 siblings, 0 replies; 4+ messages in thread
From: Fabian Grünbichler @ 2025-10-23  7:00 UTC (permalink / raw)
  To: Andrei Perepiolkin, Proxmox VE development discussion

On October 22, 2025 4:38 pm, Andrei Perepiolkin wrote:
> Hi Fabian,
> 
> 
> I can try to prototype a proof-of-concept solution for 'lock 
> granularity'.

see https://pve.proxmox.com/wiki/Developer_Documentation for details of
how to submit patches (in particular also "Software License and
Copyright").

> Once it is done, the issue of SSH session termination should become clear.

it would be interesting, because right now I don't really see how a `qm`
invocation should kill the SSH session it is running in - it definitely
should not happen!

> I'm new to mail-based contribution and to the Proxmox code itself, 
> so I will probably have questions on various topics.
> 
> Should I send these questions via email, as comments in Bugzilla, or via 
> some other tool?

questions regarding patch development (both the workflow, and the patch
contents) are probably best discussed here on the list. feel free to
continue this thread, unless it is a very generic question.
