From: Andrei Perapiolkin <andrei.perepiolkin@open-e.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Volume live migration concurrency
Date: Mon, 26 May 2025 10:31:20 -0400
Message-ID: <6f0ab1f6-2d3f-41e3-8750-2567eb5b947d@open-e.com>

Hi Proxmox Community,

I'm curious whether there are any standards or guidelines that govern the order in which the methods 'activate_volume', 'deactivate_volume' and 'path' are called during VM live migration.

Assuming the storage plugin supports live migration:

1. Can 'path' be called before 'activate_volume'?

2. When a VM migrates from node1 to node2, might 'activate_volume' on node2 be invoked before 'deactivate_volume' has completed on node1?

3. In the context of live migration: will Proxmox skip calling 'deactivate_volume' for snapshots that have already been activated? Should the storage plugin explicitly deactivate all snapshots of a volume during migration?

Best regards,
Andrei Perepiolkin

From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] Volume live migration concurrency
Date: 2025-05-27 7:02 UTC

> Andrei Perapiolkin via pve-devel <pve-devel@lists.proxmox.com> wrote on 26.05.2025 16:31 CEST:
>
> I'm curious whether there are any standards or guidelines that govern the
> order in which the methods 'activate_volume', 'deactivate_volume' and 'path'
> are called during VM live migration.
>
> Assuming the storage plugin supports live migration:
>
> 1. Can 'path' be called before 'activate_volume'?

yes

> 2. When a VM migrates from node1 to node2, might 'activate_volume' on node2
> be invoked before 'deactivate_volume' has completed on node1?

it has to be: for a live migration, both the source VM and the target VM need access to the volume. the migration ensures that only one copy/node is writing to a shared volume at any given time. for local volumes, the source and target volumes are independent anyway.

> 3. In the context of live migration: will Proxmox skip calling
> 'deactivate_volume' for snapshots that have already been activated? Should
> the storage plugin explicitly deactivate all snapshots of a volume during
> migration?

a live migration is not concerned with snapshots of shared volumes, and local volumes are removed on the source node after the migration has finished. but maybe you could expand this part?

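The answers above have a direct consequence for plugin authors: 'path' must not assume the volume is active, and 'activate_volume' must tolerate running on the target node while the source node still has the volume active. Below is a minimal sketch of such a 'path', assuming the method signature used by PVE::Storage::Plugin (path($class, $scfg, $volname, $storeid, $snapname)) and a purely hypothetical device naming scheme; it is an illustration of the idea, not API documentation.

    package PVE::Storage::Custom::ExamplePlugin;

    use strict;
    use warnings;

    use base qw(PVE::Storage::Plugin);

    # Sketch only: path() derives a deterministic, node-local device path
    # from the volume name alone, so it is safe to call before
    # activate_volume(). The by-id naming scheme is hypothetical.
    sub path {
        my ($class, $scfg, $volname, $storeid, $snapname) = @_;

        my ($vtype, $name, $vmid) = $class->parse_volname($volname);
        my $dev = defined($snapname) ? "$name-snap-$snapname" : $name;

        # purely string-based mapping; no storage access, no activation
        return ("/dev/disk/by-id/example-$storeid-$dev", $vmid, $vtype);
    }

    1;

If the real device path only becomes known after the LUN is attached, one common workaround is to have activation create a stable symlink (for example via a udev rule), so path() can still return a deterministic name without touching the storage.
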
From: Andrei Perapiolkin <andrei.perepiolkin@open-e.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] Volume live migration concurrency
Date: Tue, 27 May 2025 12:08:37 -0400
Message-ID: <4773cf91-8a0f-453d-b9a1-11dcad1a193f@open-e.com>

>> 3. In the context of live migration: will Proxmox skip calling
>> 'deactivate_volume' for snapshots that have already been activated? Should
>> the storage plugin explicitly deactivate all snapshots of a volume during
>> migration?
>
> a live migration is not concerned with snapshots of shared volumes, and local
> volumes are removed on the source node after the migration has finished.
>
> but maybe you could expand this part?

My original idea was that since both 'activate_volume' and 'deactivate_volume' have a 'snapname' argument, they would be used to activate and deactivate snapshots respectively, and for each snapshot activation there would be a corresponding deactivation.

However, from observing the behavior during migration, I found that 'deactivate_volume' is not called for snapshots that were previously activated with 'activate_volume'. Therefore, I assumed that 'deactivate_volume' is responsible for deactivating all snapshots related to the volume that was previously activated. The purpose of this question was to confirm this.

From your response I conclude the following:

1. Migration does not manage volume snapshots (i.e. it does not activate or deactivate them).
2. All volumes are expected to be present across all nodes in the cluster for the 'path' function to work.
3. For migration to work, the volume should be simultaneously present on both nodes.

However, I couldn't find explicit instructions or guides on when and by whom volume snapshot deactivation should be triggered. Is it possible for a volume snapshot to remain active after the volume itself was deactivated?

During testing of Proxmox 8.2 I've encountered situations where cloning a volume from a snapshot did not result in snapshot deactivation. This leads to 'dangling' snapshots if the volume is later migrated.

My current understanding is that all assets related to snapshots should be removed when the volume is deactivated, is that correct? Or are all volumes and snapshots expected to be present across the entire cluster until they are explicitly deleted?

The second option requires additional recommendations on artifact management. Maybe this should be sent as a separate email, but I'll draft it here.

If all volumes and snapshots are consistently present across the entire cluster, and their creation/operation results in additional artifacts (such as iSCSI targets, multipath sessions, etc.), then these artifacts should be removed on deletion of the associated volume or snapshot. Currently, it is unclear how all nodes in the cluster are notified of such a deletion, as only one node in the cluster receives the 'free_image' or 'volume_snapshot_delete' request.

What is the proper way to instruct the plugin on other nodes in the cluster that a given volume/snapshot has been requested for deletion and that all artifacts related to it have to be removed? How should the cleanup tasks be triggered across the remaining nodes?

I assume that an additional service/daemon would be needed to handle such tasks. In that case, could it leverage the Proxmox Cluster File System (pmxcfs), specifically the '/etc/pve/priv' directory, to coordinate or store state information related to cleanup operations?

Andrei

From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Andrei Perapiolkin, Proxmox VE development discussion
Subject: Re: [pve-devel] Volume live migration concurrency
Date: 2025-05-28 7:06 UTC

> Andrei Perapiolkin <andrei.perepiolkin@open-e.com> wrote on 27.05.2025 18:08 CEST:
>
> My original idea was that since both 'activate_volume' and
> 'deactivate_volume' have a 'snapname' argument, they would be used to
> activate and deactivate snapshots respectively, and for each snapshot
> activation there would be a corresponding deactivation.

deactivating volumes (and snapshots) is a lot trickier than activating them, because you might have multiple readers in parallel that we don't know about.

so if you have the following pattern

  activate
  do something
  deactivate

and two instances of that are interleaved:

  A: activate
  B: activate
  A: do something
  A: deactivate
  B: do something -> FAILURE, volume not active

you have a problem.

that's why we only deactivate in special circumstances:
- as part of error handling for freshly activated volumes
- as part of migration when finally stopping the source VM or before freeing local source volumes
- ..

where we can be reasonably sure that no other user exists, or it is required for safety purposes.

otherwise, we'd need to do refcounting on volume activations and have some way to hook that for external users, to avoid premature deactivation.

> However, from observing the behavior during migration, I found that
> 'deactivate_volume' is not called for snapshots that were previously
> activated with 'activate_volume'.

were they activated for the migration? or for cloning from a snapshot? or ..?

maybe there is a call path that should deactivate that snapshot after using it..

> Therefore, I assumed that 'deactivate_volume' is responsible for
> deactivating all snapshots related to the volume that was previously
> activated. The purpose of this question was to confirm this.
>
> From your response I conclude the following:
>
> 1. Migration does not manage volume snapshots (i.e. it does not activate or
> deactivate them).

that really depends. a storage migration might activate a snapshot if that is required for transferring the volume. this mostly applies to offline migration or unused volumes though, and only for some storages.

> 2. All volumes are expected to be present across all nodes in the cluster
> for the 'path' function to work.

if at all possible, path should just do a "logical" conversion of the volume ID to a stable/deterministic path, or the information required for Qemu to access the volume if no path exists. ideally, this means it works without activating the volume, but it might require querying the storage.

> 3. For migration to work, the volume should be simultaneously present on
> both nodes.

for a live migration and shared storage, yes. for an offline migration with shared storage, the VM is never started on the target node, so no volume activation is required until that happens later. for local storages, volumes only exist on one node anyway (they are copied during the migration).

> However, I couldn't find explicit instructions or guides on when and by
> whom volume snapshot deactivation should be triggered.

yes, this is a bit under-specified unfortunately. we are currently working on improving the documentation (and the storage plugin API).

> Is it possible for a volume snapshot to remain active after the volume
> itself was deactivated?

I'd have to check all the code paths to give an answer to that. snapshots are rarely activated in general - IIRC mostly for
- cloning from a snapshot
- replication (limited to ZFS at the moment)
- storage migration

so I just did that:
- cloning from a snapshot only deactivates if the clone is to a different node, for both VM and CT -> see below
- CT backup in snapshot mode deletes the snapshot, which implies deactivation
- storage_migrate (move_disk or offline migration) if a snapshot is passed; IIRC this only affects ZFS, which doesn't do activation anyway

> During testing of Proxmox 8.2 I've encountered situations where cloning a
> volume from a snapshot did not result in snapshot deactivation. This leads
> to 'dangling' snapshots if the volume is later migrated.

ah, that probably answers my question above.

I think this might be one of those cases where deactivation is hard - you can have multiple clones from the same source VM running in parallel, and only the last one would be allowed to deactivate the snapshot/volume..

> My current understanding is that all assets related to snapshots should be
> removed when the volume is deactivated, is that correct? Or are all volumes
> and snapshots expected to be present across the entire cluster until they
> are explicitly deleted?

I am not quite sure what you mean by "present" - do you mean "exist in an activated state"?

> The second option requires additional recommendations on artifact
> management.
>
> If all volumes and snapshots are consistently present across the entire
> cluster, and their creation/operation results in additional artifacts (such
> as iSCSI targets, multipath sessions, etc.), then these artifacts should be
> removed on deletion of the associated volume or snapshot. Currently, it is
> unclear how all nodes in the cluster are notified of such a deletion, as
> only one node in the cluster receives the 'free_image' or
> 'volume_snapshot_delete' request.
>
> What is the proper way to instruct the plugin on other nodes in the cluster
> that a given volume/snapshot has been requested for deletion and that all
> artifacts related to it have to be removed?

I now get where you are coming from, I think! a volume should only be active on a single node, except during a live migration, where the source node will always get a deactivation call at the end.

deactivating a volume should also tear down related, volume-specific resources, if applicable.

> How should the cleanup tasks be triggered across the remaining nodes?

it should not be needed, but I think you've found an edge case where we need to improve.

I think our RBD plugin is also affected by this, all the other plugins either:
- don't support snapshots (or cloning from them)
- are local only
- don't need any special activation/deactivation

I think the safe approach is likely to deactivate all snapshots when deactivating the volume itself, for now.

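A minimal sketch of that 'safe approach', assuming the deactivate_volume($class, $storeid, $scfg, $volname, $snapname, $cache) signature from PVE::Storage::Plugin; list_attached_snapshots and node_detach are hypothetical placeholders for the plugin's node-local teardown (iSCSI logout, multipath flush, and so on), not existing PVE functions.

    # Sketch only: when the volume itself is deactivated, also tear down
    # any snapshot artifacts (extra LUNs, multipath maps, ...) that were
    # activated earlier on this node and never got their own deactivate.
    sub deactivate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        if (defined($snapname)) {
            node_detach($scfg, $volname, $snapname);   # explicit snapshot deactivation
            return 1;
        }

        # deactivating the volume: clean up lingering snapshot artifacts first
        for my $snap (list_attached_snapshots($scfg, $volname)) {
            node_detach($scfg, $volname, $snap);
        }
        node_detach($scfg, $volname, undef);

        return 1;
    }

    # hypothetical helpers, stubbed for the sketch
    sub list_attached_snapshots { return (); }
    sub node_detach { warn "node_detach(@_): not implemented in this sketch\n"; }

This keeps a later migration from leaving dangling snapshot attachments on the source node, at the cost of assuming nothing else on the node still needs those snapshots, which matches the circumstances under which deactivate_volume is called in the first place.
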
From: Andrei Perapiolkin <andrei.perepiolkin@open-e.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] Volume live migration concurrency
Date: Wed, 28 May 2025 10:49:43 -0400
Message-ID: <9d2b0cf4-7037-491c-b4a4-81538e63376d@open-e.com>

Hi Fabian,

Thank you for your time dedicated to this issue.

>> My current understanding is that all assets related to snapshots should be
>> removed when the volume is deactivated, is that correct? Or are all volumes
>> and snapshots expected to be present across the entire cluster until they
>> are explicitly deleted?
>
> I am not quite sure what you mean by "present" - do you mean "exist in an
> activated state"?

Exists in an active state - activated.

>> How should the cleanup tasks be triggered across the remaining nodes?
>
> it should not be needed

Consider the following scenarios during live migration of a VM from 'node1' to 'node2':

1. An error occurs on 'node2', resulting in partial activation.
2. An error occurs on 'node1', resulting in partial deactivation.
3. Errors occur on both 'node1' and 'node2', so dangling artifacts remain on both 'node1' and 'node2'.

That might lead to partial activation (some artifacts might be created) and partial deactivation (some artifacts might remain uncleared). Now, suppose the user unlocks the VM (if it was previously locked due to the failure) and proceeds with another migration attempt, this time to 'node3', hoping for success. What would happen to the artifacts on 'node1' and 'node2' in such a case?

Regarding the 'path' function:

In my case it is difficult to deterministically predict the actual path of the device. Determining this path essentially requires activating the volume. This approach is questionable, as it implies calling 'activate_volume' without Proxmox being aware that the activation has occurred. What would happen if a failure occurs within Proxmox before it reaches the stage of officially activating the volume?

Additionally, I believe that providing the 'physical path' of a resource that is not yet present (i.e. activated and usable) is a questionable practice. This creates a risk, as there is always a temptation to use the path directly, under the assumption that the resource is ready. This approach assumes that all developers are fully aware that a given $path might merely be a placeholder, and that additional activation is required before use. The issue becomes even more complex in a larger code base that integrates third-party software, such as QEMU.

I might be mistaken, but during my experiments with the 'path' function, I encountered an error where the virtualization system failed to open a volume that had not been fully activated. Perhaps this has been addressed in newer versions, but previously there appeared to be a race condition between volume activation and QEMU attempting to operate on the expected block device path.

Andrei

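One way a plugin can defend against the kind of race described above is to have 'activate_volume' block until the device node it is going to report actually exists before returning, so a consumer that opens the path right afterwards does not race the attach. A short sketch within the same hypothetical plugin module as before; node_attach is again a placeholder, and the 10-second timeout is an arbitrary example value.

    use Time::HiRes qw(usleep);

    # Sketch only: activate_volume() does not return until the block device
    # that path() reports is actually present on this node.
    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        my ($path) = $class->path($scfg, $volname, $storeid, $snapname);
        return 1 if -e $path;                 # fast path: already active here

        node_attach($scfg, $volname, $snapname);

        # wait up to ~10s for udev/multipath to create the device node
        for (1 .. 100) {
            return 1 if -e $path;
            usleep(100_000);                  # 100 ms
        }
        die "volume '$volname' activated but device '$path' never appeared\n";
    }

    # hypothetical helper, stubbed for the sketch
    sub node_attach { warn "node_attach(@_): not implemented in this sketch\n"; }

With this pattern, QEMU only ever sees the path after activate_volume has confirmed the device really exists.
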
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Andrei Perapiolkin, Proxmox VE development discussion
Subject: Re: [pve-devel] Volume live migration concurrency
Date: 2025-06-02 7:32 UTC

> Andrei Perapiolkin <andrei.perepiolkin@open-e.com> wrote on 28.05.2025 16:49 CEST:
>
> Consider the following scenarios during live migration of a VM from 'node1'
> to 'node2':
>
> 1. An error occurs on 'node2', resulting in partial activation.

if an error occurs on the target node during phase2 (after the VM has been started), the target VM will be stopped and any local disks allocated as part of the migration will be cleaned up as well. stopping the VM includes deactivating all its volumes.

> 2. An error occurs on 'node1', resulting in partial deactivation.

you mean an error right between deactivating volume 1 and 2, when control has already been handed over to node 2?

> 3. Errors occur on both 'node1' and 'node2', so dangling artifacts remain on
> both 'node1' and 'node2'.

incomplete or partial error handling is of course always possible - some kinds of errors are hard or impossible to recover from, after all.

> That might lead to partial activation (some artifacts might be created) and
> partial deactivation (some artifacts might remain uncleared). Now, suppose
> the user unlocks the VM (if it was previously locked due to the failure) and
> proceeds with another migration attempt, this time to 'node3', hoping for
> success. What would happen to the artifacts on 'node1' and 'node2' in such a
> case?

those on node2 would be unaffected (the new migration task doesn't know about the previous one), so you might have orphaned disks there in case of local storage, or still activated shared volumes in case of shared storage. on node1 everything should be handled correctly.

> Regarding the 'path' function:
>
> In my case it is difficult to deterministically predict the actual path of
> the device. Determining this path essentially requires activating the
> volume. This approach is questionable, as it implies calling
> 'activate_volume' without Proxmox being aware that the activation has
> occurred. What would happen if a failure occurs within Proxmox before it
> reaches the stage of officially activating the volume?

we treat activating a volume as idempotent, so this should not cause any damage, unless you activate volumes outside of a migration on nodes that are not currently "owning" that guest. your storage plugin is allowed to activate volumes internally if needed. but given that path() is called quite often, you'd have to ensure that activating a volume is not too expensive (usually some kind of fast path that is effectively a nop if the volume has already been activated before is used).

> Additionally, I believe that providing the 'physical path' of a resource
> that is not yet present (i.e. activated and usable) is a questionable
> practice. This creates a risk, as there is always a temptation to use the
> path directly, under the assumption that the resource is ready.

yes, but it has advantages as well:
- we don't have to carry the path through a call stack, but can just retrieve it where needed without the extra cost of doing another activation
- path also does other things/serves other purposes which don't require activation at all

> This approach assumes that all developers are fully aware that a given $path
> might merely be a placeholder, and that additional activation is required
> before use. The issue becomes even more complex in a larger code base that
> integrates third-party software, such as QEMU.
>
> I might be mistaken, but during my experiments with the 'path' function, I
> encountered an error where the virtualization system failed to open a volume
> that had not been fully activated. Perhaps this has been addressed in newer
> versions, but previously there appeared to be a race condition between
> volume activation and QEMU attempting to operate on the expected block
> device path.

bugs are always possible; if you can find more details about what happened there, I'd be happy to take a look.

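For the situation Andrei describes, where the real device path only becomes known after activation, the pattern mentioned above (path() activating internally, guarded by a cheap already-active check) might look roughly like this. It is a sketch under the same assumed PVE::Storage::Plugin signatures as before; find_attached_device is a hypothetical helper that only inspects local state (sysfs, the udev database) and never talks to the storage.

    # Sketch only: path() first tries a cheap, node-local lookup; only if
    # the volume is not attached yet does it fall back to activation.
    sub path {
        my ($class, $scfg, $volname, $storeid, $snapname) = @_;

        my ($vtype, $name, $vmid) = $class->parse_volname($volname);

        # fast path: effectively a nop when the volume is already active here
        my $dev = find_attached_device($scfg, $volname, $snapname);

        if (!defined($dev)) {
            # slow path: activate internally, then look the device up again
            $class->activate_volume($storeid, $scfg, $volname, $snapname, {});
            $dev = find_attached_device($scfg, $volname, $snapname)
                or die "unable to determine device path for '$volname'\n";
        }

        return ($dev, $vmid, $vtype);
    }

    # hypothetical helper, stubbed for the sketch: would scan sysfs/udev data
    sub find_attached_device { return undef; }

The trade-off is the one raised earlier in the thread: an activation done implicitly by path() is one that Proxmox does not know it may have to undo.
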