public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks
@ 2025-11-07 17:17 Friedrich Weber
  2025-11-10 17:06 ` Friedrich Weber
  0 siblings, 1 reply; 4+ messages in thread
From: Friedrich Weber @ 2025-11-07 17:17 UTC (permalink / raw)
  To: pve-devel

Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot
fails with "qemu-img: Failed to load snapshot: Can't find snapshot".
The reason is that the special case for EFI disks calls `qemu-img dd`
with the additional `-l snapname` option, which is only valid for
qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the
source volume is already the volume corresponding to the snapshot.

Fix this by checking whether the snapshot in question is an external
snapshot, and if it is, omitting the `-l` option.

Reported in enterprise support.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

Notes:
    I noticed that two oddities about volume_snapshot_info:
    
    - it doesn't seem to work for qcow2-internal snapshots: It assumes
      that if `qemu img info` returns a JSON object, the snapshots are
      internal, and if it returns a JSON array, they are "external", but
      this doesn't seem correct -- I think because we pass
      --backing-chain, qemu-img will return an one-element JSON array even
      for internal snapshots. Hence, volume_snapshot_info incorrectly
      returns a hash ref with only a single member "current" if the
      snapshots are internal. But since I only found callsites guarded
      with snapshot-as-volume-chain, I guess this didn't hurt until now?
    
      It doesn't hurt for my patch either because it only checks for
      existence of $info->{$snapshot}->{ext} (which also works for
      qcow2-internal snapshots, because alreay $info->{$snapshot} will not
      exist then), but that's admittedly not nice and it's probably a good
      idea to fix volume_snapshot_info in addition.
    
    - but in which way should we fix it? Right now, for volume-chain
      snapshots, it seems like volume_snapshot_info will always have a
      member "current" pointing to the "actual" volume, even if the volume
      doesn't have any snapshots. Is this intended and do we want this
      also in case of qcow2-internal snapshots? From reading
      ZFSPoolPlugin::volume_snapshot_info, it looks like that one will
      return an empty hash (without "current") if the volume has no
      snapshots.

 src/PVE/QemuServer.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index cf195ccc..4cd7af52 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -7868,7 +7868,9 @@ sub clone_disk {
                 if ($src_format eq 'qcow2' && $snapname) {
                     die "cannot clone qcow2 EFI disk snapshot - requires QEMU >= 6.2\n"
                         if !min_version(kvm_user_version(), 6, 2);
-                    push $cmd->@*, '-l', $snapname;
+                    my $snapshot_info =
+                        PVE::Storage::volume_snapshot_info($storecfg, $drive->{file});
+                    push $cmd->@*, '-l', $snapname if !$snapshot_info->{$snapname}->{ext};
                 }
                 push $cmd->@*, "bs=$bs", "osize=$size", "if=$src_path", "of=$dst_path";
                 run_command($cmd);
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks
  2025-11-07 17:17 [pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks Friedrich Weber
@ 2025-11-10 17:06 ` Friedrich Weber
  2025-11-11  9:46   ` Fabian Grünbichler
  0 siblings, 1 reply; 4+ messages in thread
From: Friedrich Weber @ 2025-11-10 17:06 UTC (permalink / raw)
  To: pve-devel

On 07/11/2025 18:19, Friedrich Weber wrote:
> Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot
> fails with "qemu-img: Failed to load snapshot: Can't find snapshot".
> The reason is that the special case for EFI disks calls `qemu-img dd`
> with the additional `-l snapname` option, which is only valid for
> qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the
> source volume is already the volume corresponding to the snapshot.
> 
> Fix this by checking whether the snapshot in question is an external
> snapshot, and if it is, omitting the `-l` option.
> 
> Reported in enterprise support.

Turns out this only fixes the issue in case of snapshot-as-volume-chain
on file-level storages, not on LVM. The reason is that
LVMPlugin::volume_snapshot_info does not set the `ext` key to 1 [1], so
the code doesn't recognize the external snapshot.

LVMPlugin::volume_snapshot_info should probably set `ext` (assuming that
"ext" means "external")?

> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     I noticed that two oddities about volume_snapshot_info:
>     
>     - it doesn't seem to work for qcow2-internal snapshots: It assumes
>       that if `qemu img info` returns a JSON object, the snapshots are
>       internal, and if it returns a JSON array, they are "external", but
>       this doesn't seem correct -- I think because we pass
>       --backing-chain, qemu-img will return an one-element JSON array even
>       for internal snapshots. Hence, volume_snapshot_info incorrectly
>       returns a hash ref with only a single member "current" if the
>       snapshots are internal. But since I only found callsites guarded
>       with snapshot-as-volume-chain, I guess this didn't hurt until now?
>     
>       It doesn't hurt for my patch either because it only checks for
>       existence of $info->{$snapshot}->{ext} (which also works for
>       qcow2-internal snapshots, because alreay $info->{$snapshot} will not
>       exist then), but that's admittedly not nice and it's probably a good
>       idea to fix volume_snapshot_info in addition.
>     
>     - but in which way should we fix it? Right now, for volume-chain
>       snapshots, it seems like volume_snapshot_info will always have a
>       member "current" pointing to the "actual" volume, even if the volume
>       doesn't have any snapshots. Is this intended and do we want this
>       also in case of qcow2-internal snapshots? From reading
>       ZFSPoolPlugin::volume_snapshot_info, it looks like that one will
>       return an empty hash (without "current") if the volume has no
>       snapshots.

Looks like

- ZFSPoolPlugin::volume_snapshot_info indeed doesn't return "current"
- but there are several callsites of {LVM,}Plugin::volume_snapshot_info
that rely on 'current' being present for snapshot-as-volume chain [2]

So the implementations differ w.r.t. returning a "current" entry or not,
which is IMO a bit awkward API-wise. But fixing this in either way
(removing it for {LVM,}Plugin or adding it for ZFSPoolPlugin) would
probably require bigger changes in either the snapshot-as-volume-chain
or the ZFS replication code?

For fixing this particular bug, I guess we could instead call
`volume_qemu_snapshot_method` and only add the `-l snapname` parameter
if that returns 'qemu'. But I think this is only safe because we don't
allow enabling/disabling snapshot-as-volume-chain for file-level
storages on-the-fly.

What do you think?

[1]
https://git.proxmox.com/?p=pve-storage.git;a=blob;f=src/PVE/Storage/LVMPlugin.pm;h=3badfef2c;hb=a85ebe3#l849
[2]
https://git.proxmox.com/?p=qemu-server.git;a=blob;f=src/PVE/QemuServer/Blockdev.pm;h=8fa5eb51e;hb=b41c1e1440ff822#l420


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks
  2025-11-10 17:06 ` Friedrich Weber
@ 2025-11-11  9:46   ` Fabian Grünbichler
  2025-11-11 12:40     ` [pve-devel] superseded: " Friedrich Weber
  0 siblings, 1 reply; 4+ messages in thread
From: Fabian Grünbichler @ 2025-11-11  9:46 UTC (permalink / raw)
  To: Proxmox VE development discussion

On November 10, 2025 6:06 pm, Friedrich Weber wrote:
> On 07/11/2025 18:19, Friedrich Weber wrote:
>> Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot
>> fails with "qemu-img: Failed to load snapshot: Can't find snapshot".
>> The reason is that the special case for EFI disks calls `qemu-img dd`
>> with the additional `-l snapname` option, which is only valid for
>> qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the
>> source volume is already the volume corresponding to the snapshot.
>> 
>> Fix this by checking whether the snapshot in question is an external
>> snapshot, and if it is, omitting the `-l` option.
>> 
>> Reported in enterprise support.
> 
> Turns out this only fixes the issue in case of snapshot-as-volume-chain
> on file-level storages, not on LVM. The reason is that
> LVMPlugin::volume_snapshot_info does not set the `ext` key to 1 [1], so
> the code doesn't recognize the external snapshot.
> 
> LVMPlugin::volume_snapshot_info should probably set `ext` (assuming that
> "ext" means "external")?
> 
>> 
>> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
>> ---
>> 
>> Notes:
>>     I noticed that two oddities about volume_snapshot_info:
>>     
>>     - it doesn't seem to work for qcow2-internal snapshots: It assumes
>>       that if `qemu img info` returns a JSON object, the snapshots are
>>       internal, and if it returns a JSON array, they are "external", but
>>       this doesn't seem correct -- I think because we pass
>>       --backing-chain, qemu-img will return an one-element JSON array even
>>       for internal snapshots. Hence, volume_snapshot_info incorrectly
>>       returns a hash ref with only a single member "current" if the
>>       snapshots are internal. But since I only found callsites guarded
>>       with snapshot-as-volume-chain, I guess this didn't hurt until now?
>>     
>>       It doesn't hurt for my patch either because it only checks for
>>       existence of $info->{$snapshot}->{ext} (which also works for
>>       qcow2-internal snapshots, because alreay $info->{$snapshot} will not
>>       exist then), but that's admittedly not nice and it's probably a good
>>       idea to fix volume_snapshot_info in addition.
>>     
>>     - but in which way should we fix it? Right now, for volume-chain
>>       snapshots, it seems like volume_snapshot_info will always have a
>>       member "current" pointing to the "actual" volume, even if the volume
>>       doesn't have any snapshots. Is this intended and do we want this
>>       also in case of qcow2-internal snapshots? From reading
>>       ZFSPoolPlugin::volume_snapshot_info, it looks like that one will
>>       return an empty hash (without "current") if the volume has no
>>       snapshots.
> 
> Looks like
> 
> - ZFSPoolPlugin::volume_snapshot_info indeed doesn't return "current"
> - but there are several callsites of {LVM,}Plugin::volume_snapshot_info
> that rely on 'current' being present for snapshot-as-volume chain [2]
> 
> So the implementations differ w.r.t. returning a "current" entry or not,
> which is IMO a bit awkward API-wise. But fixing this in either way
> (removing it for {LVM,}Plugin or adding it for ZFSPoolPlugin) would
> probably require bigger changes in either the snapshot-as-volume-chain
> or the ZFS replication code?
> 
> For fixing this particular bug, I guess we could instead call
> `volume_qemu_snapshot_method` and only add the `-l snapname` parameter
> if that returns 'qemu'. But I think this is only safe because we don't
> allow enabling/disabling snapshot-as-volume-chain for file-level
> storages on-the-fly.
> 
> What do you think?

that sounds like a good approach - changing snapshot-as-volume-chain
already breaks all sorts of things anyway, so that is not supported.

cleaning up the rest would be good as well..


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pve-devel] superseded: [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks
  2025-11-11  9:46   ` Fabian Grünbichler
@ 2025-11-11 12:40     ` Friedrich Weber
  0 siblings, 0 replies; 4+ messages in thread
From: Friedrich Weber @ 2025-11-11 12:40 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

Thanks for looking into this!

On 11/11/2025 10:46, Fabian Grünbichler wrote:
> On November 10, 2025 6:06 pm, Friedrich Weber wrote:
>> On 07/11/2025 18:19, Friedrich Weber wrote:
>>> Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot
>>> fails with "qemu-img: Failed to load snapshot: Can't find snapshot".
>>> The reason is that the special case for EFI disks calls `qemu-img dd`
>>> with the additional `-l snapname` option, which is only valid for
>>> qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the
>>> source volume is already the volume corresponding to the snapshot.
>>>
>>> Fix this by checking whether the snapshot in question is an external
>>> snapshot, and if it is, omitting the `-l` option.
>>>
>>> Reported in enterprise support.
>>
>> Turns out this only fixes the issue in case of snapshot-as-volume-chain
>> on file-level storages, not on LVM. The reason is that
>> LVMPlugin::volume_snapshot_info does not set the `ext` key to 1 [1], so
>> the code doesn't recognize the external snapshot.
>>
>> LVMPlugin::volume_snapshot_info should probably set `ext` (assuming that
>> "ext" means "external")?
>>
>>>
>>> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
>>> ---
>>>
>>> Notes:
>>>     I noticed that two oddities about volume_snapshot_info:
>>>     
>>>     - it doesn't seem to work for qcow2-internal snapshots: It assumes
>>>       that if `qemu img info` returns a JSON object, the snapshots are
>>>       internal, and if it returns a JSON array, they are "external", but
>>>       this doesn't seem correct -- I think because we pass
>>>       --backing-chain, qemu-img will return an one-element JSON array even
>>>       for internal snapshots. Hence, volume_snapshot_info incorrectly
>>>       returns a hash ref with only a single member "current" if the
>>>       snapshots are internal. But since I only found callsites guarded
>>>       with snapshot-as-volume-chain, I guess this didn't hurt until now?
>>>     
>>>       It doesn't hurt for my patch either because it only checks for
>>>       existence of $info->{$snapshot}->{ext} (which also works for
>>>       qcow2-internal snapshots, because alreay $info->{$snapshot} will not
>>>       exist then), but that's admittedly not nice and it's probably a good
>>>       idea to fix volume_snapshot_info in addition.
>>>     
>>>     - but in which way should we fix it? Right now, for volume-chain
>>>       snapshots, it seems like volume_snapshot_info will always have a
>>>       member "current" pointing to the "actual" volume, even if the volume
>>>       doesn't have any snapshots. Is this intended and do we want this
>>>       also in case of qcow2-internal snapshots? From reading
>>>       ZFSPoolPlugin::volume_snapshot_info, it looks like that one will
>>>       return an empty hash (without "current") if the volume has no
>>>       snapshots.
>>
>> Looks like
>>
>> - ZFSPoolPlugin::volume_snapshot_info indeed doesn't return "current"
>> - but there are several callsites of {LVM,}Plugin::volume_snapshot_info
>> that rely on 'current' being present for snapshot-as-volume chain [2]
>>
>> So the implementations differ w.r.t. returning a "current" entry or not,
>> which is IMO a bit awkward API-wise. But fixing this in either way
>> (removing it for {LVM,}Plugin or adding it for ZFSPoolPlugin) would
>> probably require bigger changes in either the snapshot-as-volume-chain
>> or the ZFS replication code?
>>
>> For fixing this particular bug, I guess we could instead call
>> `volume_qemu_snapshot_method` and only add the `-l snapname` parameter
>> if that returns 'qemu'. But I think this is only safe because we don't
>> allow enabling/disabling snapshot-as-volume-chain for file-level
>> storages on-the-fly.
>>
>> What do you think?
> 
> that sounds like a good approach - changing snapshot-as-volume-chain
> already breaks all sorts of things anyway, so that is not supported.
> 

Sounds good, I sent a v2:
https://lore.proxmox.com/pve-devel/20251111123222.73957-1-f.weber@proxmox.com/T/


> cleaning up the rest would be good as well..

Yes. I didn't have time to look into it more, but what sounds somewhat
sensible to me is to

- specify that volume_snapshot_info also returns a 'current' entry
- add 'current' entry to ZFSPoolPlugin implementation
- skip over 'current' entries in the replication callsites of
volume_snapshot_info


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-11-11 12:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-07 17:17 [pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks Friedrich Weber
2025-11-10 17:06 ` Friedrich Weber
2025-11-11  9:46   ` Fabian Grünbichler
2025-11-11 12:40     ` [pve-devel] superseded: " Friedrich Weber

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal