From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id EEA4C1FF348 for ; Wed, 17 Apr 2024 15:39:19 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 340407EEB; Wed, 17 Apr 2024 15:39:18 +0200 (CEST) Date: Wed, 17 Apr 2024 15:39:10 +0200 From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= To: Fiona Ebner , Proxmox VE development discussion References: <20240416131909.2867605-1-d.csapak@proxmox.com> <20240416131909.2867605-4-d.csapak@proxmox.com> <3b7027dc-7dd6-4a1f-bfda-7e21d476df0a@proxmox.com> In-Reply-To: <3b7027dc-7dd6-4a1f-bfda-7e21d476df0a@proxmox.com> MIME-Version: 1.0 User-Agent: astroid/0.16.0 (https://github.com/astroidmail/astroid) Message-Id: <1713360754.0ry1mvqt1q.astroid@yuna.none> X-SPAM-LEVEL: Spam detection results: 0 AWL 0.055 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [dmtf.org, plugin.pm, storage.pm, ovf.pm, status.pm, proxmox.com, dirplugin.pm] Subject: Re: [pve-devel] [PATCH storage 3/9] plugin: dir: handle ova files for import X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" On April 17, 2024 3:07 pm, Dominik Csapak wrote: > On 4/17/24 12:52, Fiona Ebner wrote: >> Am 16.04.24 um 15:18 schrieb Dominik Csapak: >>> since we want to handle ova files (which are only ovf+vmdks bundled in a >>> tar file) for import, add code that handles that. >>> >>> we introduce a valid volname for files contained in ovas like this: >>> >>> storage:import/archive.ova/disk-1.vmdk >>> >>> by basically treating the last part of the path as the name for the >>> contained disk we want. >>> >>> we then provide 3 functions to use for that: >>> >>> * copy_needs_extraction: determines from the given volid (like above) if >>> that needs extraction to copy it, currently only 'import' vtype + >>> defined format returns true here (if we have more options in the >>> future, we can of course easily extend that) >>> >>> * extract_disk_from_import_file: this actually extracts the file from >>> the archive. Currently only ova is supported, so the extraction with >>> 'tar' is hardcoded, but again we can easily extend/modify that should >>> we need to. >>> >>> we currently extract into the import storage in a directory named: >>> `.tmp__` which should not clash with concurrent >>> operations (though we do extract it multiple times then) >>> >> >> Could we do "extract upon upload", "tar upon download" instead? Sure >> some people surely want to drop the ova manually, but we could tell them >> they need to extract it first too. Depending on the amount of headache >> this would save us, it might be worth it. > > we could, but this opens a whole other can of worms, namely > what to do with conflicting filenames for different ovas? > > we'd then either have to magically match the paths from the ovfs > to some subdir that don't overlap we could just use the ova name as dir name, and never store the ova under that name but use some tmp placeholder for that ;) > > or we'd have to abort everytime we encounter identical disk names > > IMHO this would be less practical than just extract on demand... > >> >>> alternatively we could implement either a 'tmpstorage' parameter, >>> or use e.g. '/var/tmp/' or similar, but re-using the current storage >>> seemed ok. >>> >>> * cleanup_extracted_image: intended to cleanup the extracted images from >>> above, including the surrounding temporary directory >>> >>> we have to modify the `parse_ovf` a bit to handle the missing disk >>> images, and we parse the size out of the ovf part (since this is >>> informal only, it should be no problem if we cannot parse it sometimes) >>> >>> Signed-off-by: Dominik Csapak >>> --- >>> src/PVE/API2/Storage/Status.pm | 1 + >>> src/PVE/Storage.pm | 59 ++++++++++++++++++++++++++++++++++ >>> src/PVE/Storage/DirPlugin.pm | 13 +++++++- >>> src/PVE/Storage/OVF.pm | 53 ++++++++++++++++++++++++++---- >>> src/PVE/Storage/Plugin.pm | 5 +++ >>> 5 files changed, 123 insertions(+), 8 deletions(-) >>> >>> diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm >>> index f7e324f..77ed57c 100644 >>> --- a/src/PVE/API2/Storage/Status.pm >>> +++ b/src/PVE/API2/Storage/Status.pm >>> @@ -749,6 +749,7 @@ __PACKAGE__->register_method({ >>> 'efi-state-lost', >>> 'guest-is-running', >>> 'nvme-unsupported', >>> + 'ova-needs-extracting', >>> 'ovmf-with-lsi-unsupported', >>> 'serial-port-socket-only', >>> ], >>> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm >>> index f8ea93d..bc073ef 100755 >>> --- a/src/PVE/Storage.pm >>> +++ b/src/PVE/Storage.pm >>> @@ -2189,4 +2189,63 @@ sub get_import_metadata { >>> return $plugin->get_import_metadata($scfg, $volname, $storeid); >>> } >>> >> >> Shouldn't the following three functions call into plugin methods >> instead? That'd seem much more future-proof to me. > > could be, i just did not want to extend the plugin api for that > but as fabian wrote, maybe we should put them in qemu-server > altogether for now? > > (after thinking about it a bit, i'd be in favor of putting it in > qemu-server, because mainly i don't want to add to the plugin api further) > > what do you think @fiona @fabian? another alternative would be to put them into the non-storage-plugin OVF helper module? >>> +sub copy_needs_extraction { >>> + my ($volid) = @_; >>> + my ($storeid, $volname) = parse_volume_id($volid); >>> + my $cfg = config(); >>> + my $scfg = storage_config($cfg, $storeid); >>> + my $plugin = PVE::Storage::Plugin->lookup($scfg->{type}); >>> + >>> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $file_format) = >>> + $plugin->parse_volname($volname); >>> + >>> + return $vtype eq 'import' && defined($file_format); >> >> E.g this seems rather hacky, and puts a weird coupling on a future >> import plugin's parse_volname() function (presence of $file_format). > > would it be better to check the volid again for '.ova/something$' ? > or do you have a better idea? > (especially if we want to have this maybe in qemu-server) hmm, could parse_volname return 'ova' as format? or 'ova+vmdk'? we don't actually need the format for extracting, and afterwards we get it from the extract file name anyway? >>> +} >>> + >>> +sub extract_disk_from_import_file { >>> + my ($volid, $vmid) = @_; >>> + >>> + my ($storeid, $volname) = parse_volume_id($volid); >>> + my $cfg = config(); >>> + my $scfg = storage_config($cfg, $storeid); >>> + my $plugin = PVE::Storage::Plugin->lookup($scfg->{type}); >>> + >>> + my ($vtype, $name, undef, undef, undef, undef, $file_format) = >>> + $plugin->parse_volname($volname); >>> + >>> + die "only files with content type 'import' can be extracted\n" >>> + if $vtype ne 'import' || !defined($file_format); >>> + >>> + # extract the inner file from the name >>> + if ($volid =~ m!${name}/([^/]+)$!) { >>> + $name = $1; >>> + } >>> + >>> + my ($source_file) = $plugin->path($scfg, $volname, $storeid); >>> + >>> + my $destdir = $plugin->get_subdir($scfg, 'import'); >>> + my $pid = $$; >>> + $destdir .= "/.tmp_${pid}_${vmid}"; >>> + mkdir $destdir; >>> + >>> + ($source_file) = $source_file =~ m|^(/.*)|; # untaint >>> + >>> + run_command(['tar', '-x', '-C', $destdir, '-f', $source_file, $name]); >>> + >>> + return "$destdir/$name"; >>> +} >>> + >>> +sub cleanup_extracted_image { >>> + my ($source) = @_; >>> + >>> + if ($source =~ m|^(/.+/\.tmp_[0-9]+_[0-9]+)/[^/]+$|) { >>> + my $tmpdir = $1; >>> + >>> + unlink $source or $! == ENOENT or die "removing image $source failed: $!\n"; >>> + rmdir $tmpdir or $! == ENOENT or die "removing tmpdir $tmpdir failed: $!\n"; >>> + } else { >>> + die "invalid extraced image path '$source'\n"; >>> + } >>> +} >>> + >>> 1; >>> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm >>> index 4dc7708..50ceab7 100644 >>> --- a/src/PVE/Storage/DirPlugin.pm >>> +++ b/src/PVE/Storage/DirPlugin.pm >>> @@ -260,14 +260,25 @@ sub get_import_metadata { >>> # NOTE: all types must be added to the return schema of the import-metadata API endpoint >>> my $warnings = []; >>> >>> + my $isOva = 0; >>> + if ($path =~ m!\.ova!) { >> >> Would be nicer if parse_volname() would return the $file_format and we >> chould check for that. Also missing the $ in the regex, so you'd >> mismatch a weird filename like ABC.ovaXYZ.ovf or? > > yeah the $ is missing, and yes, we could return ova/ovf as format there > as we want to change the 'needs extracting' check anyway > > >> >>> + $isOva = 1; >>> + push @$warnings, { type => 'ova-needs-extracting' }; >>> + } >>> my $res = PVE::Storage::OVF::parse_ovf($path, $isOva); >>> my $disks = {}; >>> for my $disk ($res->{disks}->@*) { >>> my $id = $disk->{disk_address}; >>> my $size = $disk->{virtual_size}; >>> my $path = $disk->{relative_path}; >>> + my $volid; >>> + if ($isOva) { >>> + $volid = "$storeid:$volname/$path"; >>> + } else { >>> + $volid = "$storeid:import/$path", >>> + } >>> $disks->{$id} = { >>> - volid => "$storeid:import/$path", >>> + volid => $volid, >>> defined($size) ? (size => $size) : (), >>> }; >>> } >>> diff --git a/src/PVE/Storage/OVF.pm b/src/PVE/Storage/OVF.pm >>> index 4a322b9..fb850a8 100644 >>> --- a/src/PVE/Storage/OVF.pm >>> +++ b/src/PVE/Storage/OVF.pm >>> @@ -85,11 +85,37 @@ sub id_to_pve { >>> } >>> } >>> >>> +# technically defined in DSP0004 (https://www.dmtf.org/dsp/DSP0004) as an ABNF >>> +# but realistically this always takes the form of 'bytes * base^exponent' >> >> The comment wrongly says 'bytes' instead of 'byte' (your test examples >> confirm this). >> >>> +sub try_parse_capacity_unit { >>> + my ($unit_text) = @_; >>> + >>> + if ($unit_text =~ m/^\s*byte\s*\*\s*([0-9]+)\s*\^\s*([0-9]+)\s*$/) { >> >> Fun regex :P >> >>> + my $base = $1; >>> + my $exp = $2; >>> + return $base ** $exp; >>> + } >>> + >>> + return undef; >>> +} >>> + >> >> (...) >> >>> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm >>> index deaf8b2..ea069ab 100644 >>> --- a/src/PVE/Storage/Plugin.pm >>> +++ b/src/PVE/Storage/Plugin.pm >>> @@ -654,6 +654,11 @@ sub parse_volname { >>> return ('backup', $fn); >>> } elsif ($volname =~ m!^snippets/([^/]+)$!) { >>> return ('snippets', $1); >>> + } elsif ($volname =~ m!^import/([^/]+\.ova)\/([^/]+)$!) { >>> + my $archive = $1; >>> + my $file = $2; >>> + my (undef, $format, undef) = parse_name_dir($file); >>> + return ('import', $archive, 0, undef, undef, undef, $format); >> >> So we return the same $name for different things here? Not super happy >> with that either. If we were to get creative we could say the archive is >> the "base" of the image, but surely also comes with caveats. > > i'll change this in a v2 should not be necessary > >> >>> } elsif ($volname =~ m!^import/([^/]+$PVE::Storage::IMPORT_EXT_RE_1)$!) { >>> return ('import', $1); >>> } elsif ($volname =~ m!^import/([^/]+\.(raw|vmdk|qcow2))$!) { > > > > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > > _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel