Date: Tue, 11 Jun 2024 15:25:27 +0200
From: Fiona Ebner
To: Proxmox VE development discussion, Hannes Duerr
Message-ID: <283330a0-658c-4b09-9242-79dc941c0fa3@proxmox.com>
In-Reply-To: <20240403150712.262773-1-h.duerr@proxmox.com>
Subject: Re: [pve-devel] [PATCH pve-storage 1/1] storage/plugin: implement ref-counting for disknames in get_next_vm_diskname

Do you mean "reservation" rather than "ref-counting"?

On 03.04.24 at 17:07, Hannes Duerr wrote:
> As Fabian has already mentioned here[0], there can be a race between two
> parallel imports. More specifically, if both imports have --allow-rename
> set and the desired name already exists, then it can happen that both
> imports get the same name. The reason for this is that we currently only
> check which names have already been assigned in the VM config and then
> use the next free one, and do not check whether a running process has
> already been assigned the same name. However, while writing and testing
> the patch, I found that this is often not a problem, as
> - in most cases the source VM config is locked and therefore only one
>   process can generate a new name (migrate, clone, move disk, update
>   config)
> - a name must be specified and therefore no new name is generated (pvesm
>   alloc)
> - the timeframe for the race is very short.
>
> At the same time, it is possible that I have not considered all edge
> cases and that there are other cases where the race can occur,
> especially with regard to remote migrations. You can provoke the problem
> with two parallel imports on a host where local-lvm:vm-100-disk-0
> already exists:
>
> pvesm import local-lvm:vm-100-disk-0 raw+size --allow-rename 1
>

From a quick glance, it seems remote migration does one disk-import at a
time, i.e. only starts the next one once the previous one is finished.
In fact, even a malicious client could only do one import at a time,
because there is only one socket (per VM):

>     $params->{unix} = "/run/qemu-server/$state->{vmid}.storage";
>
>     return PVE::StorageTunnel::handle_disk_import($state, $params);

> Now that I've looked into the problem a bit, I'm not sure this patch is
> even necessary as it adds more complexity. So I wanted to ask for your
> opinion, whether you think it makes sense to add this change or not.

Please either use RFC for sending the patch then and/or put this
paragraph below the --- so it won't appear in the commit message.

IMHO, it would be nice to fix, but yes, it is a bunch of new code. I
think you could make it slightly simpler though, by doing the following
in the locked section (untested sketch below):
1. iterate over reservations -> if stale, remove -> if not stale, add to
   $disk_list
2. get the next free name using the larger $disk_list
3. print the updated list of reservations including the new name
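Untested sketch of what I mean, assuming this still runs inside
get_next_vm_diskname() (so $disk_list, $storeid, $vmid, $scfg, $prefix,
$suffix and the existing $get_vm_disk_number/$MAX_VOLUMES_PER_GUEST
helpers are in scope) and assuming a reservation file with one
"storeid vmid disk_id expiry" entry per line:

my $reserved_names_file = "/var/tmp/pve-reserved-volnames";

my $code = sub {
    my $now = time();
    my $reservations = [];

    # 1. iterate over reservations: drop stale entries, treat the rest
    # like already existing disks
    if (-e $reserved_names_file) {
        my $raw = PVE::Tools::file_get_contents($reserved_names_file);
        for my $line (split(/\n/, $raw)) {
            my ($r_storeid, $r_vmid, $r_disk_id, $r_expiry) = split(/\s+/, $line);
            next if !defined($r_expiry) || $r_expiry < $now; # stale -> remove
            push @$reservations, [$r_storeid, $r_vmid, $r_disk_id, $r_expiry];
            push @$disk_list, "$prefix-$r_vmid-disk-$r_disk_id$suffix"
                if $r_storeid eq $storeid && $r_vmid == $vmid;
        }
    }

    # 2. get the next free name using the larger $disk_list
    my $disk_ids = {};
    for my $disk (@$disk_list) {
        my $disknum = $get_vm_disk_number->($disk, $scfg, $vmid, $suffix);
        $disk_ids->{$disknum} = 1 if defined($disknum);
    }

    for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) {
        next if $disk_ids->{$i};
        # 3. write out the updated reservations including the new name; the
        # expiry timestamp (arbitrary pick) anticipates the PID concern below
        push @$reservations, [$storeid, $vmid, $i, $now + 5 * 60];
        PVE::Tools::file_set_contents($reserved_names_file,
            join('', map { join(' ', @$_) . "\n" } @$reservations));
        return "$prefix-$vmid-disk-$i$suffix";
    }

    die "unable to allocate an image name for VM $vmid in storage '$storeid'\n";
};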
> The patch introduces a tmp file which stores the newly assigned disk
> names and the PID of the process which requested the disk name. If a
> second process is assigned the same name, it will see from the file that
> the name has already been assigned to another process, and will take the
> next available name. Reading and writing to the tmp file requires a lock
> to prevent races.
>
> [0] https://lists.proxmox.com/pipermail/pve-devel/2024-January/061526.html
>
> Signed-off-by: Hannes Duerr
> ---
>  src/PVE/Storage/Plugin.pm | 99 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 86 insertions(+), 13 deletions(-)
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 7456c8e..f76550a 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -10,8 +10,10 @@ use File::chdir;
>  use File::Path;
>  use File::Basename;
>  use File::stat qw();
> +use File::Copy;
>

So this is used for the single move() below? Any reason not to use
file_set_contents() like in other places? You are using locking after
all and then you would also not need the tmp file.
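I.e. something like this (untested; as far as I can tell,
file_set_contents() already writes to a temporary file itself and
renames it into place, so the manual $tmp_file handling should become
unnecessary):

    # read; a missing file simply means there are no reservations yet
    my $raw = -e $reserved_names_file
        ? PVE::Tools::file_get_contents($reserved_names_file) : '';

    # ...update $raw while holding the lock...

    # replace the file with the updated content
    PVE::Tools::file_set_contents($reserved_names_file, $raw);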
>  use PVE::Tools qw(run_command);
> +use PVE::ProcFSTools;
>  use PVE::JSONSchema qw(get_standard_option register_standard_option);
>  use PVE::Cluster qw(cfs_register_file);
>
> @@ -779,23 +781,94 @@ my $get_vm_disk_number = sub {
>  sub get_next_vm_diskname {
>      my ($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix) = @_;
>
> -    $fmt //= '';
> -    my $prefix = ($fmt eq 'subvol') ? 'subvol' : 'vm';
> -    my $suffix = $add_fmt_suffix ? ".$fmt" : '';
> +    my $code = sub {
> +        my $reserved_names_file = "/var/tmp/pve-reserved-volnames";
> +        my $tmp_file = "/var/tmp/pve-reserved-volnames.tmp";
>
> -    my $disk_ids = {};
> -    foreach my $disk (@$disk_list) {
> -        my $disknum = $get_vm_disk_number->($disk, $scfg, $vmid, $suffix);
> -        $disk_ids->{$disknum} = 1 if defined($disknum);
> -    }
> +        $fmt //= '';
> +        my $prefix = ($fmt eq 'subvol') ? 'subvol' : 'vm';
> +        my $suffix = $add_fmt_suffix ? ".$fmt" : '';
> +        my $disk_id;
> +        my $disk_ids = {};
>
> -    for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) {
> -        if (!$disk_ids->{$i}) {
> -            return "$prefix-$vmid-disk-$i$suffix";
> +        foreach my $disk (@$disk_list) {
> +            my $disknum = $get_vm_disk_number->($disk, $scfg, $vmid, $suffix);
> +            $disk_ids->{$disknum} = 1 if defined($disknum);
>          }
> -    }
>
> -    die "unable to allocate an image name for VM $vmid in storage '$storeid'\n"
> +        for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) {
> +            if (!$disk_ids->{$i}) {
> +                $disk_id = $i;
> +                last;
> +            }
> +        }
> +
> +        if (!-e $reserved_names_file) {
> +            my $create_h = IO::File->new($reserved_names_file, "w") ||
> +                die "can't open or create '$reserved_names_file' - $!\n";
> +            print $create_h "$storeid $vmid $disk_id $$";

Not sure about using PIDs here. PIDs are re-used, and couldn't this also
be the PID of some long-running daemon? Maybe we can use some expire
time like the next_unused_port() helper in PVE::Tools does? Can also be
a few minutes in this case, as we don't expect reservations to saturate.

[...]

> +    my $new_disk_name = PVE::Tools::lock_file('/var/lock/pve-disk-names.lck', 10, $code);

The lock file is local to the node, so this won't work for shared
storages. For those, you would need to track the reservations in a file
on the cluster file system (see the sketch at the end).

> +    die $@ if $@;
> +
> +    if (!$new_disk_name) {
> +        die "unable to allocate an image name for VM $vmid in storage '$storeid'\n";
> +    }
> +    return $new_disk_name;
>  }
>
>  sub find_free_diskname {
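Coming back to the shared storage point, an untested sketch of what I
have in mind ($scfg->{shared} to tell local and shared storages apart,
and the pmxcfs file name is made up for illustration):

    # reservations for shared storages need to be visible cluster-wide, so
    # keep them on pmxcfs (e.g. track them in /etc/pve/priv/reserved-volnames
    # inside $code) and take a cluster-wide lock instead of a node-local one
    my $new_disk_name;
    if ($scfg->{shared}) {
        # PVE::Cluster is already used by Plugin.pm
        $new_disk_name = PVE::Cluster::cfs_lock_storage($storeid, 10, $code);
    } else {
        $new_disk_name = PVE::Tools::lock_file('/var/lock/pve-disk-names.lck', 10, $code);
    }
    die $@ if $@;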