Subject: Re: [pve-devel] Volume live migration concurrency
Date: Wed, 28 May 2025 10:49:43 -0400
From: Andrei Perapiolkin <andrei.perepiolkin@open-e.com>
To: Fabian Grünbichler <f.gruenbichler@proxmox.com>,
    Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Hi Fabian,

Thank you for your time dedicated to this issue.

>> My current understanding is that all assets related to snapshots
>> should be removed when the volume is deactivated, is that correct?
>> Or are all volumes and snapshots expected to be present across the
>> entire cluster until they are explicitly deleted?
> I am not quite sure what you mean by "present" - do you mean "exist
> in an activated state"?

Exist in an activated state, yes.

>> How should the cleanup tasks be triggered across the remaining nodes?
> it should not be needed

Consider the following scenarios during live migration of a VM from
'node1' to 'node2':

1. An error occurs on 'node2', resulting in partial activation.
2. An error occurs on 'node1', resulting in partial deactivation.
3. Errors occur on both 'node1' and 'node2', and dangling artifacts
   remain on both nodes.

Any of these can leave behind a partial activation (some artifacts were
created) or a partial deactivation (some artifacts were not cleaned up).

Now suppose the user unlocks the VM (if it was locked due to the
failure) and attempts another migration, this time to 'node3', hoping
for success. What would happen to the artifacts left on 'node1' and
'node2' in such a case?
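The only robust way I see so far to cope with such leftovers on the
plugin side is to make activation (and deactivation) idempotent, so
that artifacts from a failed earlier attempt never block the next one.
A minimal sketch of what I mean, assuming an iSCSI-backed plugin; the
method signature follows the storage plugin API as I understand it,
while the package name, the 'portal' config key and the iscsi_* /
wait_for_block_device helpers are placeholders for plugin-internal
code:

    package PVE::Storage::Custom::MyPlugin;   # hypothetical plugin package

    use strict;
    use warnings;

    use PVE::Tools qw(run_command);

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # placeholder helper: map volume (+ optional snapshot) to its iSCSI target
        my $target = $class->iscsi_target_name($scfg, $volname, $snapname);

        # Treat leftovers from a failed earlier attempt as the normal case:
        # an existing session is reused and rescanned instead of producing
        # an "already logged in" error.
        if (iscsi_session_exists($scfg->{portal}, $target)) {   # placeholder helper
            run_command(['iscsiadm', '-m', 'session', '--rescan']);
        } else {
            run_command(['iscsiadm', '-m', 'node', '-T', $target,
                         '-p', $scfg->{portal}, '-o', 'new']);
            run_command(['iscsiadm', '-m', 'node', '-T', $target,
                         '-p', $scfg->{portal}, '--login']);
        }

        # placeholder helper: poll until the expected block device shows up
        wait_for_block_device($scfg, $volname, $snapname);

        return 1;
    }

    1;

The same "already gone is fine" logic would apply on the deactivation
side, so a half-torn-down volume from scenario 2 or 3 above does not
make the next deactivation fail.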
Regarding the 'path' function:

In my case it is difficult to deterministically predict the actual path
of the device; determining it essentially requires activating the
volume. That approach is questionable, as it implies calling
activate_volume without Proxmox being aware that the activation has
occurred. What would happen if a failure occurs within Proxmox before
it reaches the stage of officially activating the volume?

Additionally, I believe that providing a 'physical path' for a resource
that is not yet present (i.e. activated and usable) is questionable
practice. It creates a risk, because there is always a temptation to
use the path directly, under the assumption that the resource is ready.
It also assumes that all developers are fully aware that a given $path
might merely be a placeholder and that additional activation is
required before use. The issue becomes even more complex in larger code
bases that integrate third-party software such as QEMU.

I might be mistaken, but during my experiments with the 'path' function
I encountered an error where the virtualization layer failed to open a
volume that had not been fully activated. Perhaps this has been
addressed in newer versions, but previously there appeared to be a race
condition between volume activation and QEMU attempting to operate on
the expected block device path.
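If a purely "logical" path is required regardless, the only scheme I
can see for my storage is to derive a stable /dev/disk/by-id name from
the volume name, so that path() never has to touch or activate the
storage. A sketch, assuming the exported LUN carries a SCSI serial
derived from the volume (and snapshot) name; the naming scheme below is
made up, and the return convention follows the base Plugin.pm as far as
I can tell:

    sub path {
        my ($class, $scfg, $volname, $storeid, $snapname) = @_;

        my ($vtype, $name, $vmid) = $class->parse_volname($volname);

        # Purely "logical" conversion: no activation and no storage query.
        # Assumes udev creates a stable by-id symlink from the LUN's SCSI
        # serial once activate_volume() has actually logged in the session.
        my $serial = defined($snapname) ? "$name-snap-$snapname" : $name;
        my $path = "/dev/disk/by-id/scsi-SOPEN-E_$serial";   # hypothetical naming

        return wantarray ? ($path, $vmid, $vtype) : $path;
    }

Until the session is actually logged in, such a path is only a promise,
which is exactly the placeholder problem described above.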
Andrei

On 5/28/25 03:06, Fabian Grünbichler wrote:
>> Andrei Perapiolkin <andrei.perepiolkin@open-e.com> wrote on
>> 27.05.2025 at 18:08 CEST:
>>
>>> 3. In the context of live migration: Will Proxmox skip calling
>>> /deactivate_volume/ for snapshots that have already been activated?
>>> Should the storage plugin explicitly deactivate all snapshots of a
>>> volume during migration?
>>> a live migration is not concerned with snapshots of shared volumes,
>>> and local volumes are removed on the source node after the migration
>>> has finished..
>>>
>>> but maybe you could expand this part?
>> My original idea was that since both 'activate_volume' and
>> 'deactivate_volume' methods have a 'snapname' argument, they would
>> both be used to activate and deactivate snapshots respectively.
>> And for each snapshot activation, there would be a corresponding
>> deactivation.
> deactivating volumes (and snapshots) is a lot trickier than activating
> them, because you might have multiple readers in parallel that we don't
> know about.
>
> so if you have the following pattern
>
> activate
> do something
> deactivate
>
> and two instances of that are interleaved:
>
> A: activate
> B: activate
> A: do something
> A: deactivate
> B: do something -> FAILURE, volume not active
>
> you have a problem.
>
> that's why we deactivate in special circumstances:
> - as part of error handling for freshly activated volumes
> - as part of migration when finally stopping the source VM or before
>   freeing local source volumes
> - ..
>
> where we can be reasonably sure that no other user exists, or it is
> required for safety purposes.
>
> otherwise, we'd need to do refcounting on volume activations and have
> some way to hook that for external users, to avoid premature
> deactivation.
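To make the refcounting idea concrete for myself: each activation bumps
a per-volume counter and only the last deactivation really tears the
volume down. A rough sketch, single node only; the count-file
directory, the 10 second lock timeout and the helper layout are all
made up, PVE::Tools::lock_file is only used to serialize concurrent
callers:

    use strict;
    use warnings;

    use PVE::Tools;

    # hypothetical location; volume names assumed safe as file names
    my $refdir = '/run/my-plugin/refcount';

    sub _read_count {
        my ($file) = @_;
        return 0 if !-e $file;
        open(my $fh, '<', $file) or die "cannot read $file: $!\n";
        my $line = <$fh> // 0;
        return int($line);
    }

    sub _write_count {
        my ($file, $count) = @_;
        open(my $fh, '>', $file) or die "cannot write $file: $!\n";
        print $fh "$count\n";
    }

    sub ref_activate {
        my ($volname, $do_activate) = @_;
        mkdir $refdir;
        PVE::Tools::lock_file("$refdir/$volname.lock", 10, sub {
            my $file = "$refdir/$volname.count";
            my $count = _read_count($file);
            $do_activate->() if $count == 0;   # only the first user really activates
            _write_count($file, $count + 1);
        });
        die $@ if $@;
    }

    sub ref_deactivate {
        my ($volname, $do_deactivate) = @_;
        PVE::Tools::lock_file("$refdir/$volname.lock", 10, sub {
            my $file = "$refdir/$volname.count";
            my $count = _read_count($file);
            if ($count <= 1) {
                $do_deactivate->();            # the last user tears the volume down
                unlink $file;
            } else {
                _write_count($file, $count - 1);
            }
        });
        die $@ if $@;
    }

This only covers a single node, of course; it does not answer the
cluster-wide notification question further down.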
>> However, from observing the behavior during migration, I found that
>> 'deactivate_volume' is not called for snapshots that were previously
>> activated with 'activate_volume'.
> were they activated for the migration? or for cloning from a snapshot?
> or ..?
>
> maybe there is a call path that should deactivate that snapshot after
> using it..
>
>> Therefore, I assumed that 'deactivate_volume' is responsible for
>> deactivating all snapshots related to the volume that was previously
>> activated.
>> The purpose of this question was to confirm this.
>>
>> From your response I conclude the following:
>> 1. Migration does not manage volume snapshots (i.e. it does not
>> activate or deactivate them).
> that really depends. a storage migration might activate a snapshot if
> that is required for transferring the volume. this mostly applies to
> offline migration or unused volumes though, and only for some storages.
>
>> 2. All volumes are expected to be present across all nodes in the
>> cluster, for the 'path' function to work.
> if at all possible, path should just do a "logical" conversion of
> volume ID to a stable/deterministic path, or the information required
> for Qemu to access the volume if no path exists. ideally, this means it
> works without activating the volume, but it might require querying the
> storage.
>
>> 3. For migration to work, the volume should be simultaneously present
>> on both nodes.
> for a live migration and shared storage, yes. for an offline migration
> with shared storage, the VM is never started on the target node, so no
> volume activation is required until that happens later. for local
> storages, volumes only exist on one node anyway (they are copied during
> the migration).
>
>> However, I couldn't find explicit instructions or guides on when and
>> by whom volume snapshot deactivation should be triggered.
> yes, this is a bit under-specified unfortunately. we are currently
> working on improving the documentation (and the storage plugin API).
>
>> Is it possible for a volume snapshot to remain active after the volume
>> itself was deactivated?
> I'd have to check all the code paths to give an answer to that.
> snapshots are rarely activated in general - IIRC mostly for
> - cloning from a snapshot
> - replication (limited to ZFS at the moment)
> - storage migration
>
> so I just did that:
> - cloning from a snapshot only deactivates if the clone is to a
>   different node, for both VM and CT -> see below
> - CT backup in snapshot mode deletes the snapshot, which implies
>   deactivation
> - storage_migrate (move_disk or offline migration) if a snapshot is
>   passed; IIRC this only affects ZFS, which doesn't do activation
>   anyway
>
>> During testing on Proxmox 8.2 I've encountered situations where
>> cloning a volume from a snapshot did not result in snapshot
>> deactivation.
>> This leads to the creation of 'dangling' snapshots if the volume is
>> later migrated.
> ah, that probably answers my question above.
>
> I think this might be one of those cases where deactivation is hard -
> you can have multiple clones from the same source VM running in
> parallel, and only the last one would be allowed to deactivate the
> snapshot/volume..
>
>> My current understanding is that all assets related to snapshots
>> should be removed when the volume is deactivated, is that correct?
>> Or are all volumes and snapshots expected to be present across the
>> entire cluster until they are explicitly deleted?
> I am not quite sure what you mean by "present" - do you mean "exist in
> an activated state"?
>
>> The second option requires additional recommendations on artifact
>> management. Maybe it should be sent as a separate email, but I'll
>> draft it here.
>>
>> If all volumes and snapshots are consistently present across the
>> entire cluster, and their creation/operation results in additional
>> artifacts (such as iSCSI targets, multipath sessions, etc.), then
>> these artifacts should be removed on deletion of the associated
>> volume or snapshot.
>> Currently, it is unclear how all nodes in the cluster are notified of
>> such a deletion, as only one node in the cluster receives the
>> 'free_image' or 'volume_snapshot_delete' request.
>> What is the proper way to instruct the plugin on other nodes in the
>> cluster that a given volume/snapshot is requested for deletion and
>> that all artifacts related to it have to be removed?
> I now get where you are coming from, I think! a volume should only be
> active on a single node, except during a live migration, where the
> source node will always get a deactivation call at the end.
>
> deactivating a volume should also tear down related, volume-specific
> resources, if applicable.
>
>> How should the cleanup tasks be triggered across the remaining nodes?
> it should not be needed, but I think you've found an edge case where we
> need to improve.
>
> I think our RBD plugin is also affected by this, all the other plugins
> either:
> - don't support snapshots (or cloning from them)
> - are local only
> - don't need any special activation/deactivation
>
> I think the safe approach is likely to deactivate all snapshots when
> deactivating the volume itself, for now.
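P.S. If deactivating all snapshots together with the volume is the
interim recommendation, the plugin-side behaviour I have in mind would
look roughly like the sketch below. Only the method signature is meant
to match the storage plugin API; list_active_snapshots and the
teardown_* helpers are plugin-internal placeholders:

    sub deactivate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        if (defined($snapname)) {
            # explicit request for a single snapshot
            return $class->teardown_snapshot_artifacts($scfg, $volname, $snapname);  # placeholder
        }

        # Interim "safe" behaviour: deactivating the volume also sweeps any
        # snapshot artifacts (iSCSI targets, multipath maps, ...) still
        # active on this node, so clones or migrations cannot leave
        # dangling state behind.
        for my $snap ($class->list_active_snapshots($scfg, $volname)) {   # placeholder
            eval { $class->teardown_snapshot_artifacts($scfg, $volname, $snap) };
            warn "failed to deactivate snapshot '$snap' of '$volname': $@" if $@;
        }

        return $class->teardown_volume_artifacts($scfg, $volname);   # placeholder
    }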